Notes for 2013.18-Block.3.1
===========================

Skye
----
 * 20130510
   * Releasing ONEMercury
     * Will delete 1.1.2_RC2 tag with CSS change
     * Had to deal with a Solr bug - that's fixed now
     * ESRI support is in now
     * Mercury supports UTF-8 now
   * Auditing reporting meeting
     * Would like to expand Event enumeration, typing as string would break backwards compatibility
     * May use a separate index if the change breaks backwards compatilbility
     * Model it with the agg log schema
   * Production rollout
 * 20130508
   * Working on auditing
   * Working with Solr/Jetty for upgrading to Solr 4.x - possibly use Jetty for indexing?
 * 20130506
   * Lucene conference was good
   * learned a good deal about Solr, etc.
   * Solr 4.x has memory improvements, processing improvements: would like to upgrade
   * Back to auditing

Roger
-----
 * 20130510
   * Working on ONEDrive to describe hierarchical files
   * strarted d1_workspace_client, may become part of libclient
     * simple, recursive structure (models folders)
 * 20130508
   * design specs for ONEDrive and ONEMercury/other web UIs
     * security considerations are interesting
     * looking at OAuth
 * 20130506
   * U/A meeting: came up with ONEDrive solutions
     * ONEDrive will work in coordination with a web search UI like ONEMercury
     * storage of queries or objects in ONEMercury, assigned to a folder
     * design work needs to be done on designing a REST API for storing collections (workspace concept)
   * Will add to architecture documentation

David Doyle
-----------
 * Reading, getting a handle on tools and software
 * Finals for the semester
 * Will probably work on documentation

Chris B.
--------
 * 20130510
   * David Doyle is going to be replacing Chris
   * Will work on debconf stuff over the weekend
     * Robert: are deb packages ready? Some errors in the buildout
 * 20130506
   * Becoming familiar with foresite
   * Working on debconf flags

 * 20130503
   * Familiarizing himself with ORE for indexing work
 * 20130501
   * Ansible templating issue fixed - using newest 1.2 version now
     * site fact module can be used now
     * waiting for packages from Robert
     * Robert wants flags set (not just in debconf), so those will need to be integrated into the Ansible module
   * bioportal VM has been compromised
     * rooted due to unchanged base root password (brute force ssh attack)
     * installed bind, httpd
     * isolated issue, no ldap. no d1 passwords.
     * VM was basically unmaintained, which caused the issue really
     * Will save VM disk
   * Will work on debconf modules for flags above now
   * Will work on Ansible documentation, system architecture
   * New student coming on in June, will train this person
 * 20130429
   * Ansible work - found a bug that is a blocker
     * setting a map of maps for debconf entries
       * templating system isn't expanding it
     * Trying Ansible 1.2 - should be fixed
     * Not working with DataONE buildout packages yet

Rob
---
 * 20130510
   * Working on very large resource maps, will write up a summary in the architecture docs
   * Will be recommending a nesting solution first
     * shouldn't affect indexing
   * When serializing resource maps, Jibx does a lot of validations per iteration, which is a performance hit (6 minutes)
   * Deserializing is quick (6 seconds)
 * 20130508
   * Working on aggregations and data packaging requirements
     * Updated packaging documentation, Dave will review
       * changed MUST to SHOULD wrt aggregation URIs following hash URI form, due to existing resource maps that do not follow hash URI form
       * added requirement that references (URIs) to other data packages must resolve to a resourse map (since we can't guarantee that agg. URIs resolve.
       * added section on resource map validation
       * added section on how to reference other data packages
       * added section on very large packages with performance statistics
   * Continued work on ORE parsing
     * large maps ( > 10K triples) take 5 minutes to serialize, 5 sec to deserialize
     * dealing with NPEs during deserialization (triple with null subject?)
     * Nesting relationships: can possibly only use unidirectional predicates for 'aggregates'. Can infer the inverse.
     * Question: do we need the identifier statement?
       * the impact: if we don't use it and also remove requirement for isDocumentedBy and isAggregatedBy, can make the resource maps 1/3 the size, and take 50% of the time to serialize.
 * 20130506
   * Working on resource maps
 * 20130503
   * Working on nesting resource maps
     * ORE spec: can aggregate aggregations, but the resource maps are distinct
     * {resourceMapURI}#aggregation - having links to the aggregation needs to be consistent
     * also, we may be able to infer transitivity across aggregations
     * If aggregation ids need to be unique, we'd need to check this in DataONE (d1_sync?)
     * TODO: decode if we'll aggregate resource maps or aggregate aggregations
   * worked on various sized resource maps
     * up to 10K triples, 30K gave OOM exceptions using the foresite library in libclient_java
 * 20130501
   * figured out the reasoning setup - needed to feed in the ORE schema.
     * building it into ResourceMapFactory in libclient_java
     * will cache the ORE model in libclient_java
     * need to set up tests that make sure that reusing the cached model doesn't cause problems.
     * TODO: will need to support nested packaging in libclients' DataPackage classes
 * 20130429
   * Working with the OWL Reasoner in foresite
   * Understanding the different rasoners for RDFS, OWL
   * Question: mutability and namespaced identifiers
     * What are the requirements for citation wrt DataONE?


Robert
------
 * 20130510
   * Was able to get a buildout fully run
   * Worked with Skye on log event management
     * If Event types become strings, MN event types
     * Changing the Event enumm to string would break libclient v1 compatibility
     * Talked about an index for exception reporting
 * 20130508
   * Debugging the buildout packages - making sure it builds and installs correctly
   * Discussion with Skye re: log aggregation Solr index to store auditing events
 * 20130506
   * Working on debconf packaging
     * Finished config script work on Friday
     * Now finishing up postinst (template names have changed, merging differences and removing redundancies/obsolete code)
   * Spent last week in the U/A meeting
 * 20130503
   * Survived the week
   * Good meeting with the U/A group
     * MN issues - good feedback
   * Will continue on packaging code - should be able to finish up today
 * 20130501
   * U/A meeting
   * Need a few more hours on packaging work
     * doing work in github until it can build properly
 * 20130429
   * CN Packaging
     * debconf reading, debian documentation is conflicted with the use of config
     * Will reduce the amount of prompting
   * Prepping for U/A meeting


Chris J.
--------
 * 20130510
   * Discussion with Rob on ORE maps
   * Metacat work and meeting with Ben, Matt, Jing
   * Review of ONEDrive summary that Roger is working on
     * We need to discuss the APIs needed for UIs, along with the Log Stats API
   * Review of buildout and release with Skye for ONEMercury changes
     * Tagging: immutable tags can be 'difficult' when a minor change is discovered
       * But, tags should be immutable, so:
       * Proposal: Let's tag as: COMPONENT_NAME_vX.X.X_RCX, and once we are released, tag as COMPONENT_NAME_vX.X.X
   * Further editing work on the sensor best practices document
   * Will be out Tue, Wed, Thu of next week
 * 20130508
   * Working on System Metadata Management architecture docs
   * Worked with Skye on auditing
   * Upgraded CNs with Ben
   * Priorities organization with Dave
   * 
 * 20130501
   * Metacat-specific work
     * troubleshooting the CN-CN replication issues #3740
     * working on system metadata managment propposal
     * Will work with Ben on CN upgrade of Metacat installations
     * Replica target nodes: need to deprecate nodes in the same
   * 
 * 20130429
   * bash scripts for updating and registering nodes
   * documentation for the above
   * http://mule1.dataone.org/OperationDocs/member_node_deployment/node-registration-update-script.html
   * Requirements and documentation for system metadata management proposal


Matt
----
 * 20130503
   * security team CC
   * Will be at the UNM DataONE planning meeting for the next round
     * what do we need to do in the next 5 years from a technological perspective
     * send comments to Dave and Matt
     * May end up writing leveraged grants to continue the D1 CI
 * 20130501
   * Spent some time on the CN-CN replication issues
   * Patched Metacat with LEFT JOIN SQL query
     * will go into 2.0.7 on the CNs

Ben
---
 * 20130508
   * Potential PPBIO Member Node. Interest from "dvd" on IRC yesterday (new info manager there in Brazil). https://redmine.dataone.org/issues/3748
   * Where do we send people to contact D1? John Cobb
   * Involved in the security audit planning in June
   * Working with CILogon and portal code; delegation of certificates has changed
     * Need to target this for a CN release
 * 20130506
   * Upgraded stage CNs to Metacat 2.0.7
     * We had a few objects that didn't Metacat-replicate correctly, due to conflicting autogen ids
     * Ben had written a script to clean up id mappings on each CN, which we'll use
     * Will be upgrading the production CNs today
   * Will be working on CILogon work this week
 * 20130503
   * Will be upgrading the CNs in stage and production
     * We will only be turning off Metacat-tied services
     * Will go through the round robin procedures
     * Database dumps are the big time sync - a few hours of upgrade work
 * 20130501
   * Working on the Metacat 2.0.7 release branch, testing now (building via Hudson)
   * Unstable and stable channels are using 2.0.7 now
   * Need to update Authentication documentation

Dave V
------
 * 20130510
   * Working on tagging architecture docs
   * Developer resources doc
   * Product/Component dependencies
   * Working through scheduling for products, especially CN stack version plans
   * Pondering high level design for ONEDrive and other ITK product interactions with portal
 * 20130508
   * At PPSR meeting (lots of interesting high volume citizen observation projects)
   * Working through dependencies, updating component docs 
   * Goal is to have:
     * list of all products
     * dependency between products 
     * activities for each product
     * scheduling / ordering of activties for each product
     * overall scheduling
 * 20130506
   * Reviewed resource map docs
     * usefulness of aggregations
       * points to a concept of a collect
       * resource map is the instantiation of the concept
       * content negotiation determines resource map serialization
       * D1 is most interested in the conceptual collection - serialization aside
     * Data collections should really be referenced by the aggregation id
       * For now: reference the resource map, but recommend future use of referencing the aggregation
       * Behaviour will need to be backwards compatible so that existing RMs that do not follow the recommendation are still properly interpreted by DataONE service.
 * 20130503
   * Good workplan for ONEDrive from the U/A meeting
   * Will be cleaning up ask.dataone.org
   * Initial call with Jim Basney, CILogon security audit
     * Ben will be POC for technical issues
 * Formalizing ONEDrive workplan
   * Tweaking ask.dataone.org a bit
   * Initial call with CTSC for security audit
     * Focus on high level architecture and implementation
     * Unlikely to reach code review level of detail
 * 20130501
   * At Usability group meeting
   * Evaluating ONEDrive UI design and next steps
     * may present ONEDrive at DUG meeting in July?
   * Bob Sandusky will be at a DSpace meeting next week


Discussion on ORE parsing in libclient
--------------------------------------
 * ResourceMapFactory now exposes new methods:
   * deserializeResourceMap(InputStream is, boolean useReasoners)
   * returns an inferred (or not) resource map

Nesting DataPackages to keep 'em short.
see  http://www.openarchives.org/ore/1.0/datamodel#ore:isDescribedBy
RemP describes AggregationP
AggregationP aggregates MD-All
Aggregation aggregates ChP1, ChP2, ChP3, ...

RemChP  describes AggregationChP1
D1 isDocumentedBy MD-All
D2 isBlahBlahBlah 

  <rdf:Property rdf:about="http://www.openarchives.org/ore/terms/aggregates">
    <rdfs:label>Aggregates</rdfs:label>
    <rdfs:comment>Aggregations, by definition, aggregate resources. The ore:aggregates relationship expresses that the object resource is a member of the set of Aggregated Resources of the subject (the Aggregation). This relationship between the Aggregation and its Aggregated Resources is thus more specific than a simple part/whole relationship, as expressed by dcterms:hasPart for example.</rdfs:comment>
    <rdfs:subPropertyOf rdf:resource="http://purl.org/dc/terms/hasPart" />
    <rdfs:domain rdf:resource="http://www.openarchives.org/ore/terms/Aggregation" />
    <rdfs:range rdf:resource="http://www.openarchives.org/ore/terms/AggregatedResource" />
    <owl:inverseOf rdf:resource="http://www.openarchives.org/ore/terms/isAggregatedBy" />
    <rdfs:isDefinedBy rdf:resource="http://www.openarchives.org/ore/terms/"/>
  </rdf:Property>

D1 Indexing:
https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_common/
https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_generator/ (Skye's work for doing the indexing, work queue indexer)
https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_processor/ (Task executor)


Stage CN Upgrade Notes
----------------------
- Remove from RR: orc, unm (DONE, cj)
- Stop processing: orc, unm (DONE, brl)
- Stop Tomcat: orc, unm (DONE, brl)
- pg_dump: unm, ucsb (DONE, cj)
- /var/metacat backup: unm, ucsb, orc (DONE, cj)
- fetch latest debian packages unm, orc (ubuntu-stable)
- install dataone-cn-metacat: unm, orc (DONE, brl)
- Configure upgrade DB from 2.0.5 to 2.0.7 using Metacat admin screen: unm (DONE), orc (DONE)
- [NOTE: remember to restart Tomcat after configuration is complete.]
- Add RR: orc, unm (DONE, cj) 
- Remove RR: ucsb (DONE, cj)
- Perform Metacat upgrade steps as above: ucsb (DONE, brl)
- Add RR: ucsb (DONE, cj)

Discussion on Authentication for ONEDrive/Web
---------------------------------------------

Discussion on CI Prioritites
----------------------------

Dependency graphs: http://mule1.dataone.org/ArchitectureDocs-current/implementation/components.html
Components List: https://docs.google.com/spreadsheet/ccc?key=0Ai3ryhJR2IgZdEwwTDhnai01UXN1RlRoUWtkOFNyZVE#gid=0
Affected Components per Project: https://docs.google.com/spreadsheet/ccc?key=0AjusEcfJ75HCdGJKd05NemctcjVoU18yNFZyeHVJdWc#gid=0


 * TODO: Change hudson to build d1_portal before d1_mercury_common to ensure the dependency is met (Skye)
 * Some libraries don't have the DataONE maven repo listed as a location (Sonatype only)
 * TODO: Create dependency graphs for each of the cn-buildout components to simplify the graphs (Dave)
 * TODO: deprecate 2 repl member nodes (Chris)