Sprint 7 close, sprint 8 start

Notes for the 2010-06-14 close of sprint 7 and startup of sprint 8.

Major goals for Sprint 8:

 * MN synchronization functional (CN pulls content from MNs, stores the sys + sci metadata, indexes the metadata, enables search and retrieval of the content)
 * Monitoring framework using Cacti operational for low level monitoring of hardware
 * Health APIs defined and implemented for CN and MN (to enable Cacti statistics collection)
 * Integration testing framework operational for exercising MN and CN APIs

Perhaps: get Metacat KNB node up and running
 * Set up VM
 * Install Metacat
 * Harvest content (can be kind of slow) - use a sub-sample of about 1000 records or so
 * Generate system metadata (tool available for this)

CN replication


New Tasks

Story: (5, sprint 9, Dave) Need a nightly / continuous build system to ensure that all code (CN, MN, client) builds as expected. There are plenty of automated frameworks for this.  Select one and get it operational.

BUG: (Chad, sprint 8) Metacat identifier issue. Metacat does not want to create new objects whose identifiers have a .\d+ suffix. This is related to the internal versioning implemented by Metacat. Need a unit test created (Robert).

BUG: (Chad, sprint 8) Some problems with Metacat - associated with authentication. D1 client is failing on create() because of some auth changes.

BUG: (Low Priority, sprint 10, Robert)X Some service types are not fully serializable: a couple of classes need to have schemas associated with them - the log messages will be blown away.
- Need to check exactly which classes need schemas


Story(5): (Robert, sprint 8)X The CN needs to maintain a list of registered Member Nodes, their service endpoints, and additional metadata that the harvester can use to iterate over Member Nodes and initiate MN synchronization.
- create schema for the registry
- enable serialization of the registry document against Metacat

Outline of registry document:
<registry>
  <node type="member">
    <name>Human readable name of member node</name>
    <lastHarvested>2010-06-07T23:55:10Z</lastHarvested>
    <lastCompleteHarvest>2010-06-07T23:55:10Z</lastCompleteHarvest>
    <baseURL>http://some.mn.com/mn/</baseURL>
    <services>
      <service name="mn_crud.get" rest="object/${GUID}" available="true" datechecked="2010-06-07T23:55:10Z" />
      <service name="mn_replicate.listObjects" rest="object" available="true" datechecked="2010-06-07T23:55:10Z" />
      <service name="mn_crud.getSystemMetadata" rest="meta/${GUID}" available="true" datechecked="2010-06-07T23:55:10Z" />
      ...
    </services>
  </node>
  ...
</registry>
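
A minimal sketch of how a harvester might read this outline and iterate over Member Nodes. The function names (list_member_nodes, service_url) are hypothetical, the sample drops the "..." placeholders so that it parses, and percent-encoding the GUID is an assumption about how identifiers should be placed in the URL.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Sample registry document following the outline above.
REGISTRY_XML = """
<registry>
  <node type="member">
    <name>Human readable name of member node</name>
    <lastHarvested>2010-06-07T23:55:10Z</lastHarvested>
    <lastCompleteHarvest>2010-06-07T23:55:10Z</lastCompleteHarvest>
    <baseURL>http://some.mn.com/mn/</baseURL>
    <services>
      <service name="mn_crud.get" rest="object/${GUID}" available="true" datechecked="2010-06-07T23:55:10Z" />
      <service name="mn_replicate.listObjects" rest="object" available="true" datechecked="2010-06-07T23:55:10Z" />
      <service name="mn_crud.getSystemMetadata" rest="meta/${GUID}" available="true" datechecked="2010-06-07T23:55:10Z" />
    </services>
  </node>
</registry>
"""


def list_member_nodes(registry_xml):
    """Return one dict per member node with its base URL and available services."""
    root = ET.fromstring(registry_xml)
    nodes = []
    for node in root.findall("node"):
        if node.get("type") != "member":
            continue
        # Map service name -> REST path, skipping services flagged unavailable.
        services = {}
        for svc in node.findall("services/service"):
            if svc.get("available") == "true":
                services[svc.get("name")] = svc.get("rest")
        nodes.append({
            "name": node.findtext("name", default="").strip(),
            "base_url": node.findtext("baseURL", default="").strip(),
            "last_harvested": node.findtext("lastHarvested"),
            "services": services,
        })
    return nodes


def service_url(node, service_name, guid=None):
    """Build the REST URL for a service, substituting ${GUID} when a GUID is given."""
    path = node["services"][service_name]
    if guid is not None:
        # Assumption: the GUID is percent-encoded before substitution.
        path = path.replace("${GUID}", quote(guid, safe=""))
    return node["base_url"].rstrip("/") + "/" + path
```

For example, service_url(list_member_nodes(REGISTRY_XML)[0], "mn_replicate.listObjects") gives the listObjects endpoint the harvester would poll for that node.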


Story(3): (Robert) Modify the harvester to read and update the registry information
  - harvester needs to read content from the registry document
  - harvester needs to update registry doc when harvest has completed
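
A rough sketch of the update step, assuming the registry layout outlined in these notes (node elements with baseURL, lastHarvested and lastCompleteHarvest children). The function name mark_harvested and the choice to match nodes by baseURL are illustrative assumptions, not a decided design; storing the result back into Metacat is out of scope here.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def _set_child_text(parent, tag, text):
    """Set the text of a child element, creating the child if it is missing."""
    child = parent.find(tag)
    if child is None:
        child = ET.SubElement(parent, tag)
    child.text = text


def mark_harvested(registry_xml, base_url, when, complete=False):
    """Return the registry XML with harvest timestamps updated for one node.

    `when` must be a timezone-aware datetime; it is written in UTC using the
    same format as the timestamps in the registry outline.
    """
    root = ET.fromstring(registry_xml)
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    for node in root.findall("node"):
        if node.findtext("baseURL", default="").strip() != base_url:
            continue
        _set_child_text(node, "lastHarvested", stamp)
        if complete:
            # Only a full pass over the node moves lastCompleteHarvest.
            _set_child_text(node, "lastCompleteHarvest", stamp)
    return ET.tostring(root, encoding="unicode")
```

Keeping lastHarvested and lastCompleteHarvest separate lets an interrupted harvest record progress without claiming the node is fully synchronized.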


Story: (13) (Sprint 9) (Robert) Utilize a simple objectFormat registry to determine which configuration to feed the Mercury indexer
  - Task: Define how to handle science metadata formats that are not handled by Mercury.
    - Keep in Metacat, but it won't be indexed
    - Need to define log messages to report on what's not being processed by the indexer
  - Task: Create an enumeration of the index formats supported by Mercury. This will be part of the packager functionality.
  
  - Question: how portable are the Mercury indexer config files?
  - Task: List the types of metadata that are being used in the prototype
  - Task: List the metadata types that are supported by the indexer
  - Task: Create configurations for indexer to parse all metadata types in the prototype
    - Jim: Working on the KNB parsing
      - About 3 versions of EML that are being used in KNB
      - EML version is in the doctype - not directly queryable, but the version to retrieve can be specified through the Metacat API
    - Robert: working with Dryad sci meta


#262 - Tools for exercising APIs (Roger and Dave)
 - New task: Dave: review the integration testing implemented by Roger so far.
 - New task: Implement tests that adhere specifically to the MN APIs only without regard for the type of node
 - Blocking 269 - need to add mechanism for searching the CN


New story: (3) (Robert, Sprint 9) Implement CN_crud.resolve (cn_api_crud.html#CN_crud.resolve)
Query Metacat for objects of type systemmetadata where identifier = ID, then parse the system metadata to extract the object location (which MNs), construct the response object, and return it to the client.
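
A sketch of the parse-and-respond half of resolve, under several assumptions: the system metadata is taken to be an XML document with an identifier element and one memberNode element per holding MN (both element names are hypothetical, since the schema is not given in these notes), the Metacat query is assumed to have already returned that document, and the object URL follows the "object/${GUID}" pattern from the registry outline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote


def resolve(identifier, sysmeta_xml, node_base_urls):
    """Return the locations of an object as a list of {"node", "url"} dicts.

    sysmeta_xml    -- system metadata document already fetched from Metacat
    node_base_urls -- mapping of MN name to base URL (e.g. built from the registry)
    """
    root = ET.fromstring(sysmeta_xml)
    # Hypothetical element name: "identifier".
    if root.findtext("identifier", default="").strip() != identifier:
        raise KeyError(identifier)
    locations = []
    # Hypothetical element name: one "memberNode" per MN holding the object.
    for elem in root.iter("memberNode"):
        node = (elem.text or "").strip()
        base = node_base_urls.get(node)
        if base is None:
            continue  # MN not registered; nothing to point the client at
        url = base.rstrip("/") + "/object/" + quote(identifier, safe="")
        locations.append({"node": node, "url": url})
    return locations
```

Serializing the returned list into the actual response type is left out, since that depends on the service type schemas.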


#351 - (Chad) Implement Metacat MN
  #426 - (sprint 8) move forward to next sprint (likely to be closed early)


New Story: (Rob, Roger, Dave, lower priority than the log output for packager notification) Create a test harness for exercising the MN + CN APIs using the Java client

New story: #667 (Dave, sprint 8) Implement a VM and install the test harnesses to perform automated testing of the CN and MN services.


#633 Monitoring framework
#635 - Move to next sprint
#636 - Move to next sprint. Set up accounts and test, configure remaining hosts, etc.
#637 - Need to ensure that appropriate commands are part of the base install
#637- Need to ensure that appropriate commands are part of the base install

#638 - Dupe of tasks under #640 

#640 - Need to add same functionality to Metacat MN health API

 * Implement MN and CN health
   * Define MN and CN health API
   * Implement CN health API
   * #641 Implement MN health API
 * Create an account for monitor ssh access to machines being monitored
   * set up LDAP user for monitoring service: "dataone_monitor"
   * restrict the scripts that can be executed by the account
   * set up passwordless keys for access to the account from the monitor machine
 * Fix authentication bugs in D1Client
 * Remove old service type classes and switch Metacat to use the new ones

#648 - static test VM, low priority (move to an enhancement / separate task). This could be part of the VM setup for #667.

#650, #651 - harvesting process.  Need to complete testing


Story: (5, Sprint 8, Chad) Metacat notification of downstream services that content has changed.
- Metacat will generate log messages. Task (Chad): create a separate log file using Log4J ('RollingFileAppender' for ResourceHandler; add log statements with the localIds of the SM and metadata doc). Create separate logs for replication and create().
- task: (Robert) implement a watcher for the log that will trigger the packager to generate packages for new content and store the state of operations against the log file (time, offset)
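
One possible shape for the watcher's state handling, as a sketch only: it reads whatever has been appended to the log since the last call and stores (time, offset) in a small JSON state file. The function name read_new_lines, the JSON state format, and the "restart from 0 if the file shrank" handling of rolled logs are all assumptions; triggering the packager for each returned line is left to the caller.

```python
import json
import os
import time


def read_new_lines(log_path, state_path):
    """Return complete log lines appended since the last call; persist the new offset."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as fh:
            offset = json.load(fh).get("offset", 0)

    # If the log was rolled over it can be shorter than the stored offset;
    # in that case start again from the beginning of the new file.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()

    # Consume only complete lines; a partial trailing line waits for the next call.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()

    # Store the state of operations against the log file (time, offset).
    with open(state_path, "w") as fh:
        json.dump({"time": time.time(), "offset": offset + end}, fh)
    return lines
```

Because the offset is persisted, the watcher can be restarted (or run periodically from cron) without reprocessing entries it has already handed to the packager.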