/20100614-sprint

Sprint 7 close, Start 8

Notes for the 2010-06-14 close of sprint 7 and startup of sprint 8.

Major goals for Sprint 8:

MN synchronization functional (CN pulls content from MNs, stores the sys + sci metadata, indexes the metadata, enables search and retrieval of the content)
Monitoring framework using Cacti operational for low level monitoring of hardware
Health APIs defined and implemented for CN and MN (to enable Cacti statistics collection)
Integration testing framework operational for exercising MN and CN APIs

Perhaps: get Metacat KNB node up and running

Set up VM
Install metacat
Harvest content (can be kind of slow) - use a sub-sample of about 1000 records or so
Generate system metadata (tool available for this)

CN replication

New Tasks

Story: (5, sprint 9, Dave) Need a nightly / continuous build system to ensure that all code (CN, MN, client) builds as expected. There are plenty of automated frameworks for this. Select one and get it operational.

BUG: (Chad, sprint 8) Metacat identifier issue. Metacat does not want to create new objects whose identifiers end in a .\d+ suffix. This is related to the internal versioning implemented by Metacat. A unit test needs to be created (Robert).
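
A unit test for this bug could start from a check like the one below. This is only a minimal Python sketch, assuming the failure is triggered by identifiers that end in a dot followed by digits (the .\d+ pattern noted above). The function name `has_revision_suffix` and the sample identifiers are hypothetical and are not part of Metacat.

```python
import re

# Matches identifiers that end in a dot followed by one or more digits,
# which is the shape assumed to collide with Metacat's internal versioning.
_REVISION_SUFFIX = re.compile(r"\.\d+\Z")


def has_revision_suffix(identifier: str) -> bool:
    """Return True if the identifier ends in a '.<digits>' suffix."""
    return _REVISION_SUFFIX.search(identifier) is not None


if __name__ == "__main__":
    # Hypothetical identifiers, used only to show which ones a test should cover.
    for pid in ["knb.123.1", "dryad:abc", "report.v2", "sample.007"]:
        print(pid, has_revision_suffix(pid))
```

Identifiers flagged by this check are the ones a create() test would need to exercise against Metacat.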

BUG: (Chad, sprint 8) Some problems with Metacat associated with authentication. The D1 client is failing on create() because of some auth changes.

BUG: (Low Priority, sprint 10, Robert)X Some service types not fully serializable - need to have schemas associated with them. A couple classes need to have schemas associated with them - the log messages will be blown away.
- Need to check exactly which classes need to have schemas

Story(5): (Robert, sprint 8)X The CN needs to maintain a list of registered Member Nodes, with their service endpoints and additional metadata, that the harvester can use to iterate over Member Nodes and initiate MN synchronization.
- create schema for the registry
- enable serialization of the registry document against metacat

Outline of registry document:
<registry>
  <node type="member">
    <name>Human readable name of member node</name>
    <lastHarvested>2010-06-07T23:55:10Z</lastHarvested>
    <lastCompleteHarvest>2010-06-07T23:55:10Z</lastCompleteHarvest>
    <baseURL>http://some.mn.com/mn/</baseURL>
    <services>
      <service name="mn_crud.get" rest="object/${GUID}" available="true" datechecked="2010-06-07T23:55:10Z" />
      <service name="mn_replicate.listObjects" rest="object" available="true" datechecked="2010-06-07T23:55:10Z" />
      <service name="mn_crud.getSystemMetadata" rest="meta/${GUID}" available="true" datechecked="2010-06-07T23:55:10Z" />
      ...
    </services>
  </node>
  ...
</registry>
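
To illustrate how a harvester might consume a document shaped like this outline, here is a minimal Python sketch. It assumes the element and attribute names shown in the outline; the function name `member_node_services` and the sample node name are hypothetical, and the elided parts of the outline are simply left out of the sample.

```python
import xml.etree.ElementTree as ET

# Sample registry document following the outline (elided parts omitted).
REGISTRY_XML = """
<registry>
  <node type="member">
    <name>Example member node</name>
    <lastHarvested>2010-06-07T23:55:10Z</lastHarvested>
    <lastCompleteHarvest>2010-06-07T23:55:10Z</lastCompleteHarvest>
    <baseURL>http://some.mn.com/mn/</baseURL>
    <services>
      <service name="mn_crud.get" rest="object/${GUID}" available="true" datechecked="2010-06-07T23:55:10Z" />
      <service name="mn_replicate.listObjects" rest="object" available="true" datechecked="2010-06-07T23:55:10Z" />
    </services>
  </node>
</registry>
"""


def member_node_services(registry_xml: str) -> dict:
    """Map each member node's baseURL to {service name: REST URL template}."""
    root = ET.fromstring(registry_xml.strip())
    result = {}
    for node in root.findall("node[@type='member']"):
        base_url = node.findtext("baseURL", default="").strip()
        services = {}
        for svc in node.findall("services/service"):
            # Skip services that the registry marks as unavailable.
            if svc.get("available") == "true":
                services[svc.get("name")] = base_url + svc.get("rest", "")
        result[base_url] = services
    return result


if __name__ == "__main__":
    for base, services in member_node_services(REGISTRY_XML).items():
        print(base, sorted(services))
```

The `${GUID}` placeholder is left in the returned templates so that the caller can substitute an identifier per request.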

Story(3): (Robert) Modify the harvester to read and update the registry information
- harvester needs to read content from the registry document
- harvester needs to update registry doc when harvest has completed.
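
The update step could look roughly like the following Python sketch. It assumes the registry document uses the `lastHarvested`, `lastCompleteHarvest` and `baseURL` elements from the outline; the function name `record_harvest` and its parameters are hypothetical, and storing the result back into Metacat is not shown.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def _set_child_text(node, tag, text):
    """Set the text of a child element, creating the element if it is missing."""
    child = node.find(tag)
    if child is None:
        child = ET.SubElement(node, tag)
    child.text = text


def record_harvest(registry_xml: str, base_url: str, when: datetime, complete: bool) -> str:
    """Return the registry XML with harvest timestamps updated for one node.

    `when` must be a timezone-aware datetime; it is written in UTC.
    """
    root = ET.fromstring(registry_xml.strip())
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    for node in root.findall("node"):
        if node.findtext("baseURL", default="").strip() != base_url:
            continue
        _set_child_text(node, "lastHarvested", stamp)
        # Only a full pass over the node's content counts as a complete harvest.
        if complete:
            _set_child_text(node, "lastCompleteHarvest", stamp)
    return ET.tostring(root, encoding="unicode")
```

Keeping the two timestamps separate lets a partial or interrupted harvest be recorded without losing the time of the last complete one.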

Story: (13) (Sprint 9) (Robert) Utilize a simple objectFormat registry to determine which configuration to feed the Mercury indexer
- Task: Define how to handle science metadata formats that are not handled by Mercury.
    - Keep in metacat, but it won't be indexed
    - Need to define log messages to report on what's not being processed by the indexer
- Task: Create an enumeration of the index formats that are supported by Mercury. This will be part of the packager functionality.
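
As a rough illustration of the two tasks above, the Python sketch below pairs an enumeration of supported formats with a lookup that logs anything the indexer would skip. Everything in it is an assumption made for illustration: the `MercuryFormat` members, the config file paths, the logger name and the `config_for` function are hypothetical, and the real list of formats supported by Mercury still has to be established.

```python
import logging
from enum import Enum
from typing import Optional

log = logging.getLogger("indexer.dispatch")  # hypothetical logger name


class MercuryFormat(Enum):
    """Hypothetical enumeration of formats that have an indexer config."""
    EML_2_0_1 = "eml://ecoinformatics.org/eml-2.0.1"
    EML_2_1_0 = "eml://ecoinformatics.org/eml-2.1.0"


# Hypothetical mapping from supported format to indexer configuration file.
INDEXER_CONFIG = {
    MercuryFormat.EML_2_0_1: "config/eml-2.0.1.xml",
    MercuryFormat.EML_2_1_0: "config/eml-2.1.0.xml",
}


def config_for(object_format: str, identifier: str) -> Optional[str]:
    """Return the indexer config for a format, or None if it is unsupported.

    Unsupported objects are reported in the log; they stay in storage
    but are not passed to the indexer.
    """
    try:
        fmt = MercuryFormat(object_format)
    except ValueError:
        log.warning("Not indexed: %s has unsupported objectFormat %r",
                    identifier, object_format)
        return None
    return INDEXER_CONFIG[fmt]
```

Returning None rather than raising keeps unsupported science metadata flowing into storage while still leaving a log trail of what was not indexed.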

- Open question: how portable are the Mercury indexer config files?
- Task: List the types of metadata that are being used in the prototype
- Task: List the metadata types that are supported by the indexer
- Task: Create configurations for indexer to parse all metadata types in the prototype
    - Jim: Working on the KNB parsing
      - About 3 versions of EML that are being used in KNB
      - Version in doctype of EML - not directly queryable but can specify the version to retrieve through metacat api
    - Robert: working with Dryad sci meta

#262 - Tools for exercising APIs (Roger and Dave)
- New task: Dave: review the integration testing implemented by Roger so far.
- New task: Implement tests that adhere specifically to the MN APIs only without regard for the type of node
- Blocking 269 - need to add mechanism for searching the CN

New story: (3) (Robert, Sprint 9) Implement CN_crud.resolve (cn_api_crud.html#CN_crud.resolve)
Query Metacat for objects of type systemmetadata where identifier = ID, parse the system metadata to extract the object locations (which MNs hold it), then construct the response object and return it to the client.
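
The parse-and-construct part of resolve can be sketched as follows in Python. The Metacat query itself is not shown. The system metadata layout used here (`identifier`, `replica/memberNode`), the node ids, the example base URLs and the shape of the response are all hypothetical placeholders; only the `object/${GUID}` REST pattern is taken from the registry outline in these notes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical, simplified system metadata; element names are assumptions.
SYSMETA_XML = """
<systemMetadata>
  <identifier>example.1</identifier>
  <replica><memberNode>mn1</memberNode></replica>
  <replica><memberNode>mn2</memberNode></replica>
</systemMetadata>
"""

# Hypothetical lookup from node id to baseURL (would come from the CN registry).
NODE_BASE_URLS = {
    "mn1": "http://mn1.example.org/mn/",
    "mn2": "http://mn2.example.org/mn/",
}


def resolve(sysmeta_xml: str, node_base_urls: dict) -> dict:
    """Build a response listing where an object can be retrieved."""
    root = ET.fromstring(sysmeta_xml.strip())
    pid = root.findtext("identifier", default="").strip()
    locations = []
    for mn in root.findall("replica/memberNode"):
        node_id = (mn.text or "").strip()
        base = node_base_urls.get(node_id)
        if base is None:
            continue  # node not present in the registry lookup, skip it
        # Follows the mn_crud.get pattern "object/${GUID}".
        locations.append({"node": node_id,
                          "url": base + "object/" + quote(pid, safe="")})
    return {"identifier": pid, "locations": locations}


if __name__ == "__main__":
    print(resolve(SYSMETA_XML, NODE_BASE_URLS))
```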

#351 - (Chad) Implement Metacat MN
#426 - (sprint 8) move forward to next sprint (likely to be closed early)

New Story: (Rob, Roger, Dave, lower priority than the log output for packager notification) Create a test harness for exercising the MN + CN APIs using the Java client

~~New story:~~ ~~#667~~ (Dave, sprint 8) Implement a VM and install the test harnesses to perform automated testing of the CN and MN services.

#633 Monitoring framework
#635 - Move to next sprint
#636 - Move to next sprint. Set up accounts and test, configure remaining hosts, etc.
#637 - Need to ensure that appropriate commands are part of the base install

#638 - Dupe of tasks under #640

#640 - Need to add same functionality to Metacat MN health API

Implement MN and CN health
- Define MN and CN health API
- Implement CN health API
- #641 Implement MN health API
Create an account for monitor ssh access to machines being monitored
- set up LDAP user for monitoring service: "dataone_monitor"
- restrict the scripts that can be executed by the account
- set up passwordless keys for access to the account from the monitor machine
Fix authentication bugs in D1Client
Remove old service type classes and switch metacat to use the new ones

#648 - static test VM, low priority (move to an enhancement / separate task). This could be part of the VM setup for #667.

#650, #651 - harvesting process. Need to complete testing

Story: (5, Sprint 8, Chad) Metacat notification of downstream services that content has changed.
- Metacat will generate log messages. task: (Chad) Create a separate log file using Log4J ('RollingFileAppender' for ResourceHandler; add log statements with the localIds of the SM and metadata doc). Create separate logs for replication and create().
- task: (Robert) implement a watcher for the log that will trigger the packager to generate packages for new content and store the state of operations against the log file (time, offset)
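
The watcher's state handling could be approached along the lines of the Python sketch below. It polls rather than using file system notifications, and stores the (time, offset) state as JSON in a side file. The function name `read_new_lines`, the state file format and the file names are hypothetical, and triggering the packager is left to the caller.

```python
import json
import os
import time


def read_new_lines(log_path: str, state_path: str) -> list:
    """Return complete log lines appended since the last call.

    The state of operations against the log (time, offset) is persisted
    in `state_path` so that a restarted watcher resumes where it left off.
    """
    offset = 0
    if os.path.exists(state_path):
        with open(state_path, "r", encoding="utf-8") as fh:
            offset = json.load(fh).get("offset", 0)

    # If the log was rotated or truncated, start again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    with open(log_path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()

    # Consume only complete lines; a partial trailing line waits for next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()

    with open(state_path, "w", encoding="utf-8") as fh:
        json.dump({"time": time.time(), "offset": offset + end}, fh)
    return lines
```

A caller would invoke `read_new_lines` on a timer and hand each returned line to the packager. Because a RollingFileAppender replaces the file on rollover, the size check above only catches the case where the new file is shorter than the stored offset; a more robust watcher would also need to track the rolled files.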