Discussion of new ORNL MN - DataONE Collaboration 

Participants:  Dave Vieglais,  Giri Palanisamy, Ranjeet Devarakonda, Jim Green, Rebecca Koskela, and Laura Moyers, Matt Jones

Not able to Participate:  Bruce Wilson, Chris Jones

Teleconference numbers:
Toll Free Number:  866-703-9626
Pass Code:  767574


Goal:  Build on existing services and practices of ORNL Data Centers to enable more efficient and effective collaboration with DataONE
CCIT Discussions in Santa Barbara (September 2013):
Discussion of plans to support two significant changes intended to address concerns raised by potential member nodes:
1.  Transfer of system metadata control to MNs
2.  Supporting mutable content
- Possible technical architecture and list of changes that will be needed in CN and MN

System metadata - currently the CN holds the authoritative copy of system metadata, which is inconvenient for MNs who need to change their system metadata as they have to contact a CN, request a change, etc.  The desired change is that the MN be responsible for system metadata so that changes may be made locally and the CNs can pick up changes on synchronization and propagate changes out from there.  The MN will have the authoritative copy of the system metadata.  A big concern was that access control rules had to be coordinated/requested through CNs.

Mutability - currently, every object in DataONE is considered to be immutable.  If a minor change is needed (spelling, etc.), the current design requires that the "old" object be "deleted" and the "new" object be "added".  The "old" object shouldn't really be deleted, for reproducibility in particular.  After examining several options, a solution has been proposed that MNs may change content provided they are scientifically-insignificant.  "Scientifically-significant" becomes a policy decision rather than a technical decision.  We will be adding a mechanism to the CNs that will track changes over time to allow a user to know if they have most current version, or if there have been changes to a previously accessed data object since that earlier access.  

If there is a change in system metadata, say, the MN will make desired changes, and when synchronization occurs with CN, the CN will accept the "new" system metadata and keep a copy of the previous version of the system metadata.

Every object requires a checksum (system metadata, science metadata, data object).

What about situations where we just have system/science metadata (another entity retains the data object)?  KNB and others have this situation; DataONE is ok with this.

Discussion on links to a data object not retained by DataONE: not necessarily a good idea to include the url for the data object in the metadata, as often links break.  Idea: maybe we could show the link even if the link is not resolvable.  Not exactly within scope; while a nice idea, it could perhaps be better resolved another way (resource maps, MN ensure that their links are valid, etc.)

"Only results with data" checkbox on search 

What can we do to add more MNs?
We can reduce barriers to make becoming a MN easier (perhaps less stringent requirements?).  Caution: we need to be careful to not eliminate essential characteristics (reliable access, preservation, etc.).  What about scaling of interoperability?  

DataONE will pick up on changes made to a MN's data/metadata now (via checksums) where previously the did-it-change focus was on the DOI.  

Mutability issues with USGS Topo Maps, for example - there are ~300K metadata records, each with a link to the data object, which will be updated often.  It would be very difficult to have a distinct metadata record for each iteration of the data (high volume).
Options to explore: 
    CNs: index the data object's URL
    proxy object, may be part of resource map or another object
    
    when harvesting science metadata documents from sites, some can be dropped which have already been registered with the CN (i.e. a deleted object); Question: if you have 100 objects one day and 90 the next, can the missing 10 be obsoleted (at the CN)?  No, if the missing 10 are really deleted, the MN should obsolete those objects.  Their DOIs will be archived.  
    
AHM coming up soon, probably the last opportunity to propose changes (for the time being).  A tentative date for changes (such as those discussed here) would be late-winter/early-spring.

ORNL can offer up some of the new potential MNs as "guinea pigs" to test changes to system.

Perhaps talk again after the AHM.