#persist Metacat upgrade strategy =================== Items needed to be upgraded ------------------------------------------ 1. Identifier upgrade 2. Generate access polidy 3. Generate replication policy 4. Generate system metadata 5. Generate ORE maps Identifier upgrade ------------------------- In general, identifier upgrades happen automatically, and are the joined docid plus rev strings for existing content. This guarantees uniqueness, and provides a direct correspondence to existing local identifiers. However, some content may also get DOIs. How and to whom do we assign DOIs? Existing content a. Local content (home server = 0) b. Replicated content b1. LTER b2. SanParks b3. PISCO b4. Brazil b5.GBIF b6. Palmyra b7. ESA c. Non-replicating, external metacat servers c1. Taiwan c2. etc. New content a. DataONE API b. Metacat API ORE Generation ------------------------ We want to generate package descriptions for EML-described data so that the existing EML-sense of a package is maintained in DataONE. The strategy for this will differ along two axes -- whether the EML is local or replicated, and whether the data is stored in metacat or just referenced via URI. We also need a strategy for how to handle new content after the upgrade occurs. See discussion here: http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5522 When to do this upgrade? 1) At upgrade to 2.0.0, or 2) When D1 MN status is turned on? -- Decision: (2) create flag in metacat config, only do ORE gen when metacat is a MN -- as a CN: never generate anything (need a flag for acting as a CN? existing node type field?) (need a CN-side check that new nodes don't register as new CNs) Data stored in Metacat (ecogrid URI) -- yes Data referenced via other URL -- Yes, download & save object if 1) resolvable, 2) matches type -- Otherwise, just include URL in ORE map Existing content a. Local content (home server = 0) -- Decision: generate ORE (assuming conditions above) b. Replicated content b1. LTER (to be MN) b2. SanParks (to be MN) b3. PISCO b4. Brazil b5. GBIF metacatdev.gbif.org/knb/servlet/replication, vs. OAIPMH b6. Palmyra b7. ESA b8. iEcolab (Spain) c. Non-replicating, external metacat servers c1. Taiwan c2. etc. New content a. DataONE API -- Decision: do nothing b. Metacat API -- Generate ORE, iff the metacat instance has D1 turned on Sync seq 1. KNB home content 2. LTER home content 3. SanParks Home content 4. KNB replicated content (any OREs that already exist don't get generated for existing MNs) (for other KNB rep nordes, generate ORE maps with KNB as authoritative) 5. LTER replicated content Converting a node from KNB to D1 -------------------------------------------------- 0. Generate ORE (for all content that doesn't already have it) and sync all KNB content 1. Turn off LTER knb rep 2. Register LTER as MN, sync is off 2a. (LTER avoid gen of ORE for any for which that graph exists) 3. On KNB, change Auth MN to LTER for all LTER replicas and ORE maps 4. Turn on LTER sync -- when CN discovers that an object is a replica, CN triggers sysmetaChanged event at LTER Generating Replication policies --------------------------------------------- Updating access policies ------------------------------------ Regarding RightsHolder and AuthoritativeMN, see: http://bugzilla.ecoinformatics.org/show_bug.cgi?id=5523 Checking if data downloaded is legit ----------------------------------------------------- switch (eml-type): case text/plain: if (isText() && !isHTML()) then archive() case text/csv: (or other delimited types) if (isText() and isValidCSV()) then archive() case text/html: if (isHTML()) then archive() case application/excel or application/msaccess: if (isBinary()) then archive() case image/*: if (isValidImageFormat()) then archive() (maybe just isBinary()?) ... case netCDF: if (isNetCDF()) then archive() case text/pdf: ... default: break; alternate if (text/html && isHTML()) then archive; if (!text/html && !isHTML() then archive;