25 January 2011

Attendees: Rebecca, Amber, Bill, Viv, Bob, Carol, John Kunze, Trisha, Bruce, Matt, Steph, Todd, Steve, John Cobb

Feedback from EAB:
1. What is DataONE?
2. Presentations need to be DYNAMIC
3. Make the bird migration presentation relevant to DataONE
4. Persona story upfront
5. Get into information about the organization much later
6. Start with WHY; less about what, how, when
7. Think of a newspaper article: start with the story, then get into the details

What is needed from this meeting:
1. A clear idea of what the presentations will include, including the common scenarios and the data life cycle thread that will run throughout the presentations
2. Who is responsible for what, and when it needs to be done

Draft of the Overview presentation is available in the Presentation folder:
https://docs.dataone.org/member-area/documents/management/nsf-reviews/nsf-review-february-2011/presentation-materials-for-nsf-review/current-working-copies-of-nsf-presentations-post-eab-review/

Follows the basics of:
* Why
* What
* How
* Progress
* Process

CE Plans:
How DataONE is getting information from the community:
* Assessment work
* Integrated work across working groups (personas/scenarios)
How they are informing the community
How they are passing the information on to CI to inform the design of DataONE (no specifics)
* Design is influenced by CE issues. For example, the need for MNs to control inputs and replication affects design. The whole authentication infrastructure is driven in part by a need to engage the broader community. Minor thus far, but the usability group has done some reviews of the pilot catalog.
Data management planning, including Best Practices

Need a graphic for the data life cycle - also agree on the terminology:
* Data creation/collection
* Data deposition/acquisition/ingest
* Data curation and metadata management (primary as well as composite/second-generation datasets)
* Data protection
* Data discovery, access, use, and dissemination
* Data interoperability, standards, and integration
* Data curation of composite/second-generation datasets (combine with 3rd bullet?)
* Data exploration, visualization, and analysis

Use active words in the diagram to shorten:
* Creating - Create
* Depositing - Deposit
* Describing - Describe
* Protecting - Preserve
* Discovering - Discover
* Integrating - Integrate
* QA/QC - Assure
* Analyzing & visualizing - Analyze, Visualize
* Cite
* Using (can be visualizing, analyzing, etc.)

FWIW: DCC (http://www.dcc.ac.uk/resources/external/data-life-cycle and http://www.dcc.ac.uk/resources/curation-lifecycle-model)
* Storage
* Use
* Authenticity and integrity
* Creation
* Disposal
* Retention
* Review
* Reuse

Lifecycle stages from the CCIT -- see notes for tools at each stage at:
https://repository.dataone.org/documents/Meetings/20101102-ABQ-AHM/CCIT/20101103_ITK_discussion.txt
1. Acquisition - Create
2. Documentation - Document
3. Deposition - Deposit
4. QA - QA
5. Analysis - Analyze
6. Visualization
7. Preservation
8. Citation
9. Disposition, decommission

Also see the diagram at: http://libraries.mit.edu/guides/subjects/data-management/lifecycle.jpg

Scenarios:
https://docs.dataone.org/member-area/documents/management/nsf-reviews/nsf-review-february-2011/presentation-materials-for-nsf-review/current-working-copies-of-nsf-presentations-post-eab-review/DataONEScenarios4.doc/view

A full persona (scenarios with background information on the user):
https://docs.dataone.org/member-area/documents/management/nsf-reviews/nsf-review-february-2011/presentation-materials-for-nsf-review/current-working-copies-of-nsf-presentations-post-eab-review/SunPersona.docx/view

What is needed from CI:
Present CI from the viewpoint of the user

What is the Investigator Toolkit? (need some mockups of what it will look like)
* From the user perspective
* What tools and when

Member Nodes
* Who (including those joining over the next couple of years)
* Why
* When
Initially chose MNs with tools such as Mercury and Morpho as places to start. Pilot MN effort to incorporate options for data co-location with HPC facilities, e.g. the TeraGrid pilot. Status: positive discussions at the TG Forum (project governing body); proceeding with the pilot effort; TG allocation request submitted on Jan 15, evaluation in March, and planned implementation start on 4/1.

Replication Nodes
In the process of buying 600-800 TB of storage (more likely 1.2 PB total -- roughly 400 TB per CN)

Compute Nodes

Simple, high-level diagram of the CI

Suggestion from Matt: highlight strengths of existing MNs (wrt scenario development)

Is the data portal part of the ITK or is it something else?
Three components: CN, MN, ITK
But then talk about compute nodes and replication nodes:
- Replication nodes are member nodes
- Compute nodes are something different, so not member nodes; but CCIT hasn't talked about this yet -- purview of TeraGrid (John's note)

{John Cobb: Also, to (partially) answer Bob's question, I think that one outcome of the EVA experience was that it showed the value in linking computing-centric CI (like TeraGrid) with datanets like DataONE, and highlighted how short-term storage services such as TeraGrid's albedo and wide-area file systems do not meet the same needs as a datanet's long-term, well-curated archives, but that there is the possibility of interoperable CI. And DataONE has taken (is taking) action on those opportunities.}