25 January 2011
Attendees: Rebecca, Amber, Bill, Viv, Bob, Carol, John Kunze, Trisha, Bruce, Matt, Steph,
Todd, Steve, John Cobb
Feedback from EAB
1. What is DataONE?
2. Presentations need to be DYNAMIC
3. Make bird migration presentation relevant to DataONE
4. Persona story upfront
5. Get into information about the organization a lot later
6. Start with WHY, less about what, how, when
7. Think of a newspaper article, start with the story then get into the details
What is needed from this meeting:
1. Clear idea of what the presentations will include, including the common scenarios and the
data life cycle thread that will run throughout the presentations
2. Who is responsible for what and when it needs to be done
Draft of the Overview presentation is available in the Presentation folder
https://docs.dataone.org/member-area/documents/management/nsf-reviews/nsf-review-february-2011/presentation-materials-for-nsf-review/current-working-copies-of-nsf-presentations-post-eab-review/
Follows the basics of:
- Why
- What
- How
- Progress
- Process
CE Plans:
How DataONE is getting information from the community
- Assessment Work
- Integrated work across working groups (personas/scenarios)
How they are informing the community
How they are passing the information on to CI to inform the design of DataONE (no specifics)
* Design is influenced by CE issues. For example, the needs for MN to control inputs and replication affects design. The whole authentication infrastructure is driven in part by a need to engage the broader community. Minor thus far, but usability group has done some reviews of the pilot catalog.
Data management planning, including Best Practices
Need a graphic for the data life cycle - also agree on the terminology
- Data creation/collection
- Data deposition/acquisition/ingest
- Data curation and metadata management (primary as well as composite/second generation datasets)
- Data protection
- Data discovery, access, use, and dissemination
- Data interoperability, standards, and integration
- Data curation of composite/second generation datasets (combine with 3rd bullet?)
- Data exploration, visualization, and analysis
Use active words in diagram to shorten:
Creating - Create
Depositing - Deposit
Describing - Describe
Protecting - Preserve
Discovering - Discover
Integrating - Integrate
QA/QC - Assure
Analyzing & visualizing - Analyze
Visualize
Cite
Using (can be visualizing, analyzing, etc)
FWIW: DCC (http://www.dcc.ac.uk/resources/external/data-life-cycle and http://www.dcc.ac.uk/resources/curation-lifecycle-model)
* Storage
* Use
* Authenticity and integrity
* Creation
* Disposal
* Retention
* Review
* Reuse
Lifecycle stages from the CCIT:
-- see notes for tools at each stage at: https://repository.dataone.org/documents/Meetings/20101102-ABQ-AHM/CCIT/20101103_ITK_discussion.txt
1. Acquisition - Create
2. Documentation - Document
3. Deposition - Deposit
4. QA - QA
5. Analysis - Analyze
6. Visualization
7. Preservation
8. Citation
9. Disposition, decommission
Also see diagram at: http://libraries.mit.edu/guides/subjects/data-management/lifecycle.jpg
Scenarios: https://docs.dataone.org/member-area/documents/management/nsf-reviews/nsf-review-february-2011/presentation-materials-for-nsf-review/current-working-copies-of-nsf-presentations-post-eab-review/DataONEScenarios4.doc/view
A full persona (scenarios with background information on the user): https://docs.dataone.org/member-area/documents/management/nsf-reviews/nsf-review-february-2011/presentation-materials-for-nsf-review/current-working-copies-of-nsf-presentations-post-eab-review/SunPersona.docx/view
What is needed from CI:
Present CI from the viewpoint of the user
What is the Investigator Toolkit? (need some mockups of what it will look like)
- From the user perspective
- What tools and when
Member Nodes
- Who (including those over the next couple of years)
- Why
- When
Initially chose MNs with tools such as Mercury and Morpho as places to start
Pilot MN effort to incorporate options for data co-location with HPC facilities, e.g. the TeraGrid pilot. Status: positive discussions at TG Forum (project governing body); proceeding with pilot effort; TG allocation request submitted on Jan 15, evaluation in March, and planned implementation start on 4/1.
Replication Nodes
Process of buying 600-800 TB of storage (more likely 1.2 PB total -- roughly 400 TB per CN)
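A quick sanity check of the storage figures above: 1.2 PB total at roughly 400 TB per CN implies three coordinating nodes (this CN count is inferred from the arithmetic, not stated explicitly in the notes).

```python
# Back-of-the-envelope check of the replication storage figures.
total_tb = 1200          # 1.2 PB total, expressed in TB
tb_per_cn = 400          # roughly 400 TB per coordinating node
num_cns = total_tb // tb_per_cn
print(num_cns)           # -> 3 coordinating nodes implied
```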
Compute Nodes
Simple, high-level diagram of the CI
Suggestion from Matt: highlight strengths of existing MNs (wrt scenario development)
Is the data portal part of the ITK or is it something else?
Three components:
CN
MN
ITK
But then talk about compute nodes and replication nodes -
Replication nodes are member nodes
Compute nodes are something different, so not member nodes, but CCIT hasn't discussed
this yet; purview of TeraGrid (John's note)
{ John Cobb: Also, to (partially) answer Bob's question, I think one outcome of the EVA experience was that it showed the value of linking computing-centric CI (like TeraGrid) with datanets like DataONE, and highlighted how short-term storage services such as TeraGrid's albedo and wide-area file systems do not meet the same needs as a datanet's long-term, well-curated archives, but that there is the possibility of interoperable CI.
And that DataONE has taken (is taking) action on those opportunities.}