25 January 2011
Attendees: Rebecca, Amber, Bill, Viv, Bob, Carol, John Kunze, Trisha, Bruce, Matt, Steph,
Todd, Steve, John Cobb
Feedback from EAB
1. What is DataONE?
2. Presentations need to be DYNAMIC
3. Make bird migration presentation relevant to DataONE
4. Persona story upfront
5. Get into information about the organization a lot later
6. Start with WHY, less about what, how, when
7. Think of a newspaper article, start with the story then get into the details
What is needed from this meeting:
1. Clear idea of what the presentations will include, including the common scenarios and the
data life cycle thread that will run throughout the presentations
2. Who is responsible for what and when it needs to be done
Draft of the Overview presentation is available in the Presentation folder
https://docs.dataone.org/member-area/documents/management/nsf-reviews/nsf-review-february-2011/presentation-materials-for-nsf-review/current-working-copies-of-nsf-presentations-post-eab-review/
Follows the basics of:
- Why
- What
- How
- Progress
- Process
CE Plans:
How DataONE is getting information from the community
- Assessment Work
- Integrated work across working groups (personas/scenarios)
How they are informing the community
How they are passing the information on to CI to inform the design of DataONE (no specifics)
* Design is influenced by CE issues. For example, the needs for MN to control inputs and replication affects design. The whole authentication infrastructure is driven in part by a need to engage the broader community. Minor thus far, but usability group has done some reviews of the pilot catalog.
Data management planning, including Best Practices
Need a graphic for the data life cycle - also agree on the terminology
- Data creation/collection
- Data deposition/acquisition/ingest
- Data curation and metadata management (primary as well as composite/second generation datasets)
- Data protection
- Data discovery, access, use, and dissemination
- Data interoperability, standards, and integration
- Data curation of composite/second generation datasets (combine with 3rd bullet?)
- Data exploration, visualization, and analysis
Use active words in diagram to shorten:
Creating - Create
Depositing - Deposit
Describing - Describe
Protecting - Preserve
Discovering - Discover
Integrating - Integrate
QA/QC - Assure
Analyzing & visualizing - Analyze
Visualize
Cite
Using (can be visualizing, analyzing, etc)
FWIW: DCC (http://www.dcc.ac.uk/resources/external/data-life-cycle and http://www.dcc.ac.uk/resources/curation-lifecycle-model)
* Storage
* Use
* Authenticity and integrity
* Creation
* Disposal
* Retention
* Review
* Reuse
Lifecycle stages from the CCIT:
-- see notes for tools at each stage at: https://repository.dataone.org/documents/Meetings/20101102-ABQ-AHM/CCIT/20101103_ITK_discussion.txt
1. Acquisition - Create
2. Documentation - Document
3. Deposition - Deposit
4. QA - QA
5. Analysis - Analyze
6. Visualization
7. Preservation
8. Citation
9. Disposition, decommission
Also see diagram at: http://libraries.mit.edu/guides/subjects/data-management/lifecycle.jpg
Scenarios: https://docs.dataone.org/member-area/documents/management/nsf-reviews/nsf-review-february-2011/presentation-materials-for-nsf-review/current-working-copies-of-nsf-presentations-post-eab-review/DataONEScenarios4.doc/view
A full persona (scenarios with background information on the user): https://docs.dataone.org/member-area/documents/management/nsf-reviews/nsf-review-february-2011/presentation-materials-for-nsf-review/current-working-copies-of-nsf-presentations-post-eab-review/SunPersona.docx/view
What is needed from CI:
Present CI from the viewpoint of the user
What is the Investigator Toolkit? (need some mockups of what it will look like)
- From the user perspective
- What tools and when
Member Nodes
- Who (including those over the next couple of years)
- Why
- When
Initially chose MNs with tools such as Mercury and Morpho as places to start
Pilot MN effort to incorporate options for data co-location with HPC facilities, e.g. the TeraGrid pilot. Status: positive discussions at TG Forum (project governing body); proceeding with pilot effort; TG allocation request submitted on Jan 15, evaluation in March, and planned implementation start on 4/1.
Replication Nodes
Process of buying 600-800 TB of storage (more likely 1.2 PB total -- roughly 400 TB per CN)
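A quick sanity check of the storage figures above: 1.2 PB total at roughly 400 TB per CN implies three coordinating nodes (this CN count is inferred from the arithmetic, not stated explicitly in the notes).

```python
# Back-of-the-envelope check of the replication storage figures.
total_tb = 1200          # 1.2 PB total, expressed in TB
tb_per_cn = 400          # roughly 400 TB per coordinating node
num_cns = total_tb // tb_per_cn
print(num_cns)           # -> 3 coordinating nodes implied
```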
Compute Nodes
Simple, high-level diagram of the CI
Suggestion from Matt: highlight strengths of existing MNs (wrt scenario development)
Is the data portal part of the ITK or is it something else?
Three components:
CN
MN
ITK
But then talk about compute nodes and replication nodes -
Replication nodes are member nodes
Compute nodes are something different, so not member nodes, but CCIT hasn't discussed
this yet; purview of TeraGrid (John's note)
{ John Cobb: Also, to (partially) answer Bob's question, I think one outcome of the EVA experience was that it showed the value of linking computing-centric CI (like TeraGrid) with datanets like DataONE, and highlighted how short-term storage services such as TeraGrid's albedo and wide-area file systems do not meet the same needs as a datanet's long-term, well-curated archives, but that there is the possibility of interoperable CI.
And that DataONE has taken (is taking) action on those opportunities.}