Attendees: Rebecca, Bruce, Bob, Dave, Mike,John Cobb, Amber, Suzie, Bertram,Deborah, Bill


Regrets: Viv (maybe)



DataONE LT Call:  9am AK/10am PT/11am MT/noon CT/1pm ET

1.  Please join my meeting, Sep 13, 2013 at 11:00 AM MDT.
https://www1.gotomeeting.com/join/895113193

2.  Use your microphone and speakers (VoIP) - a headset is recommended. Or, call in using your telephone.

Dial  1 (213) 493-0605
Access Code: 895-113-193
Audio PIN: Shown after joining the meeting

Meeting ID: 895-113-193

GoToMeeting®
Online Meetings Made Easy®

Not at your computer? Click the link to join this meeting from your iPhone®, iPad® or Android® device via the GoToMeeting app.

We will also use the epad: http://epad.dataone.org/2013Sep13-LT-VTC 


September 13: CI WG plans for years 6-10 (Dave)
September 20: Sustainability & Governance WG in years 6-10 (Trisha, Bill)


 
If you have items to add, let me know.

Agenda for 2013-09-13

1.  CI Report (Dave)

- Preparations for CCIT meeting. Many topics to cover, a few key items:
  - Work through process of setting up a MN or two, following available docs
  - Resolve a few issues related to content mutability and identity management
  - Planning for changes to system metadata management (MN authority)
  - Indexing, auditing, dashboard
  - Review, adjust plans for remainder of y5

- Good discussion about MN documentation and how it can flow between the various locations. One sticking point is the artificial barrier between doc writers and the public web site.

- Generally on track with priorities identified early in the year

- Lengthy discussion about overall development process and how to further streamline activity
  - Sticking with two week sprints grouped into approximately two month blocks
  - More emphasis on ensuring goals for sprint and block are met before taking on additional work

- Quick look at dashboard prototype
  - http://mule1.dataone.org/prototype/dashboard
  - Using numbers from stage environment, not production
              past 100 years? why such a short window? : v }: this may be the coverage rnge of data and not data or inclusion in the DataONE meta-collective
              but seriosuly, is the download column live? No this is a view to staging environment. We hope to produce ths data for Prod. nodes in the future, soon.
              if a node is temporarily down, how do I see that on the d-board? Do the colors change form blue to something else?  Perhaps system "button" will show this.

            on map for some reason, Replication MN's co-locating with CN's are not listed.

2. a. CI WG plans for years 6-10 (Dave)

Primarily from the second planning meeting held in Knoxville, July 2013, where plans developed during the May meeting in Albuquerque were reviewed and key aspects selected with consideration of the anticipated resource constraints.

Ongoing CI activity for DatONE falls into three broad categories:

1. Maintenance of the status quo
2. Addition of new features necessary to maintain relevance of DataONE to target audiences
3. Research and development activities that are high risk, but high potential value

The first category of activities is necessary for the core of DataONE to remain operational into the future. The second category represents a progressive update and enhancement of the infrastructure to ensure that the infrastructure provided by DataONE remains relevant as new expectations emerge, perhaps driven by other systems or infrastructure that overlap or align with the goals of DataONE. In general, the R&D activities (the third category) are predominantly excluded from the renewal proposal and instead will be undertaken through additional research proposals that leverage the DataONE infrastructure.

Several topics were identified as within scope and achievable during a five year period and with the anticipated budget. These include (in order of priority / importance):

More detail on each topic follows.

Maintain the Core Infrastructure

Maintaining the current software implementations with no or minimal feature updates. Includes bug fixes, updates necessary to track changes to third party dependencies, security updates, hardware replacement / maintenance, system administration, documentation.

Note that the cost of maintenance will generally increase as more products and features are added to the DataONE repertoire.

Duration: Y6 - Y10

Anticipated resource requirements include:

Add Provenance Support to DataONE

The goal is to extend the current DataONE metadata capabilities to: 

(1) Incorporate additional provenance metadata, in the form of 
    (a) W3C-PROV and 
    (b) DataONE-PROV extensions to W3C-Prov (= D-PROV)
and allow it to be ingested & stored as system- or science-metadata at (some? all?) MNs.

(2) Index that metadata at CNs, in order to … 

(3) Implement “smart” (provenance-enabled) searches and queries.

From a users POV, (3) will provide the main added value: 
Enhanced data discovery:
find data, based on how it was created, using what tools, using what input datasets or types, by whom, etc.

Showing linked data:
the “social life” of data, i.e., data in context

Supporting provenance can be generalized to "supporting somewhat arbitrary relationships between uniquely identified content". Adding provenance support will also enable support for linking publications, data, and workflows.

Duration: Y6 - Y10

Anticipated resource requirements include:

Enhanced Reporting Capabilities

The initial implementation of this activity will be completed before the end of Y5. Additional updates and enhancements are expected to be made over Y6 - Y10. Hence it is expected that no additional resources need be allocated for this work during Y6 - Y10 since update and maintenance activity will be included under general Infrastructure Maintenance.


Enabling Measurement Driven Discovery through Scalable Semantic Annotation
(Measurement Search)

This activity will produce a semantically-driven search and discovery system that extends the current DataONE query subsystems to allow researchers to search for data sets that contain specific measurement types.  Current searching in DataONE focuses on fielded and full-text metadata search but that does not specifically enable precise queries of measurements.  This is mostly because the metadata corpus contains uncontrolled descriptions of measurements (e.g., variable names, descriptions, units, etc.), which makes it impossible to find all of the data sets containing a particular measurement type.

This activity relies on some of the capabilities required for supporting provenance tracking in DataONE, most notably, some mechanism for recording statements associated with PIDs.

Duration: Y6 - Y8

Resources:

Supporting Data Subsetting and Extraction Services
(Registration of data subset/extraction services)

Provide metadata support for registration of data extraction and subset services provided by member nodes and the linkages to see what services can be applied to data sets resulting from search operations.  

Deliverable: Users are able to use the DataONE GUI and API search interfaces to discover data, find out what subsetting/extraction services are available for those data sets, invoke a subsetting operation on discovered data, and deliver that data to analysis, manipulation, and visualization tools that understand the relevant data format.  

Duration:  1 year (likely Y6 - Y7, revisit Y9)

Resources:

Slender Node

A Slender Node (SN) is a mechanism that enables the equivalent of Tier 1 Member Node services (publicly readable content) to be achieved with data repositories that do not expose the DataONE MN service APIs, but do currently support another widely adopted data access service protocols such as OAI-PMH or Catalog Services for the Web (CSW). There are at least two ways to achieve this:

Implement a gateway service that is able to operate as a facade to the OAI-PMH or CSW service, providing a view that is compliant with DataONE Tier 1 MN services. The gateway may be co-located with the SN or may be operated at a separate location. 

Implement SN service specific functionality in the Coordinating Node software, enabling the necessary Tier 1 operations (e.g. synchronization) to be performed by the CN. Note that this is conceptually equivalent to locating the gateway described in #1 within the CN software stack.

Duration: 1 Y (early <= Y6 - Y8)

Resources:

New MN Software Stacks

Implementation of new MN software stacks is an expensive process that requires significant testing and iteration, especially where the conceptual mode of operation of the target repository stack is not well aligned with the functional requirements of DataONE.

Where possible, this activity should be passed on to others responsible for the particular software stack, and DataONE effort directed at providing technical guidance and supporting testing and deployment of the new stack.
Q: How will DataONE incorporate exernally developed and contributed SW stacks? 
    Move into DataONe Repo or external?
    How to integrate core DataONE-II and external contributory efforts
    Need to chart an plan/strategy for SW maintenance for contributed SW - try to avoid taking on a mortgage for contirbuted SW.

It is reasonable to expect that 1 FTE will be required to shepard and otherwise guide and manage new implementations of MN software being developed by third parties.

This activity is suitable to supplemental and incremental add ons to DataONE. It consists of augmenting the currently supported set of MN SW stacks with additional stacks that can enable (or ease) the recruitment of additional MNs using those stacks. This would augment the currently support Metacat, GMN(Python), and Mercury implementations of the DataONE API. 

The primary effort will be in the actual implementation of the stack. The implementation will include the actual augmentation of the SW stack and may include pilot MN implementations to test and validate the stack. Each stack could be implemented pretty much independently, in parallel or sequentially, according to the availability of resources (financial and effort) and  priority. In addition there are three “umbrella” activities that would have be coordinated with DataONE “glue the efforts together: overall management of the stack implementations, managing integration with DataONE core functions, including scheduling tasks to available DataONE Core-CI available effort hours, and coordination of long-term maintenance and upgrades to deployed stacks. 

Duration: Y6 - Y10

Resources:

How CI WGs would function - check the threads above (maintenance, provenance, semantics)
Perhaps team leaders for driving the projects
Working Group members: half a dozen for each topic - but budget allows for 10 total (8 WG members plus 2 co-leads) 

Developer team will be very critical - need to be very specific about any rehires for the follow-on years
Deborah: Ben is a good example of this


2.b. Budget (Rebecca)
$10 M Budget:
-       Amber Budden (1 FTE)
-       Rebecca Koskela (1 FTE)
-       Dave Vieglais (0.75 FTE)
-       Developers (3 FTE)
-       Acct (1 FTE)
-       Web designer (0.5 FTE)
-       Equipment ($200 K per CN)
-       Supplies
-       UCSB (Matt 2 months, sys admin)
-       UTK (Susie 2 months, Bruce 1 month, sys admin)Moyers? included or not?
-       UCOP (Trisha 1 month) 
-       PSC
-       Travel
-       F&A
 
$13.3 M Budget:
-       Add developers (3 FTE)
 
Not budgeted:
-       1.15 FTE UI
-       1 FTE Domain expert = 1 ontology expert

Trisha: Suggestion of switching web designer for UI person (but does the title 'designer' include web maintenance?yes  possibly more broad?) If you want a web production person that is different that design. Agreed - think that was what was intended (with some design skill thrown in). I think those are different enough to approach separately.


3.  White paper outline and writing assignments (Bill)
White paper outline: http://bit.ly/17ZoICP
25-page Project Description from last proposal 
(file is called _Pages from DataNetONE_fullproprosal.pdf): http://bit.ly/1aHBqMu

Hard deadline for white paper: October 21
Can review at LT face-to-face meeting - Bill at NSF on the 24th of October so NSF needs document by October 22

Next phone call important for establishing who does what with the white paper

4.  Around the room (all)


John Cobb: Nothing to add


Bob:  Two Best Data Management Practices Webinars were held this week, in conjunction with NASA.  One on Tabular data (126 participants) and one on Geospatial Data (116 participants).

Suzie: Robert Olendorf will be continuing in the SCWG as the replacement for Miriam Blake which maintains the LANL presence. 
The UA/SC Joint Working Group meeting has been scheduled for May 13-15 (with travel days of May 12 & 16) at Knoxville, TN. 

Mike: Nothing to report.

Trisha: nothig to report

Deborah:  Semantics working group regular meeting call right after this call.  ISWC presentations videos, posters, etc in process.   Announcement of Mark Schildhauer taking over as co-lead with Deborah and replacing Jeff H as co-leader of working group.