Attendees: Rebecca, Dave, John Kunze, Viv, Amber, Steph, Bruce, Todd, John Cobb, Bob Cook


Regrets: Carol Tenopir, Mike Frame, Steve Kelling, Bertram Ludäscher, Trisha Cruse

http://epad.dataone.org/20110311-LT-VTC
 

  Agenda for 2011-03-11


1.  Review response to the NSF Panel's recommendations (SEE BELOW ON THIS EPAD)
Recomendations can be found here:

 https://docs.dataone.org/member-area/documents/management/nsf-reviews/nsf-review-february-2011/D-ONE-report%20-2.pdf)

Responsibilities for specific recommendations are included in last week's
minutes

https://docs.dataone.org/member-area/committees/management-team/meetings/2011/2011Mar04_LT_VTC_Minutes.docx

2. Planning for July DUG (Budden)
Questions for leadership team:
Primary stakeholder groups to include in DUG meeting (ESIP co-location constraint of 25 rooms)
Mechanisms of solicitation for additional DUG members
Funding for travel to the DUG

3. Check mailing list content:
https://docs.dataone.org/member-area/project-information/d1_users.xls

4. Around the Room 
John:  TeraGrid allocation review panel met this week. We expect to hear a  formal response to the proposal soon with the allocation (if any was  awarded) would start April 1.

Viv:  Attended the NSF GeoData2011 Meeting last week. A report will be issued  in the next 6 weeks. DataONE was mentioned and referred to frequently.  Meeting focused on data lifecycle, data citation, and communication  strategies in geoinformatics. Representatives from NSF, USGS, NOAA,  NASA, universities, state agencies. 



RESPONSE TO PANEL RECOMMENDATIONS
CyberInfrastructure - Panel Recommendations:
We agree and are currently evaluating technical options for managing the synonyms, hyponyms, and polynyms extracted from the multiple types of science metadata retrieved from the Member Nodes as part of the process for generating the search indexes. This topic is an active area of effort for the Semantics Working Group which is engaged in identifying the best practices and appropriate technologies for inclusion in DataONE's infrastructure.

Initial evaluations will utilize the Mercury infrastructure for search which already provides support for inclusion of multiple thesauri, providing a basis for end-user and programmatic search through the OpenSearch API interface of the SOLR and Lucene engines upon which Mercury is based.  DataONE has also been working with BioPortal and has a test implementation of their ontology repository.  We are working on using the repository itself as a useful tool for the DataONE community, as an specialized type of information repository about relevant ontologies and thesauri.  We will use some of these semantic information sources moving forward to enable semantic search, both for synonyms, broader topics, and narrower topics.  The semantic search can also be used to expand the existing "More like this" functionality that exists in DataONE. As with other aspects of this project, DataONE cannot possibly  generate all of the potential ontologies that are needed, but DataONE can work with the communities that are generating them, use the ontology  repository to highlight the areas which need coverage and  where  ontologies are in conflict for the definition and organization  of  concepts.

Progressive improvement in support for semantic search and semantic integration are important targets for the ongoing development of DataONE infrastructure and are targeted for implementation following roll out of the initial core infrastructure towards the end of 2011.

Discovery of available facets and the range of values represented by those fields is valuable to  clients as it provides circumscription  information for the set of records being examined. The Mercury web interface and other components  such as the FUSE file system driver rely  heavily on the faceting capabilities of the SOLR and associated Lucene  indexes that are  populated using system and science metadata objects.  Introspection services for discovery of facets and the range of values  they represent  are available through the SOLR interface which in turn  are to be exposed  as part of the DataONE APIs. Ease of access to  enumerated values from  indexed fields is important for basic quality  control operations such as building and verifying controlled  vocabularies which can in turn  provide more reliable association with terms defined in domain  ontologies.

We recognize the importance of graphical interfaces for usability and they will be part of the public deployment where appropriate.  The development process requires that the underlying functions first be built and tested. Command line interfaces are by far the most efficient means for developers to work in a test-driven development best practice.  Those command line interfaces will also be important to a significant user segment focussed on automation of data access and manipulation.  

As the core services stabilize and resources become available, additional attention is to be directed towards the design and building out of graphical user interfaces for various adminsitrative and user  tools with  the general goal of  increasing the audience to which such tools will be useful.  Much of that work remains to be done, but is  made much simpler by having a robust and well-tested set of API's. Usability assessment will be an important part of all user interface design work, especially for the more widely disseminated investigator toolkit components. Much of the interaction with DataONE infrastructure will be performed through components of the investigator toolkit that have been extended to support the DataONE service interfaces. As this adaptation typically occurs at a low level in the programatic sense, the end user will typically continue to use the tools in essentailly the same manner as previously, though will be able to talke advanatage of the increased service capabilities and breadth of data access offered by the DataONE infrastructure. We anticipate that workflow tools environments such as VisTrails and Kepler will offer some of the richest user interactions with the infrastructure and these tools also offer significant benefits back to other usesr of DataONE infrastructure through the capture of provenance information assoicated with workflows and their analytical outputs.


Staffing and Project Management - Panel Recommendations:
DataONE is initiating joint-development with other projects as appropriate. For example,
a meeting has been scheduled for USGS developers and the DataONE developers to
discuss  a joint development approach related to the USGS CDI Data Uploader tool  and the DataONE Investigator Toolkit. The anticipated outcomes are: 
1.) Identify components needed for each applications 
2.) Identify resources required 
3.) Identify resources available for required components 
4.) Develop a joint development time line 
5.) Establish project working relationships

Another  strategy for dealing with this issue is to make sure that documentation  is kept current so there will be minimal effort to add new developers  to the project.

This  is a risk that DataONE has already identified.

Risk #105: Member time constraints -- "stretched too thin"
Scheduling  of tasks needs to consider the amount of time that can be contributed  by various project participants, and care should be taken to avoid the  mistake of equating enthusiasm with actual availability.
Mitigations:
In  addition to these mitigation strategies, we also ensure information  generated by volunteers is captured so the knowledge is not lost if lose  the volunteer. This also serves to give recognition to the volunteers.

Volunteers  are not necessarily unreliable.  Some of the working groups have chosen  to have members for one or two meetings rather than the life of the  working group. These members are chosen for their expertise to address  specific topics.  The goals of working groups may change with time.  It would, therefore, be appropriate to transition in or out individual members of working groups.

We  agree that project management consists of attempting to achieve the  goals of the project while managing the classic triple constraints of  cost, time and scope. We have four risks on our Risk Register related to  this:
{Add mitigation strategies} plus slide from review 
The existing mitigation strategies for these include reducing the scope or increasing the time to get the work done.

Network Coordination - Panel Recommendations:
See response below under DataNET Federation

NEED A RESPONSE FOR THIS ONE _ HOW TO GET A MATRIX IN LESS THAN ONE DAY
See Dave's response at bottom

Strategic and Longer Term Planning - Panel Recommendations:
We agree. Logging and recovery are a core part of the DataONE design and API. A part of our testing plan for Member Nodes and Coordinating  Nodes includes handling network segmentation, abrupt loss of a node,  and disk failures.  

Extensive operation logging and reporting capabilities are part of the DataONE service interface design and are intended to support core service monitoring and overal infrastructure state of health reporting as well as reporting on content access for individual users, by Member Node, and through other facets.

Support for transactions in the federated DataONE infrastructure varies,  though  in general operations are designed to be sufficiently granular  so that  they fully complete or fail, so that objects or operations are always at a defined state. Sequential operations such as the  synchronization of a Member Node with a Coordinating Node keep track of state at a level  suffciently granular that should the overal process fail due to network outage for example, the overall operation will continue from the last valid step in the process. 

See response below under DataNET Federation

Learning objectives have been defined for the majority of current DataONE education activites, including The ScienceLinks PhD program; Information Sciences courses at the University of Tennessee  and the Summer Environmental Information Management Institute run at  the University of New Mexico.  The Data Curation Education in Research Centers (DCERC) awarded in collaboration with the Data Conservancy, has both research areas and course selections identified (for PhD students at the University of Illinois and Masters students at the University of Tennesse).  Learning objectives are being developed.

For example, the ScienceLinks PhD program the outcomes are defined as:
Six doctoral graduates who will 
Development of science data and  information curriculum
Development  of a model of active science and data librarianship that combines the  best of research, teaching, and professional practice and that  incorporates from an early stage large national science data initiatives  and science organizations.

Goals / Objectives for the Information Sciences courses include:
Outcomes from one session of the three week Graduate Summer Insitute include:
Knowledge:
Skills:
Attitudes:
We agree that educational use of the data early in the project is important and we have elements in our project plan which address this.  Following  the public release of DataONE, we are planning additional workshops   with educators, scientists and DataONE partners focused on analyzing   data for ecological purposes.  In preparation for this, we are  exploring  the feasibility of engaging a graduate intern to develop data  driven exercises. These exercises would be expanded into educational  modules flexible enough to suit various educational levels.

In addition to plans for after the public release, current  Graduate Courses and Workshops involving DataONE institutions and collaborators include practical application of search  and retrieval processes for the utilization of existing data sets within  targeted research exercises.  These queries will be reliant on  associated metadata, and stress the importance of well-documented data.  Inclusion of NEON’s Education Director, and others whose work involves  substantial involvement in education, in the CEE Working Group helps to  ensure that education applications are regularly discussed. Structured  feedback from workshop sessions will be provided to the CCIT.



Assessment - Panel Recommendations:
 
The Usability and Assessment Working Group, in conjunction with EVA Working Group,  will design a test comparing the processes and procedures for data  ingest and data access using DataONE and at least one other system.  
 
The  test will present a set of realistic, but controlled, tasks to  environmental scientist subjects to complete. Their familiarity with the  systems and with similar data related tasks will be measured in a  pre-task questionnaire. The steps they take and time needed to complete  the tasks will be recorded, as will their interactions with the system  during the tasks. A post-task questionnaire will measure their  perceptions of the ease of use and success of the tasks with the 2  systems. To control for learning, half of the subjects will begin their  tasks on DataONE and half will begin on the alternate system. 
 
This test will compare the systems using the following metrics:
1.     Time required to do set tasks
2.      Ease of use (perceived ease measured through post-task questionnaires,  plus observed ease measured through monitoring keystrokes and reactions  captured during the tasks)
3.      Success at completing the tasks (perceived success measured through  post-task questionnaires and system success judged by the researchers)
4.     Perceived value of each system (measured by post-task questionnaires)
 
Timeline:  Task design: 2011 and first quarter 2012; Pilot testing: second quarter  2012. Test and analysis: third and fourth quarters 2012.

See response below under DataNET Federation


DataNET Federation
The following three recommendations have been grouped together because they all relate to the DataNet federation.

Network Coordination - Panel Recommendations:
 
Strategic and longer-term planning - Panel Recommendations:
Assessment - Panel Recommendations:
Acknowledge why federation is important to DataONE (preamble)
The DataONE Cooperative Agreement calls for
 A.   Kick-Off Meeting. During the first year of the project, NSF will host a  DataNet program kick-off meeting to identify mutual interests and   opportunities for the federated development of shared scientific  interests, technological challenges, and community engagement goals. The  attendance of Key Personnel from each of the projects is required. The   structure of the kick-off  meeting and subsequent semi-annual Program Review meetings will enable  DataNet awardees to share best practices, explore solutions to shared   problems, develop a shared governance structure across DataNet sites  and  facilitate the development of interoperable technical solutions  and  standards. Exceptions to required participation must be approved by the NSF Program Director.

A DataNet Federation Kick-Off meeting would be an ideal venue for 
DataONE would be happy to work with NSF and the Data Conservancy to draft an agenda for a DataNet Federation Kick-Off meeting.

=======================================================================
In the review, we said that the next 3 Member Nodes would be:
These were said to be "likely"
Dave's response:
A current list of candidate member nodes is available at:

http://mule1.dataone.org/OperationDocs/membernodes.html

The selection criteria we have been working with include:

1. Size and visibility of community represented by the candidate Member Node

2. The collections are significant or enable new science or both
* Fills significant gaps in the content available through DataONE
* The data are unique data in the broader community
* Collections are strong in breadth, depth, or both

3. The candidate brings significant contributions to the DataONE resource base, including:
* Strategic partnerships
* Professional expertise in managing data, computing, administration, etc.
* Synergistic services (e.g., TeraGrid)
* New funding streams or other sustainability enhancing resources
* Technical resources such as storage capacity, bandwidth, and processing power
* Compatibility with DataONE services, minimizing cost of deployment

4. The candidate adds diversity to the collective membership of DataONE such as through:
* Geographic diversity: a new state or region, new country, new continent
* Underrepresented group
* Linguistic and cultural diversity
* Different type of institution
* Diversity of funding sources

5. The candidate offers compliance with best practices and standards, including:
* Quality assurance
* Data sharing policies
* Metadata creation
* Security

Bruce:
From my perspective, there was also an element of the anticipated ease  of integration with these next three member nodes, in terms of the  technical difficulty, overlap with existing technology, and the  prospective member node's understanding of the DataONE infrastructure  and methodology.