CyberInfrastructure - Panel Recommendations:
Support for semantic search and semantic integration is an important target for the ongoing development of DataONE infrastructure and is planned for implementation following rollout of the initial core infrastructure towards the end of 2011.

The Mercury infrastructure for search already supports the inclusion of multiple thesauri, which provides a basis for both end-user searching and searching through the OpenSearch API interface that is part of the SOLR and Lucene engines upon which Mercury is based. DataONE has also been working with BioPortal and has a test implementation of their ontology repository. We are working on making the repository itself a useful tool for the DataONE community, as a specialized type of information repository about relevant ontologies and thesauri. Moving forward, we will use some of these semantic information sources to enable semantic search across synonyms, broader topics, and narrower topics. Semantic search can also be used to expand the existing "More like this" functionality in DataONE. As with other aspects of this project, DataONE cannot possibly generate all of the ontologies that are needed, but DataONE can work with the communities that are generating them and use the ontology repository to highlight areas that need coverage and areas where ontologies conflict in their definition and organization of concepts.
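As an illustration of the kind of query expansion described above, the following minimal sketch expands a search term with synonyms, narrower topics, and optionally broader topics. The thesaurus entries, the `Concept` structure, and the function names are hypothetical and stand in for terms that would be drawn from an ontology repository; this is not the Mercury or BioPortal API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    """One thesaurus entry; all field names here are illustrative."""
    label: str
    synonyms: List[str] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)
    narrower: List[str] = field(default_factory=list)


# Toy thesaurus (assumed data) standing in for an ontology repository.
THESAURUS = {
    "precipitation": Concept(
        label="precipitation",
        synonyms=["rainfall"],
        broader=["weather"],
        narrower=["snow", "hail"],
    ),
}


def expand_query(term, include_broader=False, include_narrower=True):
    """Return the search term plus related terms, without duplicates."""
    concept = THESAURUS.get(term.lower())
    if concept is None:
        return [term]  # unknown term: search for it unchanged
    terms = [concept.label] + concept.synonyms
    if include_narrower:
        terms += concept.narrower
    if include_broader:
        terms += concept.broader
    return list(dict.fromkeys(terms))  # keeps order, drops repeats


def to_or_query(terms):
    """Join terms into a Lucene-style OR clause, quoting each term."""
    return " OR ".join(f'"{t}"' for t in terms)
```

Broader topics are left out by default in this sketch because they tend to widen a result set considerably; whether that is the right default would depend on the search interface.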

Discovery of available facets and the range of values represented by those fields is valuable to clients because it circumscribes the set of records being examined. The Mercury web interface and other components such as the FUSE file system driver rely heavily on the faceting capabilities of the SOLR and associated Lucene indexes, which are populated using system and science metadata objects. Introspection services for discovery of facets and the range of values they represent are available through the SOLR interface and are in turn to be exposed as part of the DataONE APIs. Ease of access to enumerated values from indexed fields is important for basic quality control operations such as building and verifying controlled vocabularies, which can in turn provide more reliable association with terms defined in domain ontologies.
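The facet introspection described above can be sketched against the standard SOLR query parameters (`facet`, `facet.field`, `facet.limit`, `rows`, `wt`). The base URL and the `keywords` field name below are assumptions for illustration only; the eventual DataONE API may expose this differently.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field_name, limit=20):
    """Build a SOLR select URL that returns only facet counts for one field."""
    params = {
        "q": "*:*",                 # match every indexed record
        "rows": 0,                  # no documents needed, only the counts
        "facet": "true",
        "facet.field": field_name,
        "facet.limit": limit,
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


def parse_facet_counts(response, field_name):
    """Turn SOLR's flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field_name]
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical endpoint and field name, for illustration only.
url = facet_query_url("http://localhost:8983/solr/select", "keywords")

# Hand-written sample shaped like a SOLR JSON facet response (not real data).
sample = {"facet_counts": {"facet_fields": {"keywords": ["soil", 12, "water", 7]}}}
counts = parse_facet_counts(sample, "keywords")
```

The enumerated values in `counts` are the kind of listing that could be checked against a controlled vocabulary during quality control.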

We recognize the importance of graphical interfaces for usability, and they will be part of the public deployment. The development process requires that the underlying functions first be built and tested. Command line interfaces are by far the most efficient means for developers to work within a test-driven development practice. Those command line interfaces will also be important to a significant user segment focused on automation of data access and manipulation.

As the core services stabilize and resources become available, additional attention is being directed towards the design and building out of graphical user interfaces for various administrative and user tools, with the general goal of increasing the audience to which such tools will be useful. Much of that work remains to be done, but it is made much simpler by having a robust and well-tested set of APIs. Usability assessment will be an important part of all user interface design work, especially for the more widely disseminated Investigator Toolkit components.

Staffing and Project Management - Panel Recommendations:
DataONE is initiating joint development with other projects as appropriate. For example, a meeting has been scheduled for USGS developers and DataONE developers to discuss a joint development approach related to the USGS CDI Data Uploader tool and the DataONE Investigator Toolkit. The anticipated outcomes are:
1.) Identify components needed for each application
2.) Identify resources required
3.) Identify resources available for required components
4.) Develop a joint development timeline
5.) Establish project working relationships

Another strategy for dealing with this issue is to make sure that documentation is kept current so there will be minimal effort to add new developers to the project.

This is a risk that DataONE has already identified.

Risk #105: Member time constraints -- "stretched too thin"
Scheduling of tasks needs to consider the amount of time that can be contributed by various project participants, and care should be taken to avoid the mistake of equating enthusiasm with actual availability.
Mitigations:
In addition to these mitigation strategies, we also ensure that information generated by volunteers is captured so the knowledge is not lost if we lose the volunteer. This also serves to give recognition to the volunteers.

Volunteers are not necessarily unreliable. Some of the working groups have chosen to have members for one or two meetings rather than for the life of the working group. These members are chosen for their expertise in addressing specific topics. The goals of working groups may change with time and charter. It would, therefore, be appropriate to transition individual members in or out of working groups.

We agree that project management consists of attempting to achieve the goals of the project while managing the classic triple constraints of cost, time and scope. We have four risks on our Risk Register related to this:
The existing mitigation strategies for these include reducing the scope or increasing the time to get the work done.

Network Coordination - Panel Recommendations:
See response below under DataNET Federation

NEED A RESPONSE FOR THIS ONE _ HOW TO GET A MATRIX IN LESS THAN ONE DAY


Strategic and Longer Term Planning - Panel Recommendations:
We agree. Logging and recovery are a core part of the DataONE design and API. A part of our testing plan for Member Nodes and Coordinating Nodes includes handling network segmentation, abrupt loss of a node, and disk failures. While we have not exposed all of the 

Extensive operation logging and reporting capabilities are part of the DataONE service interface design and are intended to support core service monitoring and overall infrastructure state-of-health reporting, as well as reporting on content access for individual users, by Member Node, and through other facets.

Support for transactions in the federated DataONE infrastructure varies, though in general operations are designed to be sufficiently granular that they either complete fully or fail, so that objects and operations are always in a defined state. Sequential operations such as the synchronization of a Member Node with a Coordinating Node keep track of state at a level sufficiently granular that, should the overall process fail (due to a network outage, for example), the operation will continue from the last valid step in the process.
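The resume-from-last-valid-step behavior described above can be illustrated with a minimal sketch. The function and parameter names are hypothetical, and the checkpoint is an in-memory dictionary here; an actual service would presumably persist it so that it survives a restart.

```python
def synchronize(object_ids, copy_object, checkpoint):
    """Copy each object in order, recording progress after every success.

    `checkpoint` holds the index of the next object to copy, so a rerun
    after a failure resumes from the last completed step rather than
    starting over.
    """
    checkpoint.setdefault("next_index", 0)
    for index in range(checkpoint["next_index"], len(object_ids)):
        # Each copy is treated as atomic: it either succeeds or raises.
        copy_object(object_ids[index])
        # Progress is recorded only after the step has fully completed.
        checkpoint["next_index"] = index + 1
    return checkpoint["next_index"]


# Usage: simulate a network outage on the third object, then retry.
copied = []
pending_failures = {"c": 1}  # object "c" fails once, then succeeds


def flaky_copy(object_id):
    if pending_failures.get(object_id, 0) > 0:
        pending_failures[object_id] -= 1
        raise ConnectionError("simulated network outage")
    copied.append(object_id)


state = {}
try:
    synchronize(["a", "b", "c", "d"], flaky_copy, state)
except ConnectionError:
    pass  # state["next_index"] still points at the failed step

synchronize(["a", "b", "c", "d"], flaky_copy, state)  # resumes at "c"
```

Because the checkpoint advances only after a step succeeds, objects copied before the failure are not copied again on the retry.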

See response below under DataNET Federation

These learning objectives have been defined for four current DataONE education activities: the ScienceLinks PhD program; the Data Curation Education in Research Centers (DCERC) program awarded in collaboration with the Data Conservancy; Information Sciences courses at the University of Tennessee; and the Summer Environmental Information Management Institute run at the University of New Mexico.

For example, the outcomes for the ScienceLinks PhD program are defined as:
Six doctoral graduates who will 
Development of science data and  information curriculum
Development of a model of active science and data librarianship that combines the best of research, teaching, and professional practice and that incorporates from an early stage large national science data initiatives and science organizations.

Goals / Objectives for the Information Sciences courses include:
Outcomes from one session of the three-week Graduate Summer Institute include:
Knowledge:
Skills:
Attitudes:
Current graduate courses and workshops include practical application of search and retrieval processes for the utilization of existing data sets within targeted research exercises. These queries rely on associated metadata and stress the importance of well-documented data. Inclusion of NEON’s Education Director, and others whose work involves substantial involvement in education, in the CEE Working Group helps to ensure that education applications are regularly discussed. Structured feedback from workshop sessions will be provided to the CCIT.

Following the public release of DataONE, we are planning additional workshops with educators, scientists and DataONE partners focused on analyzing data for ecological purposes. In preparation for this, we are exploring the feasibility of engaging a graduate intern to develop data-driven exercises. These exercises would be expanded into educational modules flexible enough to suit various educational levels.


Assessment - Panel Recommendations:
 
The Usability and Assessment Working Group, in conjunction with the EVA Working Group, will design a test comparing the processes and procedures for data ingest and data access using DataONE and at least one other system.
 
The test will present a set of realistic, but controlled, tasks for environmental scientist subjects to complete. Their familiarity with the systems and with similar data-related tasks will be measured in a pre-task questionnaire. The steps they take and the time needed to complete the tasks will be recorded, as will their interactions with the system during the tasks. A post-task questionnaire will measure their perceptions of the ease of use and success of the tasks with the two systems. To control for learning, half of the subjects will begin their tasks on DataONE and half will begin on the alternate system.
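The counterbalancing step described above, in which half of the subjects start on each system, could be implemented along the following lines. This is a sketch only: the subject identifiers, the system labels, and the use of a seeded shuffle are assumptions, not part of the test design itself.

```python
import random


def assign_task_order(subjects, systems=("DataONE", "alternate"), seed=0):
    """Randomly split subjects so half begin on each system.

    Returns a dict mapping each subject to the order in which they use
    the two systems. With an odd number of subjects, the extra subject
    begins on the second system.
    """
    rng = random.Random(seed)   # seeded so the assignment is reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    first, second = systems
    orders = {}
    for position, subject in enumerate(shuffled):
        if position < half:
            orders[subject] = (first, second)
        else:
            orders[subject] = (second, first)
    return orders


# Hypothetical subject identifiers.
orders = assign_task_order([f"S{n}" for n in range(1, 9)])
```

Shuffling before the split keeps the starting system independent of the order in which subjects were recruited.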
 
This test will compare the systems using the following metrics:
1. Time required to do set tasks
2. Ease of use (perceived ease measured through post-task questionnaires, plus observed ease measured through monitoring keystrokes and reactions captured during the tasks)
3. Success at completing the tasks (perceived success measured through post-task questionnaires and system success judged by the researchers)
4. Perceived value of each system (measured by post-task questionnaires)
 
Timeline: Task design: 2011 and first quarter 2012; Pilot testing: second quarter 2012; Test and analysis: third and fourth quarters 2012.

See response below under DataNET Federation


DataNET Federation
The following three recommendations have been grouped together because they all relate to the DataNet federation.

Network Coordination - Panel Recommendations:
 
Strategic and longer-term planning - Panel Recommendations:
Assessment - Panel Recommendations:
The DataONE Cooperative Agreement calls for
A. Kick-Off Meeting. During the first year of the project, NSF will host a DataNet program kick-off meeting to identify mutual interests and opportunities for the federated development of shared scientific interests, technological challenges, and community engagement goals. The attendance of Key Personnel from each of the projects is required. The structure of the kick-off meeting and subsequent semi-annual Program Review meetings will enable DataNet awardees to share best practices, explore solutions to shared problems, develop a shared governance structure across DataNet sites and facilitate the development of interoperable technical solutions and standards. Exceptions to required participation must be approved by the NSF Program Director.

A DataNet Federation Kick-Off meeting would be an ideal venue for 
DataONE would be happy to work with NSF and the Data Conservancy to draft an agenda for a DataNet Federation Kick-Off meeting.