Attendees: Rebecca, Amber, Bill, Bruce, Suzie,Bertram, Matt, John K, Dave, John Cobb, Steve

Regrets:  Hilmar, Viv

DataONE LT Call:  9am AK/10am PT/11am MT/noon CT/1pm ET

1.  Please join my meeting, Apr 13, 2012 at 10:55 AM MDT.
https://www1.gotomeeting.com/join/862105097

2.  Use your microphone and speakers (VoIP) - a headset is recommended. Or, call in using your telephone.

Dial  1 (213) 289-0016
Access Code: 862-105-097
Audio PIN: Shown after joining the meeting

Meeting ID: 862-105-097

GoToMeeting®
Online Meetings Made Easy™

We will also use the epad: http://epad.dataone.org/2012Apr13-LT-VTC if participants can get to it
 
If you have items to add, let me know
 
Agenda for 2012-04-13

1) CI Update (Dave)
Reminder that need feedback on CI by Monday, April 16 using RedMine

https://cn-stage.dataone.org

USGS/DataONE meeting next week
What are the resources needed to bring a MN online - the initial MNs take more work to bring online but ideally will require less developer time. This includes the s/w that needs to be developed as well as the configuration issues. Still hoping to have the Mercury MN working at public release time.

Dave's report to NSF from April 12, 2012
DataONE CI Status Update - 20120411
===================================

Overview

A major goal of the DataONE cyberinfrastructure is to support the long-term reuse of science data available through new and existing repositories by providing a platform that fullfils the basic requirements of: 1) persistent, unique identifiers for all content; 2) high level of confidence for long-term content availability; 3) mechanisms for discovery of relevant content; and 4) infrastructure supporting multiple user identities and consistent content access control.

These core capabilities are implemented in a federated system comprised of three major components: 1) Member Nodes which are new or existing data repositories (e.g. Knowledge Network for Biocomplexity; ORNL Distributed Active Archive Center; Dryad) that support the DataONE Member Node service interfaces; 2) Coordinating Nodes which provide
centralized services such as tracking content, managing content replication, user identity management, and metadata indexing for content discovery; and 3) The Investigator Toolkit which consists of software libraries, applications, and application extensions that enable seamless interaction with the DataONE platform.

The development of the DataONE infrastructure requires integration testing of all components prior to their release. This is especially important for this system which, once available for general use, must operate with high reliability and content consistency so that the system will be trusted as a core piece of infrastructure for data management best practices. The integration testing process uses separate complete installations of all components of the DataONE infrastructure, with each complete installation being an "Environment". We are working with four separate environments in developing DataONE infrastructure: 1) Development which is the potentially unstable, latest stream of activity
used for testing bug fixes and generally initial testing of new features or changes; 2) the Sandbox provides an environment where fixes and features can be evaluated in a more stable situation than Development environment; 3) The Staging environment where release candidates are evaluated by end users; and 4) the Production environment which is the
operational, stable version of the DataONE infrastructure.

As of the week of April 9, the development process has asymptoted to the
point where most development activities are focussed on making
non-critical fixes and updates. All core services are operational in the
staging environment (accessible at: https://cn-stage.dataone.org )
currently being reviewd by the DataONE Leadership Team.

We anticipate deployment of the production environment during the week
of May 14 or sooner depending on feedback from reviewers. The initial
production environment will contain thee Coordinating Nodes located at
Oak Ridge TN, University of California Santa Barbara, and the University
of New Mexico. Member nodes will include: The Knowledge Network for
Biocomplexity, which in turn will also trigger installation of the LTER,
PISCO, and SANParks Metacat based MNs; The Merrit system of UC
California Curation Center; The ORNL Distributed Active Archive Center;
and The USGS Clearing House. Member Nodes expected to participate soon
afterwards will include the Avian Knowledge Network; more USGS data
centers; and Dryad. The Investigator Toolkit will inlcude several
low-levels tools initially focussed on supporting developer interested
in development of plugins or application extensions and will include
base libraries in Python and Java, a plugin for R, and a command line
client that enables access to all the DataONE services.

Platform Services
-----------------

The DataONE platform relies on a number of core services that together
implement the cyberinfrastructure functionality addressing the
requirements of the project. These services and their current status are
summarized as follows.

DataONE APIs
~~~~~~~~~~~~

The APIs define the service endpoints for communication between nodes
and clients participating in the cyberinfrastructure.

Status: Stable, complete.

Synchronization
~~~~~~~~~~~~~~~

Synchronization is the process where Coordinating Nodes recognize new or
changed content on Member Nodes, and process that content so that the
Coordinating Nodes obtain a copy of the System Metadata, and a copy of
the Science Metadata or Resource Map if appropriate. Synchronization of
content triggers other operations such as replication and indexing.

Status: Stable, fully operational, but some edge-case consistency issues
that have proven difficult to track down and resolve. Now able to
reproduce the inconsistent state, so a resolution is expected soon.

Replication
~~~~~~~~~~~

Replication is the process where Coordinating Nodes request Member Nodes
to retrieve a copy of an object so that the number of copies in the
DataONE system conform to the requested number of replicas.

Status: Stable, appears to be fully operational.

Authentication
~~~~~~~~~~~~~~

The Authentication Service utilizes the InCommon system to generate
client side certificates augmented with additional identity and group
information for DataONE users that enables secure access to DataONE
services such as content creation and alteration of properties such as
access control rules.

Status: Stable, operational, though usability and possibly some
functional changes required.

Access Control
~~~~~~~~~~~~~~

The Access Control services define who is able to read or modify content
that is added to the DataONE system. Access control is core to all
services and appears to completely operational, though there are many
combinations of rules and update sequences that continue to be fully
tested.

Status: Stable, operational. Full integration testing is ongoing.
Operational, but still requiring further testing in field conditions is
the timely propogation of a change in access control rules (e.g. making
an object publicly readable or vice-versa) to all replicas of an object.

Log Aggregation
~~~~~~~~~~~~~~~

The Log Aggregation service runs on the Coordinating Nodes and retrieves
access log records from Member Nodes, combining them in a manner that
enables generation of summary reports for content contributors, node
operators, and other stake holders.

Status: Implemented, but requires further testing in an operational
environment. Especially important is ensuring appropriate evaluation of
access control rules for log records.

Indexing
~~~~~~~~

The Indexing Service generates an index that is used to assist with
content discovery. The service parses the various metadata formats added
to DataONE, providing a common view across the different formats. The
Indexing Service is necessary for operation of the ONEMercury search
user interface.

Status: Stable. Existing metadata format extraction rules will require
some fine tuning, additional metadata formats to be added over time.

Search
~~~~~~

The Search Service runs on the Coordinating Nodes and provides a
programmatic mechanism for client tools to search the DataONE holdings.
It uses the same index as the ONEMercury user interface.

Status: Stable. Currently provides access to only publicly readable
content. Full support for authenticated access to restricted content is
prototyped, but requires further testing and evaluation before
integration to the Coordinating Nodes.

CN Mirroring
~~~~~~~~~~~~

Coordinating Node mirroring is a service (actually several services)
ensuring all system state (system metadata, science metadata, node
statuses, user identities, etc) is mirrored, and consistent between the
CNs. There are four mechanisms being employed for implementing this
service, each chosen for the type of content being mirrored and its
latency tolerance.

Status: Unstable (somewhat). There are some edge case conditions that
can cause inconsistency between the CNs. In most, the issue appears to
be related to a conflict between access permission rules on legacy
content being ingested by DataONE. A fix for this problem is currently
undergoing testing in the sandbox environment.

Software Implementations
------------------------

Metacat MN
~~~~~~~~~~

Fully operational. Probably ready for cutting a final release.

GMN MN
~~~~~~

Fully operational though some minor issues remain that do not affect
operation within the DataONE environment.

Mercury MN
~~~~~~~~~~

Operational, requires further testing.

Dryad MN
~~~~~~~~

Operational, requires further testig.

CLI
~~~

The command line client provides a tool for low level interaction with
the DataONE services. The CLI is operational, with additional
functionality and features being added on a regualr basis.

R Client
~~~~~~~~

Updated to work with the current API and service versions. Operational
as something between proof of concept and a usable tool, will require
further work before general deployment is feasible.

ONEMercury
~~~~~~~~~~

Operational. Some UI tweaks required, especially for rendering metadata
documents. Access to only public content is fully implemented. Providing
access to restricted content requires some significant changes to
ONEMercury operation and will not be a feature for the public release.

Hardware
--------

All hardware located at the three centers is fully operational with no known issues.

General Schedule
----------------

Current
~~~~~~~
We are currently evaluating the service and user interfaces in the Staging environment that is running our latest release candidate. In parallel, we are working to resolve the edge-case inconsistency issue between Coordinating Nodes. Issues identified will be collated and
resolved as necessary during the week of 16 April.

Week of April 16
~~~~~~~~~~~~~~~~
-  UI and other changes incorporated
-  New release candidate tagged and built, incorporating fixes from sandbox, development and staging environments as necessary
-  (Also running a USGS Member Node and client tool workshop in Denver)

Week of 23 April
~~~~~~~~~~~~~~~~
-  Update stage environment with new release candidate and advertise to project participants
-  Prepare production servers
-  Ongoing revisions

Week of 30 April
~~~~~~~~~~~~~~~~
-  Prioritize, schedule, and incorporate issues identified by users
-  Documentation cleanup

Week of 07 May
~~~~~~~~~~~~~~
-  Prepare release
-  Tentative initial public release

Week of 14 May
~~~~~~~~~~~~~~
-  Collate responses from user feedback
-  Ongoing development of client tools

June
~~~~
-  Prepare version 1.0.1 update incorporating user requested changes where possible
-  Metadata index supporting discovery process will be progressively improved (e.g.  controlled vocabularies and keywords, support for more metadata formats)
   
   
   
2) Feedback on infrastructure (all)
Bruce, Bill, John K, Suzie, Amber and Rebecca have checked out the URL to varying degrees.
Reminder to all to take a look this weekend.

3) NSF meeting (Bill)
Rebecca sent out the minutes from the NSF/DataONE meeting yesterday.(Error in notes - S&G is not a shared CI,CE WG) 
Tenative plan to hold bi-weekly calls with DataNet teams and monthly videoconferences with combined DataNets. July 13 & 14th meeting of all DataNets has been requested but conflict with travel day for DataONE DUG meeting in Madison, WI.
The new DataNet reverse site visits showed them to be on track.  The three other DataNet teams complained about the reporting structure as being burdonsome (monthly). Looking at what will be considered to be success at 18-month review for the new DataNets.
NSF still wanted an RCN but DataNets complaining about time required for this effort.

4) Around the room

John Cobb: Nothing for me

Steve Kelling: Nothing for me

Suzie: Last week the  SCWG held a phone conference to review current activities and to prepare for the joint working group meeting in May. 

The SEC has announced their inaugural SEC Faculty Achievement Awards and Carol was chosen as the honoree at UT. The twelve professors representing the SEC schools are now being considered for SEC Professor of the Year.  Carol continues the tradition of Lady Vol championships in all areas!

Amber: Short Course - deadline for applications is today. Thus far we have had 5 with an extra one form an assistant professor. Therefore we will extend the application deadline until next Weds (18th). Please promote - information can be found at: http://bit.ly/HOWP65. Joseph JaJa as SESYNC has indicated that they would like to host future workshops and could cover (partial?) costs.

Bruce: Still looking for a replacement for Nick (will be transitioning out June 1st)

Matt: working on demo for next week's meeting in CO
Overwhelming response to sensor network workshop in Albuquerque - closed registration early because of the response.  Looking for funding to hold the workshop again next year.

John K: have draft charter for Preservation & Metadata WG ready for review.  With Dave and Matt's input, Carly and I drafted a basic approach to DCXL (Excel add-in) metadata, based on EML.  This meets minimum requirements for EML and DataCite DOIs.

Rebecca:  Quarterly reports due at the end of the month - Please send reports, presentations, and references for papers by the last week of April