
.. meta::
   :keywords: 2013, 201302, CCIT, Agenda, Developers, Santa Barbara

Agenda for CCIT / Developers Meeting, 20130205-07
=================================================

:Document Status: DRAFT

:Location:

  | NCEAS_, 735 State Street, Santa Barbara
  | Third-floor conference room
  | Hangout: https://plus.google.com/hangouts/_/fa23c92c907b02456a155dfd8b2ebaeaf8649a7a

:Date: 04 - 08 February, 2013


Objectives
----------

Major goals for ongoing development work in DataONE include:

- Review progress and current state of infrastructure

- Preparations for project review scheduled for Feb 27 - Mar 1, including 
  demonstrations and presentations

- Support for new and emerging capabilities:

  - Mutability
  - Provenance
  - Semantics in search

- Plans for infrastructure development through remainder of project



Schedule
--------

Day 1 - Tuesday 05 February
...........................

Emphasis on current status, an overview of plans for the RSV and renewal proposal,
and a reality check on the storyboard for demos. High-level outline of future
development plans.

:08\:30: *Block 1.* Introductions, agenda review, progress report

 - Review the current state of the DataONE infrastructure, including details of
   the Coordinating Node and Member Node implementations and the various
   Investigator Toolkit components. High-level outline of future development plans.

:10\:00: `Break`

:10\:30: *Block 2.*  

- RSV outline and initial concepts for renewal proposal.
- Initial draft storyboard for demonstrations
- Possible questions from panel and our responses
- Notes from previous CCIT meeting:

  Things that we will be measured on:

  - Depth of deployment

    - Number of MNs
    - Amount of data, type of data

  - Ability to access data

    - Easy to do a search that retrieves data

  - The WOW factor of the UIs and infrastructure

    - Is this fundamentally new?
    - Is this really helpful to users?

  - Promote the functionality of the overall system

  - How do we demonstrate the backup functionality?

    - Perhaps the dashboard can be used to show object distribution across multiple nodes?

  - Preservation attributes of the design

    - Illustrate how institutional diversity, etc., provides a level of preservation

  - User-oriented demonstrations / use cases

    - For example, the basic case of a researcher adding content to the system, later retrieving it, and demonstrating how the system supports those operations

  - Can we demonstrate the data life cycle? For example, follow what is done in the video.

    - R
    - Morpho
    - DataUp
    - ONEDrive

    - Data citation tools
    - Possible resources for MATLAB development
    - Possible resources for Kepler development

  - Site review will want live demos.
  - Site review will be expecting DataNet interoperability.
  - Retrieving secure / private content from member nodes
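
The dashboard idea for demonstrating backup functionality could be backed by a simple summary of where copies live. The sketch below is only an illustration under stated assumptions: the input is a hypothetical list of ``(identifier, node_id)`` pairs, one per stored copy, and is not a DataONE API. It counts copies per node and the number of objects held on two or more distinct nodes.

.. code-block:: python

   from collections import Counter, defaultdict


   def replica_summary(replicas):
       """Summarize how stored copies are spread across nodes.

       replicas: iterable of (identifier, node_id) pairs, one pair per stored
       copy. This record format is hypothetical, not a DataONE API.
       Returns (copies per node, objects on 2+ nodes, total objects).
       """
       replicas = list(replicas)  # allow iterating more than once
       per_node = Counter(node for _, node in replicas)
       nodes_per_object = defaultdict(set)
       for pid, node in replicas:
           nodes_per_object[pid].add(node)
       # Treat an object as "backed up" when copies exist on two or more nodes.
       backed_up = sum(1 for nodes in nodes_per_object.values() if len(nodes) >= 2)
       return dict(per_node), backed_up, len(nodes_per_object)


   if __name__ == "__main__":
       records = [
           ("pid-1", "node-A"), ("pid-1", "node-B"),
           ("pid-2", "node-A"),
           ("pid-3", "node-B"), ("pid-3", "node-C"),
       ]
       counts, backed_up, total = replica_summary(records)
       print(counts)
       print(f"{backed_up} of {total} objects have copies on two or more nodes")

The per-node counts could drive a distribution chart, while the "two or more nodes" figure speaks directly to the backup question.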


:12\:00: `Lunch`

:13\:00: *Block 3.* Reviewing products, their release and demo-worthiness

Products include:

- specifications, documentation
- infrastructure and services
- member node implementations
- client libraries
- investigator tools

  - ONEMercury
  - ONE-R
  - Morpho: v2.0.0-RC installers available at http://bespin.nceas.ucsb.edu/dataone/downloads/
  - DataUp
  - ONEDrive
  - CLI
  - Kepler plugin
  - VisTrails module(s)
  - Provenance support

:15\:00: `Break`


:15\:30: *Block 4.*  Product review continued.


:17\:30: `Close`



Day 2 - Wednesday 06 February
.............................

Emphasis on presentations and demonstrations for the RSV. The goal is to reach
draft versions of the RSV presentations and to document the tasks remaining to
prepare for the RSV. The morning will focus on products at or very close to
release-worthiness; the afternoon will work through products in the prototype
stage that may be used to demonstrate future capabilities.

:08\:30: *Block 5.* Storyboarding production-ready components


:10\:00: `Break`

:10\:30: *Block 6.*  


:12\:00: `Lunch`

:13\:00: *Block 7.*  Storyboarding prototypes, emerging functionality


:15\:00: `Break`

:15\:30: *Block 8.* 


:17\:30: `Close`



Day 3 - Thursday 07 February
............................

Review and discussion of future development activities and prioritization.

- Discussion on supporting mutable content
- Discussion on making the AuthMN authoritative for SystemMetadata (found to be
  critical for tools like Morpho)
- Member node deployment process and scheduling
- Integration of working group outcomes:

  - provenance
  - semantics
  - preservation

- Discussion about the whole release and infrastructure update process
- Sponsor-required metrics: review, revise, and ensure we are able to report them

:08\:30: *Block 9.* 

- Telecon with semantics group starting 09:00


:10\:00: `Break`

:10\:30: *Block 10.* 


:12\:00: `Lunch`

:13\:00: *Block 11.*  


:15\:00: `Break`

:15\:30: *Block 12.*  Wrap up, scheduling, task assignment

:17\:30: `Close`



.. _NCEAS: http://www.nceas.ucsb.edu/contact


Meeting Notes
=============

Day 1, Block 1
--------------
Day 1, Block 2
--------------
Day 1, Block 3
---------------
Day 2, Block 1
--------------
NOTE: We are missing taxonomy from the hierarchy (an oversight)

Day 2, Block 2
--------------
R installation errors::

  *** installing help indices
  ** building package indices
  ** installing vignettes
  ** testing if installed package can be loaded
  *** arch - i386
  Error in loadNamespace(package, c(which.lib.loc, lib.loc)) : 
    in package ‘dataone’ classes Foo were specified for export but not defined
  Error: loading failed
  Execution halted
  *** arch - x86_64
  Error in loadNamespace(package, c(which.lib.loc, lib.loc)) : 
    in package ‘dataone’ classes Foo were specified for export but not defined
  Error: loading failed
  Execution halted
  ERROR: loading failed for ‘i386’, ‘x86_64’
  * removing ‘/Library/Frameworks/R.framework/Versions/2.15/Resources/library/dataone’
  * restoring previous ‘/Library/Frameworks/R.framework/Versions/2.15/Resources/library/dataone’
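
The log shows the package failing to load on both architectures because its NAMESPACE exports a class, Foo, that no source file defines. One rough pre-install check is to compare the names listed in ``exportClasses()`` with the names passed to ``setClass()``. The helper below is a hypothetical Python sketch operating on file contents as strings with simple regular expressions; it is not part of R's own tooling and will miss classes defined in unusual ways.

.. code-block:: python

   import re

   # exportClasses(A, B, "C") directives in a NAMESPACE file (may span lines).
   EXPORT_RE = re.compile(r"exportClasses\s*\(([^)]*)\)")
   # The class-name argument of setClass("Name", ...) or setClass(Class = "Name", ...).
   SETCLASS_RE = re.compile(r"""setClass\s*\(\s*(?:Class\s*=\s*)?["']([^"']+)["']""")


   def exported_classes(namespace_text):
       names = set()
       for args in EXPORT_RE.findall(namespace_text):
           for name in args.split(","):
               name = name.strip().strip("\"'")
               if name:
                   names.add(name)
       return names


   def defined_classes(r_sources):
       names = set()
       for text in r_sources:
           names.update(SETCLASS_RE.findall(text))
       return names


   def undefined_exports(namespace_text, r_sources):
       """Classes named in exportClasses() but never passed to setClass()."""
       return sorted(exported_classes(namespace_text) - defined_classes(r_sources))


   if __name__ == "__main__":
       namespace = "exportClasses(Bar, Foo)\n"  # hypothetical NAMESPACE content
       sources = ['setClass("Bar", representation(x = "numeric"))']
       print(undefined_exports(namespace, sources))

For the hypothetical inputs in the example, the helper reports Foo, matching the class named in the error.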


Day 2, Block 3
--------------

Notes on Dryad deployment::

  ORE1
  |
  |--DryadMetadata1 (aka DryadDataPackage)
  |    |
  |    |
  |----+---DryadDataFile1 (aka DryadDataFile)
  |    |     |
  |----|-----|----Data1 (aka bitstream)
  |    |
  |    |
  |--------DryadDataFile2
  |    |     |
  |----|-----|----Data2

* Each of the 6 objects has an associated system metadata document.

Relationships::

  ORE1 aggregates DryadMetadata1
  ORE1 aggregates DryadDataFile1
  ORE1 aggregates DryadDataFile2
  ORE1 aggregates Data1
  ORE1 aggregates Data2

  DryadMetadata1 documents DryadDataFile1
  DryadMetadata1 documents DryadDataFile2

  DryadDataFile1 documents Data1
  DryadDataFile2 documents Data2

The reverse of each of the above relationships is also included.
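
The relationship list above follows mechanically from the package structure, so it can be generated rather than written by hand. The sketch below builds the forward statements and their reverses as ``(subject, predicate, object)`` tuples. The reverse predicate names ``isAggregatedBy`` and ``isDocumentedBy`` are an assumption based on the OAI-ORE and CiTO vocabularies; the notes only say "the reverse".

.. code-block:: python

   # Reverse of each forward predicate (assumed ORE / CiTO naming).
   INVERSE = {"aggregates": "isAggregatedBy", "documents": "isDocumentedBy"}


   def dryad_package_triples(ore, metadata, files):
       """Build relationship statements for a Dryad-style package.

       files maps each Dryad data-file record to the bitstream it documents,
       e.g. {"DryadDataFile1": "Data1"}.
       """
       forward = [(ore, "aggregates", metadata)]
       for data_file, bitstream in files.items():
           forward.append((ore, "aggregates", data_file))
           forward.append((ore, "aggregates", bitstream))
           forward.append((metadata, "documents", data_file))
           forward.append((data_file, "documents", bitstream))
       reverse = [(o, INVERSE[p], s) for s, p, o in forward]
       return forward + reverse


   if __name__ == "__main__":
       triples = dryad_package_triples(
           "ORE1", "DryadMetadata1",
           {"DryadDataFile1": "Data1", "DryadDataFile2": "Data2"},
       )
       for triple in triples:
           print(*triple)

For the package in the diagram this yields the nine forward statements listed above plus nine reverse statements.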



Day 3, Block 1
--------------

Some discussion on Dryad identifiers and mutability.

Semantics presentation:

- SemantEco hierarchical faceted search
- Freebase as a kind of authority for terms; this could perhaps be generalized to a number of different sources of commonly used / adopted terms. This assists with the early process of term normalization / classification.
- Analyze content as it is added? Perhaps use manual annotation of content as it comes in to member nodes; statistical evaluation of content could also be tried.
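
Using an external authority for term normalization amounts to mapping free-text keywords onto canonical terms and setting aside anything unmatched for manual annotation. The sketch below illustrates that flow with a small hypothetical ``AUTHORITY`` dictionary standing in for a source such as Freebase; no Freebase API is used or implied.

.. code-block:: python

   import re

   # Hypothetical authority table: normalized variant -> canonical term.
   AUTHORITY = {
       "co2": "carbon dioxide",
       "carbon dioxide": "carbon dioxide",
       "temp": "temperature",
       "temperature": "temperature",
       "precip": "precipitation",
       "precipitation": "precipitation",
   }


   def normalize_key(term):
       """Lower-case and collapse punctuation/whitespace so variants compare equal."""
       return re.sub(r"[^a-z0-9]+", " ", term.lower()).strip()


   def normalize_terms(keywords, authority=AUTHORITY):
       """Return (canonical terms, unmatched keywords left for manual review)."""
       matched, unmatched = [], []
       for kw in keywords:
           canonical = authority.get(normalize_key(kw))
           if canonical is None:
               unmatched.append(kw)
           elif canonical not in matched:
               matched.append(canonical)
       return matched, unmatched


   if __name__ == "__main__":
       print(normalize_terms(["CO2", "Carbon-Dioxide", "Temp.", "soil moisture"]))

In this sketch "CO2" and "Carbon-Dioxide" collapse to one canonical term, while "soil moisture" falls through to the manual-annotation path.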

Day 3, Blocks 1 & 2
-------------------
These were demos from the semantics WG on topic modeling and semantic search, given by Patrice Seyed, Stacy Rebich Hespanha, and Ben Adams.

Day 3, Block 3
--------------
- Discussion on authoritative control of SystemMetadata
- Metrics
- Network status

Day 3, Block 4
--------------

Discussion on support of mutability of objects within DataONE

Options:

- Default replication policy = 2 (with practical size limit)
- Resolve should always point to the most recent revision of an object if the requested version is not available. In this case, it should be clear that the requested version is not available.
- Alternate identifiers generally seem to be a useful concept, but:

  - Which APIs support this AID?
  - Do a PID and an AID share the same namespace?
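
The proposed resolve behavior can be sketched over an obsolescence chain: serve the requested version if it is available, otherwise the most recent available revision, and report explicitly that the requested version was not available. The chain representation (a dict from an identifier to the identifier that obsoletes it) and the ``available`` set are hypothetical; this does not reproduce any DataONE API signature.

.. code-block:: python

   from typing import Dict, List, Optional, Set, Tuple


   def chain_from(pid: str, obsoleted_by: Dict[str, str]) -> List[str]:
       """Return [pid, newer revision, ..., newest revision]."""
       chain = [pid]
       while chain[-1] in obsoleted_by:
           nxt = obsoleted_by[chain[-1]]
           if nxt in chain:  # guard against a malformed, cyclic chain
               raise ValueError(f"cycle in obsolescence chain at {nxt}")
           chain.append(nxt)
       return chain


   def resolve(pid: str, obsoleted_by: Dict[str, str],
               available: Set[str]) -> Tuple[Optional[str], bool]:
       """Return (identifier to serve or None, requested version available?)."""
       if pid in available:
           return pid, True
       # Fall back to the newest available revision later in the chain.
       for candidate in reversed(chain_from(pid, obsoleted_by)[1:]):
           if candidate in available:
               return candidate, False
       return None, False


   if __name__ == "__main__":
       chain = {"v1": "v2", "v2": "v3"}
       print(resolve("v1", chain, {"v2", "v3"}))
       print(resolve("v2", chain, {"v2", "v3"}))

Returning the availability flag alongside the identifier keeps the substitution visible to the caller, which is the requirement stated in the second option.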
  
Use "Series Identifier" (SID) instead of "Alternate Identifier".

A SID is specified at some point in the life of an obsolescence chain and is used to refer to that series of the object.

A SID cannot be applied to any object outside the obsolescence chain of an object, i.e., it is unique to an object family.
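
The two SID rules can be illustrated with a small in-memory registry: a SID resolves to the most recent object in its series, and attaching a SID to an object from a different obsolescence chain is rejected. ``SeriesRegistry`` and its methods are hypothetical names for this sketch, which assumes linear, acyclic chains.

.. code-block:: python

   from typing import Dict, Set


   class SeriesRegistry:
       """Hypothetical model of SIDs over linear, acyclic obsolescence chains."""

       def __init__(self) -> None:
           self.obsoleted_by: Dict[str, str] = {}  # pid -> newer pid
           self.obsoletes: Dict[str, str] = {}     # pid -> older pid
           self.sid_members: Dict[str, Set[str]] = {}

       def add_revision(self, old_pid: str, new_pid: str) -> None:
           """Record that new_pid obsoletes old_pid."""
           self.obsoleted_by[old_pid] = new_pid
           self.obsoletes[new_pid] = old_pid

       def _root(self, pid: str) -> str:
           # Walk back to the first object in the chain.
           while pid in self.obsoletes:
               pid = self.obsoletes[pid]
           return pid

       def _head(self, pid: str) -> str:
           # Walk forward to the most recent object in the chain.
           while pid in self.obsoleted_by:
               pid = self.obsoleted_by[pid]
           return pid

       def assign_sid(self, sid: str, pid: str) -> None:
           """Attach sid to pid, refusing objects from a different chain."""
           members = self.sid_members.setdefault(sid, set())
           root = self._root(pid)
           if any(self._root(p) != root for p in members):
               raise ValueError(f"{sid} already refers to a different obsolescence chain")
           members.add(pid)

       def resolve_sid(self, sid: str) -> str:
           """Return the most recent object in the series named by sid."""
           return self._head(next(iter(self.sid_members[sid])))


   if __name__ == "__main__":
       reg = SeriesRegistry()
       reg.add_revision("v1", "v2")
       reg.add_revision("v2", "v3")
       reg.assign_sid("series-A", "v2")  # SID applied partway through the chain
       print(reg.resolve_sid("series-A"))
       try:
           reg.assign_sid("series-A", "other-object")
       except ValueError as err:
           print(err)

Comparing chain roots is one simple way to enforce "unique to an object family"; a real implementation would also need to handle chains that are extended after a SID is assigned.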





Future Development Topics
-------------------------

1.2:

- Usability feedback
- Provenance tracing
- Alternate identifiers (series ID or concept ID) [\*]
- MN service advertisement
- Log report generation [\*]

  - Log statistics API
  - UI design
  - Member Node statistics page

- Identity management [\*\*]
- Member node software stacks [\*\*\*]

  - iRODS [\*\*\*]
  - DSpace
  - Fedora
  - OPeNDAP (could we do something that was DAP-generic?)
  - "Slender Node" (sitemap-based harvesting method) (John K) [\*\*\*]

    - OGC CSW and OAI-PMH. Harvesting might need to follow links to the data to validate that the data are really there.

  - GEOPortal Server (NODC, USGS)

- Member node usability work [\*]

  - Make it easier for MNs to work with DataONE software

- ITK developments?

  - Kepler
  - R client
  - VisTrails
  - MATLAB
  - Robust ONEDrive [\*\*\*\*]
  - DataUp (multi-platform)
  - ONEMercury/Search (including usability improvements/replacement) [\*\*\*\*\*\*\*]

    - Includes improvements to biblio tools

- “Semantic” search
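
The "Slender Node" idea depends on harvesting from a sitemap. In the sitemaps.org format, URLs appear in ``<loc>`` elements under a ``<urlset>`` root in the ``http://www.sitemaps.org/schemas/sitemap/0.9`` namespace. The sketch below extracts them with the standard-library XML parser and partitions them with a caller-supplied ``exists`` check; that check (for example an HTTP HEAD request) is a hypothetical placeholder and is not implemented here.

.. code-block:: python

   import xml.etree.ElementTree as ET
   from typing import Callable, List, Tuple

   SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


   def sitemap_urls(xml_text: str) -> List[str]:
       """Return the <loc> URLs listed in a sitemap <urlset> document."""
       root = ET.fromstring(xml_text)
       return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


   def split_by_availability(urls: List[str],
                             exists: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
       """Partition URLs into (reachable, missing) using a supplied check."""
       ok, missing = [], []
       for url in urls:
           (ok if exists(url) else missing).append(url)
       return ok, missing


   if __name__ == "__main__":
       sample = (
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
           "<url><loc>https://example.org/metadata/1.xml</loc></url>"
           "<url><loc>https://example.org/metadata/2.xml</loc></url>"
           "</urlset>"
       )
       urls = sitemap_urls(sample)
       print(split_by_availability(urls, lambda u: u.endswith("1.xml")))

Keeping the existence check as a parameter separates parsing from network access, so the "follow links to validate" step could later be swapped between HEAD requests, full downloads, or checksum comparison.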

Auditing services for content consistency:

- How many 404s are we getting?
- Are checksums valid?
- ...
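
A consistency audit along these lines could fetch each object, count the ones that cannot be retrieved, and compare the rest against their recorded checksums. The sketch below assumes a hypothetical ``fetch`` function that returns the object's bytes or ``None`` (standing in for a 404); algorithm names written like "MD5" or "SHA-1" are mapped onto Python ``hashlib`` names.

.. code-block:: python

   import hashlib
   from typing import Callable, Dict, Iterable, List, Optional, Tuple


   def hashlib_name(algorithm: str) -> str:
       """Map names such as 'SHA-1' or 'MD5' to hashlib names ('sha1', 'md5')."""
       return algorithm.lower().replace("-", "")


   def audit(objects: Iterable[Tuple[str, str, str]],
             fetch: Callable[[str], Optional[bytes]]) -> Dict[str, List[str]]:
       """Check retrievability and checksums.

       objects: (pid, algorithm, expected hex digest) from system metadata.
       fetch:   hypothetical retrieval function; None means "not found".
       """
       report: Dict[str, List[str]] = {"missing": [], "bad_checksum": [], "ok": []}
       for pid, algorithm, expected in objects:
           data = fetch(pid)
           if data is None:
               report["missing"].append(pid)  # e.g. a 404 from the node
               continue
           actual = hashlib.new(hashlib_name(algorithm), data).hexdigest()
           key = "ok" if actual == expected.lower() else "bad_checksum"
           report[key].append(pid)
       return report


   if __name__ == "__main__":
       store = {"pid-1": b"hello", "pid-2": b"tampered"}
       expected_md5 = hashlib.md5(b"hello").hexdigest()
       objs = [("pid-1", "MD5", expected_md5),
               ("pid-2", "MD5", expected_md5),
               ("pid-3", "SHA-1", "0" * 40)]
       print(audit(objs, store.get))

The length of the ``missing`` list answers "how many 404s", and ``bad_checksum`` answers "are checksums valid" for the sampled objects.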

1.3:

- Data slicing, subsetting
- Content annotation
- CN refactoring? (better scale-out within a CN site)
- WG input