http://epad.dataone.org/ProvWG-leads-09-06-2011

Present: Bertram, Paolo

* ProvWG call:
==> draft job description
- provenance repository, combo of DToL and GoldenTrail
- working with Kepler, Taverna, Vistrails, Galaxy, R? 
- liaison with CCIT

* D-OPM report & slides
D-OPM model editing:  http://wb.mysql.com/
http://en.wikipedia.org/wiki/Entity-relationship_model#ER_diagramming_tools

- Motivation, Use Cases, Goals (cf. IDCC abstract):
-- Importance of "wf-land": trace cannot be understood (well) w/o link to workflow (higher level)

rationale for "workflow-land", from the submitted IDCC abstract:
scientists may find it difficult to explore provenance traces unless they can relate them to the original experiment design from which they were derived. Thus, a specification of the experimental process, i.e., the set of workflows involved, must accompany the traces if scientists are to carry out effective curation. 

-- Focus in the sci-wf and provenance communities on single runs ("intra-run provenance")
-- But inter-run provenance crucial
-- Importance of "data-land"
-- DataCite (connection to John Kunze's WG)

--> data-ids solve some stitching problems (at the WF level); probably not within
==> Paolo: eScience Central --> captures "outside provenance" only, thus is a good candidate for taking up data citation standards [=> Paul Watson] 
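
[Sketch of the stitching idea above: if runs refer to data by shared identifiers, separate traces can be linked wherever one run generated an id that another run used. The trace format, run names, and "ex:" identifiers below are made up for illustration; this is not the DToL or GoldenTrail implementation, and it only links runs at the workflow level.]

```python
from collections import defaultdict

# Hypothetical trace format: run_id -> list of (process, relation, data_id),
# where relation is "used" or "generated". Not an actual D-OPM serialization.
def stitch(traces):
    """Return (producer_run, data_id, consumer_run) links across runs."""
    producers = defaultdict(set)
    consumers = defaultdict(set)
    for run_id, records in traces.items():
        for _process, relation, data_id in records:
            if relation == "generated":
                producers[data_id].add(run_id)
            elif relation == "used":
                consumers[data_id].add(run_id)

    links = set()
    for data_id, producer_runs in producers.items():
        for producer in producer_runs:
            for consumer in consumers.get(data_id, ()):
                if producer != consumer:  # same-run links are intra-run provenance
                    links.add((producer, data_id, consumer))
    return sorted(links)


traces = {
    "run1": [("align", "used", "ex:raw"), ("align", "generated", "ex:aligned")],
    "run2": [("infer_tree", "used", "ex:aligned"), ("infer_tree", "generated", "ex:tree")],
}
print(stitch(traces))
```

[Note: this only works if both runs use the same identifier for the same data item, which is where a common identifier scheme would come in.]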

DataONE view:
http://mule1.dataone.org/ArchitectureDocs-current/design/SearchMetadata.html#creating-citations-from-index-fields
ESIP view:
http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations/provider_guidelines


-- data structures (nested collections, etc)

-- Use cases:
-- FPC (oldie but goodie, simple use case)
-- pPOD (inter-run provenance, COMAD)
-- eBird (previous EVA use case: citizen science, visualization, use custom code & Vistrails for animations of analysis results)
-- Bob Cook's new use case for EVA 
===> need to find out more
-- Leon Osterweil/Lori Clarke (http://laser.cs.umass.edu/people/ljo.html): Harvard Forest ecology use case


- Technical aspects:
-- OPM focuses only on "trace-land"

Agenda & Goals of ProvWG Meeting:
- Review foundational use cases (e.g. science pubs with links to provenance), scope, WG charter:
==> are we missing something? 

- Why contribute, why use repository??
-- baseline: the lonely researcher. benefits for the individual ("collaboration with myself").
-- step 1: virtual collaborations, but restricted to a homogeneous workflow community. benefits for the local community
-- step 2: a common provenance model extends the virtual collaboration to heterogeneous workflow communities. The world gets bigger for everybody
Help establish data quality as a motivator? 

- Is there any good case / example to link system X with system Y? (say Pegasus & Vistrails)
- key is to find domains that are common to different workflow communities
- ex.: eScienceCentral <==> Taverna (chemical engineers on both sides; collaborating this way). Currently using OPM only -- D-OPM would be useful already! 

[eScienceCentral: cloud-based infrastructure, has a wf engine, talks to Taverna via plug-ins
opportunity to show "D-OPM in action"
http://www.esciencecentral.co.uk/ ]


- Provenance Toolkit: -> built on top of the provenance repository (GoldenTrail) + stitching capability (DataToL)
- implement common functionalities on a common model.
 
-- what ops make sense to work with provenance? 
-- import into wf-sys, export from wf-sys, load into db, query, visualize, stitch, (selectively?) publish 
-- interop through file formats
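
[Sketch of two of the ops listed above, "load into db" and "query", assuming provenance is stored as OPM-style edges pointing from effect to cause. The table layout, function names, and "ex:" identifiers are assumptions for illustration, not the GoldenTrail schema.]

```python
import sqlite3

def load(conn, edges):
    """Load (effect, relation, cause) edges into a single table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edge (effect TEXT, relation TEXT, cause TEXT)"
    )
    conn.executemany("INSERT INTO edge VALUES (?, ?, ?)", edges)

def lineage(conn, node):
    """Return everything upstream of node (transitive closure over edges)."""
    rows = conn.execute(
        """
        WITH RECURSIVE upstream(n) AS (
            SELECT cause FROM edge WHERE effect = ?
            UNION
            SELECT edge.cause FROM edge JOIN upstream ON edge.effect = upstream.n
        )
        SELECT n FROM upstream ORDER BY n
        """,
        (node,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
load(conn, [
    ("ex:tree", "wasGeneratedBy", "infer_tree"),
    ("infer_tree", "used", "ex:aligned"),
    ("ex:aligned", "wasGeneratedBy", "align"),
    ("align", "used", "ex:raw"),
])
print(lineage(conn, "ex:tree"))
```

[UNION rather than UNION ALL drops duplicate rows, so the query also terminates if a trace accidentally contains a cycle.]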

- Provenance Exchange Format
-- OPM
-- D-OPM

Provenance as a form of metadata --- what is metadata good for anyway? 

Not just a common model: best practices for mapping own provenance to D-OPM to ensure interop
- see earlier note on Kunze's WG on common identifiers space
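
[Sketch of what a mapping "best practice" could look like: a small adapter that turns a system's native step record into OPM-style "used" / "wasGeneratedBy" edges. The native record layout (step/inputs/outputs) is hypothetical, and nothing D-OPM-specific beyond plain OPM edges is attempted here.]

```python
def to_opm_edges(step_record):
    """Map one native step record to (effect, relation, cause) edges.

    step_record is a hypothetical native format:
    {"step": <process name>, "inputs": [data ids], "outputs": [data ids]}
    """
    process = step_record["step"]
    edges = []
    for data_id in step_record.get("inputs", []):
        # The process depended on the input artifact.
        edges.append((process, "used", data_id))
    for data_id in step_record.get("outputs", []):
        # The output artifact was produced by the process.
        edges.append((data_id, "wasGeneratedBy", process))
    return edges


record = {"step": "align", "inputs": ["ex:raw"], "outputs": ["ex:aligned"]}
print(to_opm_edges(record))
```

[Each workflow system would supply its own adapter; interop then depends on the adapters agreeing on identifiers for shared data.]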

- Give us your traces (and models of provenance!?)
 ... Kepler, Taverna, Pegasus, Vistrails, Galaxy, R, ...

- "Benchmark Questions"

- Architecture of ProvRepository

Invitation Draft:
* ProvWG has been working on D-OPM ...


INVITATIONS:

Note: participant list at UCD June 2011:
Action Items:
- D-OPM graphical [Paolo]
- draft invitation itself (make it compelling :) [Bertram]
- Job description / ad text, talk with D. McG [Bertram]
- draft D-OPM story [Shawn]

- invitation list:
-- Juliana (or proxy)
-- Ewa (Pegasus)
-- Yogesh? (check interest first)
-- Lee (Leon)
-- Khalid (Taverna - UK)
-- James Taylor (Galaxy)
-- Duncan Temple Lang (R)
-- Ilkay (BioKepler)



- D-OPM "report"
- ProvWG mailing list