http://epad.dataone.org/ProvWG-leads-09-06-2011 Present: Bertram, Paolo * ProvWG call: ==> draft job description - provenance repository, combo of DToL and GoldenTrail - working with Kepler, Taverna, Vistrails, Galaxy, R? - liaison with CCIT * DOPM report & slides D-OPM model editing: http://wb.mysql.com/ http://en.wikipedia.org/wiki/Entity-relationship_model#ER_diagramming_tools - Motivation, Use Cases, Goals (cf. DCC abstract): -- Importance of "wf-land": trace cannot be understood (well) w/o link to workflow (higher level) rationale for "workflow-land", from the submitted IDCC abstract: scientists may find it difficult to explore provenance traces unless they can relate them to the original experiment design from which they were derived. Thus, a specification of the experimental process, i.e., the set of workflows involved, must accompany the traces if scientists are to carry out effective curation. -- Focus in the sci-wf and provenance communities on single runs ("intra-run provenance") -- But inter-run provenance crucial -- Importance of "data-land" -- DataCite (connection to John Kunze's WG) --> data-ids solve some stitching problems (at the WF level); probably not within ==> Paolo: eScience Central --> captures "outside provenance" only, thus is a good candidate for taking up data citation standards [=> Paul Watson] DataONE view: http://mule1.dataone.org/ArchitectureDocs-current/design/SearchMetadata.html#creating-citations-from-index-fields ESIP view: http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations/provider_guidelines -- data structures (nested collections, etc) -- Use cases: -- FPC (oldies but goldies, simple use case) -- pPOD (inter-run provenance, COMAD) -- eBird (previous EVA use case: citizen science, visualization, use custom code & Vistrails for animations of analysis results) -- Bob Cook's new use case for EVA ===> need to find out more -- Leon Osterweil/Lori Clark (http://laser.cs.umass.edu/people/ljo.html): Harvard Forest ecology use case - Technical aspects: -- OPM is just focusing on "trace-land"; -- Agenda & Goals of ProvWG Meeting: - Review foundational use cases (e.g. science pubs with links to provenance), scope, WG charter: ==> are we missing something? - Why contribute, why use repository?? -- baseline: he lonely researcher. benefits for individual ("collaboration with myself"). -- step 1: virtual collaborations but restricted to homogenous workflow community. benefits for the local community -- step 2: a common provenance model extends the virtual collaboration to heterogeneous workflow communities. The world gets bigger for everybody Help establish data quality as a motivator? - Is there any good case / example to link system X with system Y? (say Pegasus & Vistrails) - key is to find domains that are common to different workflow communities - ex.: eScienceCentral <==> Taverna (chemical engineers on both sides; collaborating this way). Currently using OPM only -- D-OPM would be useful already! [eScienceCentral: cloud-based infrastructure, has a wf engine, talks to Taverna via plug-ins opportunity to show "DOPM in action" http://www.esciencecentral.co.uk/ ] - Provenance Toolkit: -> built on top of the provenance repository (GoldenTrail) + stitching capability (DataToL) - implement common functionalities on a common model. -- what ops make sense to work with provenance? -- import into wf-sys, export from wf-sys, load into db, query, visualize, stitch, (selectively?) publish -- interop through file formats - Provenance Exchange Format -- OPM -- D-OPM Provenance as a form of metadata --- what is metadata good for anyway? Not just a common model: best practices for mapping own provenance to D-OPM to ensure interop - see earlier note on Kunze's WG on common identifiers space - Give us your traces (and models of provenance!?) ... Kepler, Taverna, Pegasus, Vistrails, Galaxy, R, ... - "Benchmark Questions" - Architecture of ProvRepository Invitation Draft: * ProvWG has been working on D-OPM ... INVITATIONS: Note: participant list at UCD June 2011: * Bertram Ludaescher (UCD) -- L * Paolo Missier (Newcastle) -- L * Shawn Bowers (Gonzaga) -- M * Yogesh?? Simmhan (USC), 2nd day -- M * Saumen Dey (UCD) -- M, S * Michael Agun (Gonzaga) -- S (Shawn?) * Ewa Deelman (USC), 1st day -- G, M (to be confirmed) * G Leon Osterweil (UMass) -- G, N (Harvard ecology/provenance use case!?) * G Lori Clark (UMass) -- G, N? * Anand Sarkar (UCD) -- G * Jim Myers (RPI) -- G, N? * Michael Wang (UCD) -- G, V * Sven Koehler (UCD) -- G * Tim McPhillips (2nd day) -- G Action Items: - D-OPM graphical [Paolo] - draft invitation itself (make it compelling :) [Bertram] - Job description / ad text, talk with D. McG [Bertram] - draft D-OPM story [Shawn] - invitation list: -- Juliana (or proxy) -- Ewa (Pegasus) -- Yogesh? (check first interest) -- Lee (Leon) - Khalid (Taverna - UK) - James Taylor (Galaxy) - Duncan Temple Lang (R) - Ilkay (BioKepler) - D-OPM "report" - ProvWG mailing list