Participants: Paolo, Ilkay, Dan, Manish, Biva, Anand, Shawn, Lei
Regrets: Saumen

Last week's actions:
==> Saumen (coordinator) to work with Anand and Biva to implement the relational schema of the SPS (Manish to consult)

Broader goals:
1- translate your CM to a relational model
2- implement the relational schema on some public server (MySQL)?
3- write code logic to map local provenance traces (each of the three "lands") onto the common schema
4- show the ability to query "end-to-end" provenance across two traces seamlessly

Weekly Reports:

* Anand:
  1) Coded in Java to extract each piece of data from the input OPM/XML trace, wrap it in a file, and upload it to the FTP server for a general SDS reference.
     ==> problems with asserting equivalences; file not in RDF
  2) Coordinated by Saumen and in communication with Biva, we created the SPS and gave feedback and suggestions on it.
  3) Created the dry-run mapping from the SDF native provenance trace to the modified SPS.
  4) Trying different methods to assert the local-global data reference equivalence mapping in the shared trace.

* Biva:
  Was able to upload data to the FTP server.
  Paolo: equivalences local-ids <=> global-ids are stored with the SPS; put the assertions in the trace itself and map each equivalence assertion from the trace to the SPS (say, an 'isEquivalent' relation; a sketch appears after the NEXT ACTIONS list below).
  1) Coordinated by Saumen and together with Anand, created the SPS with three levels of information:
     * workflow
     * run
     * collaboration
  2) Created a dry use case for the SPS in 1)
  3) Created a dry use case for the revised version of the SPS

* Saumen:
  - Developed the SPS relational schema
  - Coordinated the schema development effort
  - Defined the scope for the collaboration interface

* Manish summarizes the creation of the SPS model.

* Paolo notes that wf-land is missing and the collection structure is missing; question: how is this different from OPM?
  Paolo and Bertram argue to "bring back" wf-land, data-land, etc. -- a requirement for DataONE; the current schema is just OPM.
  ==> Previous schema (with wf-land): http://groups.google.com/group/datatol/web/sps_20100711.pdf
      This one is a more complete version: http://datatol.googlegroups.com/web/collaborative_provenance.pdf
  Paolo: Native Taverna provenance has much more than the OPM trace, e.g. the "wf-land", "data-land", etc.

Bertram suggests using the names from the tech report:
  proc                 = Taverna.processor  = Kepler.actor
  proc_exe             = Taverna.activation = Kepler.invocation = OPM.process
  user                 = OPM.agent
  used                 = DToL.read
  was_generated_by     = DToL.write
  OPM.was_triggered_by = DToL.i-dep
  OPM.was_derived_from = DToL.d-dep
  data                 = OPM.artifact
  publish
  retrieve

* Bertram's question: &A, *B --> [PLUS] --> C

NEXT ACTIONS:
1. finalize "benchmark" queries (mentors) -- one candidate query is sketched after this list
2. revise SPS schema (Anand, Biva, Saumen, Manish?, Shawn?, Paolo?, Bertram?, Ilkay?)
   -- adjust names: (i) use DToL names, (ii) list OPM synonyms in parentheses (see above)
   -- have the layout reflect "wf-land" (almost empty!?), "Trace/OPM-land", "data-land", "collab-land"
   -- various fixes:
      -- proc vs. proc_exe (remove run_id in proc?)
      -- retrieve.user_id (?)
      -- data.run_id (?)
      -- data requires a duplicate copy for each retrieve record (?)
      -- data supports references to actual data (?)
      -- run_id removed from used, was_triggered_by, was_generated_by, was_derived_from
   -- give meaningful definitions of attributes (what information they represent)
   -- add constraints for "fusing" run provenance relations (was_triggered_by, etc.)
   -- SB: Do we really need to store global and local ids? Can local ids be converted to global ones at upload time?
      (Note: this might solve the "fusing" problem as well.)
   ==> Shawn to coordinate a call with Anand, Biva, Saumen, Manish, ...
3. implement the SPS schema
4. describe the native-to-shared provenance schema mapping
   Anand: the Kepler Provenance Recorder has a spreadsheet
   Biva: Taverna
   Saumen: Kepler/COMAD
5. implement the native-to-shared provenance schema mapping
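To make Paolo's 'isEquivalent' suggestion concrete, here is a minimal sketch of what such a relation could look like in the SPS. It is illustrative only: the table name, column names, types, and choice of primary key are assumptions, not agreed schema.

  -- One row per asserted equivalence between a trace-local data reference
  -- and a global (shared) data reference.  All names are placeholders.
  CREATE TABLE is_equivalent (
    run_id         INT          NOT NULL,  -- the trace in which the local id is valid
    local_data_id  VARCHAR(64)  NOT NULL,  -- the id as it appears in the native trace
    global_data_id VARCHAR(255) NOT NULL,  -- the shared reference (e.g. the published copy)
    asserted_by    VARCHAR(64),            -- optional: who asserted the equivalence
    PRIMARY KEY (run_id, local_data_id)
  );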
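Likewise, one candidate "benchmark" query (cf. broader goal 4): a single derivation step across a trace boundary, joining the hypothetical is_equivalent table above with the data_artifact and data_artifact_dep tables sketched under 2010/07/14 below. A full end-to-end traversal would have to iterate this step in application code, since MySQL of this era has no recursive query support.

  -- Which artifacts (in any trace) directly depend on an artifact that is
  -- equivalent to a given global reference?  The global id is a placeholder.
  SELECT dep.to_data_artifact_id
  FROM   is_equivalent eq
  JOIN   data_artifact da
         ON  da.run_id = eq.run_id
         AND da.local_data_artifact_id = eq.local_data_id
  JOIN   data_artifact_dep dep
         ON  dep.from_data_artifact_id = da.data_artifact_id
  WHERE  eq.global_data_id = 'ftp://example.org/shared/data1';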
--------------- 2010/07/14 ---------------

Functionality of "collaboration":
1. registration
2. publish
3. retrieve

user:-
  user_id    //login id
  user_name  //login name

run:-
  run_id PK (sys gen)
  local_run_id VARCHAR(64)  //native, from the trace
  creator_id       //who created the trace
  publisher_id     //who published the trace
  run_date         //when the wf was run; it is expected that the publisher would provide that -- optional
  publish_date     //sys gen
  trace_file       //optional -- URL or BLOB
  workflow_name    //optional
  workflow_system  //optional
  workflow_file    //optional -- URL or BLOB

invocation:-
  invocation_id PK             //sys gen
  run_id                       //ref to run
  local_invocation_id VARCHAR  //optional
  invocation_occurrence INT    //??
  actor_type                   //type, class
  actor_name                   //name, actor_id

data_artifact:-
  data_artifact_id PK     //run_id + ":" + local_data_artifact_id
  local_data_artifact_id  //??
  run_id
  name        //label, name -- optional
  type        //e.g. string, path/url -- optional
  value       //respective value based on type
  global_url  //optional -- captures the "publish"

invocation_input:-
  invocation_id PK
  data_artifact_id PK
  data_role
  time_no_later_than
  time_no_earlier_than
  time_clock_id

invocation_output:-
  invocation_id FK
  data_artifact_id PK
  data_role
  time_no_later_than
  time_no_earlier_than
  time_clock_id

invocation_dep:-
  from_invocation_id
  to_invocation_id

data_artifact_dep:-
  from_data_artifact_id
  to_data_artifact_id

retrieve:-
  ?????

What to do next:
(1) modify the schema as per the discussion (except the retrieve table) -- Saumen (both ERD and DDL); a first-cut DDL sketch of two core tables appears at the end of these notes
(2) create the schema "DToL" on your local PC/laptop (MySQL)
(3) write your own scripts to load your traces into the agreed schema (a hypothetical loader step is also sketched at the end of these notes)
(4) discuss "retrieve"; when we nail this down, include it in the implementation

LocalArtifactCopy (
  localArtifactId
  localPublishingRunId
  copiedArtifactId FKEY
)

//COMMENTS
data_id = "10"
data_id = "r1:20"
/home/manish/proj1/data1
/home/manish/proj1/data1
data_artifact_id = local_data_artifact_id + run_id

//Dublin Core -- to get standard meanings for things: http://dublincore.org/documents/dces/

......
/Users/dey/DToL/comad2/comad-exp/workflows/demo/DTol/addition/input1.txt
http://taverna.opm.org/t2:ref//3123868a-40ad-42c8-9f3c-7fb3112651e7?db5d8f4c-b49a-4ab0-b108-c0d3bb703620
<artifact id="_a2"> reference.hdr
(i)  data_artifact_id = "1"; read (to get the data_artifact_id), write local_id = 10
(ii) local_data_artifact_id, run_id, write
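A first-cut MySQL DDL sketch of two of the core tables above, following the conventions noted in the discussion (sys-gen run_id; data_artifact_id = run_id + ":" + local_data_artifact_id). Column types, lengths, and nullability are placeholder assumptions pending the agreed DDL.

  CREATE TABLE run (
    run_id          INT AUTO_INCREMENT PRIMARY KEY,  -- sys gen
    local_run_id    VARCHAR(64) NOT NULL,            -- native id from the trace
    creator_id      VARCHAR(64),                     -- who created the trace
    publisher_id    VARCHAR(64),                     -- who published the trace
    run_date        DATETIME,                        -- optional, publisher-provided
    publish_date    DATETIME,                        -- sys gen at publish time
    trace_file      TEXT,                            -- optional: URL (or a BLOB)
    workflow_name   VARCHAR(255),                    -- optional
    workflow_system VARCHAR(64),                     -- optional
    workflow_file   TEXT                             -- optional: URL (or a BLOB)
  ) ENGINE=InnoDB;

  CREATE TABLE data_artifact (
    data_artifact_id       VARCHAR(128) PRIMARY KEY,  -- run_id + ":" + local id
    local_data_artifact_id VARCHAR(64) NOT NULL,
    run_id                 INT NOT NULL,
    name       VARCHAR(255),  -- label/name -- optional
    type       VARCHAR(32),   -- e.g. string, path/url -- optional
    value      TEXT,          -- respective value based on type
    global_url TEXT,          -- optional -- captures the "publish"
    FOREIGN KEY (run_id) REFERENCES run (run_id)
  ) ENGINE=InnoDB;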
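And a hypothetical loader step illustrating SB's question about converting local ids to global ones at upload time: the global id is minted when the row is inserted, by prefixing the local id with the sys-gen run id. The run id (17) is invented for the example; the local id ("10") and the file path reuse the values from the //COMMENTS above.

  -- Assumes a run row with run_id = 17 was created by a preceding INSERT.
  -- CONCAT mints the global id '17:10' at load time, so only global ids
  -- need to be stored (which might also help the "fusing" problem).
  INSERT INTO data_artifact
         (data_artifact_id, local_data_artifact_id, run_id, name, type, value)
  VALUES (CONCAT(17, ':', '10'), '10', 17, 'input1.txt', 'path/url',
          '/Users/dey/DToL/comad2/comad-exp/workflows/demo/DTol/addition/input1.txt');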