Participants:
Paolo, Ilkay, Dan, Manish, Biva, Anand, Shawn, Lei
regrets: Saumen
Last week's actions:
==> Saumen (coordinator) to work with Anand and Biva to implement the relational schema of the SPS
(Manish to consult)
Broader goals:
1- translate your CM to a relational model
2- implement the relational schema on some public server (MySQL?)
3- write code logic to map local provenance traces (each of the three "lands") onto the common schema
4- show ability to query "end-to-end" provenance across two traces seamlessly (see the query sketch below)
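(A hedged illustration of goal 4, assuming the shared tables sketched in the 2010/07/14 schema notes below; the artifact id and the direction of data_artifact_dep are illustrative only, not agreed.)
    -- find upstream artifacts of a given artifact, one dependency step back,
    -- even when the dependency crosses two published runs:
    SELECT src.run_id, src.data_artifact_id, src.name
    FROM   data_artifact_dep dep
    JOIN   data_artifact src ON src.data_artifact_id = dep.from_data_artifact_id
    JOIN   data_artifact dst ON dst.data_artifact_id = dep.to_data_artifact_id
    WHERE  dst.data_artifact_id = 'r2:20';   -- hypothetical global id (run r2, local id 20)
    -- a full end-to-end traversal would iterate this join (or use a stored
    -- procedure), since MySQL has no recursive queries.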
Weekly Reports:
* Anand:
1) Coded, in Java, a tool to extract each piece of data from the input OPM/XML trace, wrap it in a file, and upload it to the ftp server for a general SDS reference.
==> problems with asserting equivalences; file not in RDF
2) Coordinated by Saumen and in communication with Biva, created the SPS and gave feedback and suggestions on it.
3) Created the dry-run mapping from the SDF native provenance trace to the modified SPS.
4) Trying different methods to assert the local-global data reference equivalence mapping in the shared trace.
* Biva
Was able to upload data to ftp server
Paolo: equivalences local-ids <=> global-ids are stored with the SPS;
put the assertions in the trace itself and map this equivalence assertion from the trace to the SPS (say, an 'isEquivalent' relation; see the sketch after this report).
1) Coordinated by Saumen and together with Anand, created the SPS with three levels of information:
* work flow
* run
* collaboration
2) Created a dry use case for the SPS in 1)
3) Created a dry use case for the revised version of the SPS
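(A minimal sketch of Paolo's 'isEquivalent' idea, assuming the equivalence assertions are pulled out of the published trace and stored as their own SPS relation; table and column names are illustrative, not agreed.)
    -- one row per local-id <=> global-id equivalence asserted in a run's trace
    CREATE TABLE is_equivalent (
        run_id         INT          NOT NULL,   -- run whose trace asserts the equivalence
        local_data_id  VARCHAR(64)  NOT NULL,   -- id as it appears in the native trace
        global_data_id VARCHAR(255) NOT NULL,   -- shared reference (e.g. the ftp URL)
        PRIMARY KEY (run_id, local_data_id)
    );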
* Saumen
- Developed SPS relational schema
- Coordinated the schema development effort
- Defined the scope for the collaboration interface
* Manish summarizes creation of SPS model
* Paolo notes that wf-land is missing, collection structure missing, question: how is this different from OPM?
Paolo, Bertram arguing to "bring back" wf-land, data-land, etc
requirement for DataONE
current schema just OPM
==> Previous schema (with wf land):
http://groups.google.com/group/datatol/web/sps_20100711.pdf
This one is a more complete version:
http://datatol.googlegroups.com/web/collaborative_provenance.pdf
Paolo:
Native Taverna provenance has much more than the OPM trace,
e.g. the "wf-land", data-land etc.
Bertram suggests using the names from the tech report:
proc = Taverna.processor = Kepler.actor
proc_exe = Taverna.activation = Kepler.invocation = OPM.process
user = OPM.agent
used = DToL.read
was_generated_by = DToL.write
OPM.was_triggered_by = DToL.i-dep
OPM.was_derived_from = DToL.d-dep
data = OPM.artifact
publish
retrieve
* Bertram's question:
&A, *B --> [PLUS] --> C
NEXT ACTIONS:
1. finalize "benchmark" queries (mentors)
2. revise SPS schema (Anand, Biva, Saumen, Manish?, Shawn? Paolo? Bertram? Ilkay?)
-- adjust names: (i) use DToL names, (ii) list OPM synonyms in parentheses (see above)
-- have layout reflect "wf-land" (almost empty!?), "Trace/OPM-land", "data-land", "collab-land"
-- various fixes
-- proc vs. proc_exe (remove run_id in proc?)
-- retrieve.user_id (?)
-- data.run_id (?)
-- data requires duplicate copy for each retrieve record (?)
-- data supports references to actual data ?
-- run_id removed from used, was_triggered_by, was_generated_by, was_derived_from
-- give meaningful definitions of attributes (what information they represent)
-- add constraints for "fusing" run provenance relations (was_triggered_by, etc.)
-- SB: Do we really need to store global and local ids? Can local ids be converted to global ones at upload time? (Note: this might solve the "fusing" problem as well; see the sketch after this action list.)
==> Shawn to coordinate call with Anand, Biva, Saumen, Manish, ...
3. implement SPS schema
4. describe native to shared provenance schema mapping
Anand: Kepler Provenance Recorder has a spreadsheet
Biva: Taverna
Saumen: Kepler/COMAD
5. implement native to shared provenance schema mapping
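(A minimal sketch of SB's upload-time conversion, assuming the convention data_artifact_id = run_id + ":" + local_data_artifact_id used in the schema notes below; the staging table is hypothetical.)
    -- qualify local ids with the run id while loading, so only global ids are stored
    INSERT INTO data_artifact (data_artifact_id, local_data_artifact_id, run_id, name, type, value)
    SELECT CONCAT(s.run_id, ':', s.local_id), s.local_id, s.run_id, s.name, s.type, s.value
    FROM   staging_native_trace s;   -- hypothetical staging table filled from the native trace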
--------------- 2010/07/14---------------
Functionality of "collaboration"
1. registration
2. publish
3. retrieve
user:-
user_id //login id
user_name //login name
run:-
run_id PK (sys gen)
local_run_id Varchar(64) //native from trace
creator_id //who created the trace
publisher_id //who published the trace
run_date //when the wf was run; it is expected that the publisher would provide this -- optional
publish_date //sys gen
trace_file //optional -- URL or BLOB
workflow_name //optional
workflow_system //optional
workflow_file //optional -- URL or BLOB
invocation:-
invocation_id PK //sys gen
run_id //ref to run
local_invocation_id var //optional
invocation_occurrence INT //??
actor_type //type, class
actor_name //name, actor_id
data_artifact:-
data_artifact_id PK //run_id + ":" + local_data_artifact_id
local_data_artifact_id //??
run_id
name //label, name -- optional
type //e.g. string, path/url -- optional
value //respective value based on type
global_url //optional -- captures the "publish"
invocation_input:-
invocation_id PK
data_artifact_id PK
data_role
time_no_later_than
time_no_earlier_than
time_clock_id
invocation_output:-
invocation_id FK
data_artifact_id PK
data_role
time_no_later_than
time_no_earlier_than
time_clock_id
invocation_dep:-
from_invocation_id
to_invocation_id
data_artifact_dep:-
from_data_artifact_id
to_data_artifact_id
retrieve:-
?????
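(A minimal MySQL DDL sketch for two of the tables above, as a starting point for step (2) below; types, sizes, and constraints are guesses, not agreed.)
    CREATE TABLE run (
        run_id          INT AUTO_INCREMENT PRIMARY KEY,  -- sys gen
        local_run_id    VARCHAR(64) NOT NULL,            -- native, from trace
        creator_id      INT NOT NULL,                    -- who created the trace
        publisher_id    INT NOT NULL,                    -- who published the trace
        run_date        DATETIME NULL,                   -- optional, from the publisher
        publish_date    DATETIME NOT NULL,               -- sys gen
        trace_file      TEXT NULL,                       -- optional (URL; or BLOB instead)
        workflow_name   VARCHAR(255) NULL,               -- optional
        workflow_system VARCHAR(64)  NULL,               -- optional
        workflow_file   TEXT NULL                        -- optional (URL; or BLOB instead)
    );
    CREATE TABLE data_artifact (
        data_artifact_id       VARCHAR(128) PRIMARY KEY, -- run_id + ":" + local id
        local_data_artifact_id VARCHAR(64)  NOT NULL,
        run_id                 INT NOT NULL,
        name       VARCHAR(255) NULL,                    -- optional label
        type       VARCHAR(32)  NULL,                    -- optional (e.g. string, path/url)
        value      TEXT NULL,                            -- value interpreted per type
        global_url TEXT NULL,                            -- optional; captures the "publish"
        FOREIGN KEY (run_id) REFERENCES run(run_id)
    );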
What to do next:
(1) modify the schema as per the discussion (except retrieve table) -- Saumen (both ERD and DDL)
(2) create the schema "DToL" on your local pc/laptop (mysql)
(3) write your own scripts to load your traces to the agreed schema
(4) discussion about "retrieve"; when we nail this down, include it in the implementation.
LocalArtifactCopy (
localArtifactId
localPublishingRunId
copiedArtifactId FKEY
)
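(The same sketch style for LocalArtifactCopy above; names, types, and the meaning of localPublishingRunId are guessed, pending the "retrieve" discussion.)
    CREATE TABLE local_artifact_copy (
        local_artifact_id       VARCHAR(64)  NOT NULL,   -- id of the copy in the retrieving run
        local_publishing_run_id VARCHAR(64)  NOT NULL,   -- run that published the original (interpretation guessed)
        copied_artifact_id      VARCHAR(128) NOT NULL,   -- the published original
        FOREIGN KEY (copied_artifact_id) REFERENCES data_artifact(data_artifact_id)
    );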
//COMMENTS
data_id = "10"
data_id ="r1:20"
/home/manish/proj1/data1
/home/manish/proj1/data1
data_artifact_id = local_data_artifact_id + run_id
//dublin core -- to get std meanings for things:
http://dublincore.org/documents/dces/
......
// example native data references:
/Users/dey/DToL/comad2/comad-exp/workflows/demo/DTol/addition/input1.txt    // COMAD (file path)
http://taverna.opm.org/t2:ref//3123868a-40ad-42c8-9f3c-7fb3112651e7?db5d8f4c-b49a-4ab0-b108-c0d3bb703620    // Taverna t2 reference
<artifact id="_a2">    // OPM/XML artifact id
reference.hdr
(i)
data_artifact_id = "1"
read (to get the data_artifact_id), write
local_id = 10
(ii)
local_data_artifact_id
run_id
write