/2010-07-20-datatol

Participants:

Anand, Biva, Lei, Ilkay, Manish, Dan
regrets: Paolo, Saumen, Shawn

Last week's actions:
1. finalize "benchmark" queries (mentors)
https://docs.google.com/document/edit?id=1IopffnHOwPk6v-Pyp_BGQeR3wcSknGFL34EUKpq05Xk&hl=en&authkey=CNnsqdIF

2. revise SPS schema (Anand, Biva, Saumen, Manish?, Shawn? Paolo? Bertram? Ilkay?)
-- adjust names: (i) use DToL names, (ii) list OPM synonyms in parentheses (see above)
-- have layout reflect "wf-land" (almost empty!?), "Trace/OPM-land", "data-land", "collab-land"
-- various fixes
      -- proc vs. proc_exe (remove run_id in proc?)
      -- retrieve.user_id (?)
      -- data.run_id (?)
      -- data requires duplicate copy for each retrieve record (?)
      -- data supports references to actual data?
      -- run_id removed from used, was_triggered_by, was_generated_by, was_derived_from
      -- give meaningful definitions of attributes (what information they represent)
      -- add constraints for "fusing" run provenance relations (was_triggered_by, etc.)
      -- SB: Do we really need to store global and local ids? Can local ids be converted to global ones at upload time? (Note this might solve "fusing" problem as well)
--> Shawn to coordinate call with Anand, Biva, Saumen, Manish, ...
3. implement SPS schema
4. describe native to shared provenance schema mapping
Anand: Kepler Provenance Recorder has a spreadsheet
Biva: Taverna
Saumen: Kepler/COMAD
5. implement native to shared provenance schema mapping

Weekly Reports:

* Anand: http://groups.google.com/group/datatol/web/weekly-report-5?_done=%2Fgroup%2Fdatatol%3F
1) Created the SPS schema in my local environment.
2) Finished coding the utility for uploading data to the FTP server and asserting the equivalence.
==> use similar names for Biva's, Anand's, and Saumen's tools; btw: 'utility' is too generic
3) Working on capturing the rest of the dependency relations as suggested by Manish in native OPM trace.
4) Analyzing and working on the mapping between the native trace and SPS. (Analyzing whether querying the native API captures useful information beyond what the OPM format provides, which could then be dumped into the SPS.)

* Biva: http://groups.google.com/group/datatol/web/the-taverna-api-client?hl=en
adding code and snippets
created SPS local system

schema is basically the same as Saumen's

* Saumen: http://groups.google.com/group/datatol/web/saumen-20100720?_done=%2Fgroup%2Fdatatol%3F&hl=en

Discussions:

Anand: Kepler OPM trace -> SPS - staging area approach? ==> take offline

Biva: Taverna End-to-end:
load wf1 from myExperiment
bind data
run wf1
[[fyi: only extracts values in its own Taverna store]]
need to have DB connected to Taverna (via jar)
include jar in a certain installation folder, this
[[ Ilkay: related to Data Playground? Not known ]]
traces are in the local MySQL DB (using internal Taverna-ids)
querying the traces via API
populate local SPS variant
publish from local SPS to shared SPS

Biva's question: native prov store has read/write provenance, not d-depends, i-depends !?
Manish: doesn't Paolo have some logic to infer i-dep, d-dep from read/write prov?
==> create EXAMPLE, document!
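
A minimal sketch of what such an EXAMPLE could look like, assuming the simplest possible rule: every item written by an invocation depends on every item that the same invocation read. This is not Paolo's logic (which was not available at the meeting), and reading "d-depends" as a data-to-data dependency is our assumption. The event layout (invocation, op, data_id) and the name infer_dependencies are hypothetical.

```python
from collections import defaultdict


def infer_dependencies(events):
    """Infer data dependencies from read/write provenance events.

    events: iterable of (invocation, op, data_id), op being "read" or "write".
    Returns a set of (written_id, read_id) pairs, meaning that written_id
    is assumed to depend on read_id.
    Assumption: an output depends on ALL inputs of the same invocation,
    which may over-approximate the true dependencies.
    """
    reads = defaultdict(set)
    writes = defaultdict(set)
    for invocation, op, data_id in events:
        if op == "read":
            reads[invocation].add(data_id)
        elif op == "write":
            writes[invocation].add(data_id)
        else:
            raise ValueError(f"unknown operation: {op!r}")

    deps = set()
    for invocation, written in writes.items():
        for out_id in written:
            for in_id in reads.get(invocation, ()):
                deps.add((out_id, in_id))
    return deps


# Hypothetical test trace: p1 reads d1 and d2 and writes d3; p2 reads d3 and writes d4.
EVENTS = [
    ("p1", "read", "d1"),
    ("p1", "read", "d2"),
    ("p1", "write", "d3"),
    ("p2", "read", "d3"),
    ("p2", "write", "d4"),
]
```

Calling infer_dependencies(EVENTS) yields the pairs (d3, d1), (d3, d2), and (d4, d3). A native store that records finer-grained information could narrow these pairs down.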

global-ref: gid
local-ref: lid
taverna-ref: tid

gid ~ lid : we know how to keep it
lid ~ tid: we don't know

Suggestion:
- do it by hand
- use hashes:
--- tid ~ hash(*tid)
--- lid ~ hash(*lid)
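
A minimal sketch of the hash suggestion, assuming that *tid and *lid denote the data values the ids point to and that those values are available as bytes. Items whose content hashes agree are linked. The names content_hash and match_by_hash and the choice of SHA-256 are illustrative assumptions, not something decided in the meeting.

```python
import hashlib


def content_hash(value: bytes) -> str:
    """Hash the data value itself (not the id that refers to it)."""
    return hashlib.sha256(value).hexdigest()


def match_by_hash(taverna_items, local_items):
    """Return a dict lid -> tid for items with identical content.

    taverna_items: dict tid -> bytes
    local_items:   dict lid -> bytes
    Content shared by several tids is ambiguous and is left unmatched
    here; such cases would still have to be resolved by hand.
    """
    tids_by_hash = {}
    for tid, value in taverna_items.items():
        tids_by_hash.setdefault(content_hash(value), []).append(tid)

    matches = {}
    for lid, value in local_items.items():
        candidates = tids_by_hash.get(content_hash(value), [])
        if len(candidates) == 1:
            matches[lid] = candidates[0]
    return matches
```

The main limitation is that two distinct items with identical content cannot be told apart by hash alone, which is why the sketch refuses to guess in that case.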

Anand: Kepler end-to-end scenario:
Take a workflow definition file/create a new workflow
Bind the data (right now it's a mock-up input)
Run the workflow with OPM output enabled
==> Bertram: going via OPM or via "native" provenance?
Anand: currently using OPM; not clear whether that's enough..
==> need to study requirements, decide OPM, native, or combo

Biva: the OPM format of Taverna is lossy (not enough);
hence using native traces

OPM is missing e.g. link between gid ~ lid

How do we do publish step?
Variant I: create "publish button", given trace file, and local--global relations
Variant II: via code snippets / API
Not doing V-I now, but V-II
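
A minimal sketch of what a Variant II snippet might do, assuming the local trace is a list of (derived_lid, source_lid) pairs and the local--global relations are given as a dict from lid to gid. The function name publish and both data layouts are hypothetical; the actual upload to the shared SPS is left out.

```python
def publish(local_edges, lid_to_gid):
    """Rewrite local ids to global ids before uploading to the shared SPS.

    local_edges: list of (derived_lid, source_lid) pairs from the local SPS
    lid_to_gid:  dict mapping each local id to its global id
    Returns the same edges expressed in global ids.
    Raises KeyError if any local id has no global counterpart, so that
    a partially mapped trace is never published.
    """
    missing = sorted(
        {lid for edge in local_edges for lid in edge if lid not in lid_to_gid}
    )
    if missing:
        raise KeyError(f"no global id for local ids: {missing}")
    return [(lid_to_gid[derived], lid_to_gid[source])
            for derived, source in local_edges]
```

Failing on the first unmapped id is a design choice of this sketch: it keeps the shared store free of dangling local references.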

NEXT ACTIONS:

0. Looking at the benchmark queries, finalize DToL Schema of SPS (focus on what you need NOW; not for future versions)
==> Anand, Biva, Saumen to interact tightly (daily calls?)
1. Deploy global SPS database (Anand, Biva, Saumen) --> help from Daniel Zinn
2. Finish upload of traces (Kepler, Taverna, COMAD) to the local SPS
3. Upload local traces to global SPS
4. Answer as many queries as possible from "benchmark queries"
4.1. conceptually analyze what queries can be answered
4.2. exact SQL queries/stored procedures to answer the queries
4.3. result of SQL queries against test trace
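
A minimal sketch of steps 4.2 and 4.3, using an in-memory SQLite database as a stand-in for the SPS. The benchmark queries themselves live in the linked document and are not reproduced here; a lineage query ("which items was X derived from, directly or indirectly?") is used purely as an illustration. The relation name was_derived_from comes from the schema notes above, but its columns (derived_id, source_id) and the test rows are assumptions.

```python
import sqlite3


def make_test_trace():
    """Build a tiny, hypothetical test trace: d3 <- d1, d3 <- d2, d4 <- d3."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE was_derived_from (derived_id TEXT, source_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO was_derived_from VALUES (?, ?)",
        [("d3", "d1"), ("d3", "d2"), ("d4", "d3")],
    )
    return conn


def ancestors(conn, data_id):
    """Return all items that data_id was derived from, transitively."""
    rows = conn.execute(
        """
        WITH RECURSIVE lineage(id) AS (
            SELECT source_id FROM was_derived_from WHERE derived_id = ?
            UNION
            SELECT w.source_id
            FROM was_derived_from AS w
            JOIN lineage AS l ON w.derived_id = l.id
        )
        SELECT id FROM lineage ORDER BY id
        """,
        (data_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Against this test trace, ancestors(conn, "d4") returns d1, d2, and d3. Recursive queries are written differently (or are unavailable) in some database systems, so the SQL would need to be adapted to whatever the global SPS is deployed on.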