Richard's Tasks for this Week:
- Finish the write up, or at least make it much stronger than it currently is.
- Email the write up to David de Roure after going over the database with Karthik.
- Digest what each paper says, sort them into subfolders and tags
- Position ourselves in relation to other current works.
- Write up notes for each relevant paper.
- Differentiate methods papers from workflow papers.
- Look at how Kepler is being used outside of myExperiment.
- Do tier 2/3 workflows manually. Should take 5-10 minutes each.
- Do 300+ (as much as possible) Taverna ones. Time yourself.
- Research the differences between T1 and T2.
- Implications for complexity?
- Make slides for Berlin; email them out by Thursday.
Questions:
- Everyone:
- Get subfolders on the new version of Mendeley
- Diachronic study on new workflows - is this possible?
- Karthik:
- Whatever happened to that conference in Tennessee? Eden? Nimbus?
- How is the Methods in Ecology and Evolution paper going?
- What can we wring out of SPARQL? When can we meet?
- Bill:
- What happened to SciencePipes?
- Bertram:
- What about Galaxy? Did they ever get back to us?
- How much time do you have, what do you want to read? Shall I presort the papers for you/us?
Write up sections to be done:
http://dl.dropbox.com/u/4245277/Understanding%20Workflows.docx
- methods, tier 1/2/3, selection processes...
- Organise in terms of the three tiers of information:
1: high level - author, when, how many, nodes, workflow links, goal
2: node level - % of data acquisition, % of data sources, types of nodes, models?
3: metadata sufficiency
- How are we going to visualise this?
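The three tiers above could be captured as one record per workflow, which would also make the visualisation question concrete. A minimal sketch in Python; the field names are illustrative, not an agreed schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Tier1:
    """High-level facts: author, when, how many, nodes, workflow links, goal."""
    author: str
    created: str                      # upload/creation date
    downloads: int                    # "how many"
    node_count: int
    linked_workflows: List[str] = field(default_factory=list)
    goal: str = ""

@dataclass
class Tier2:
    """Node-level composition, filled in by opening the workflow itself."""
    pct_data_acquisition: float       # % of nodes doing data acquisition
    pct_data_sources: float           # % of nodes that are data sources
    node_types: Dict[str, int] = field(default_factory=dict)  # type -> count
    uses_models: bool = False

@dataclass
class Tier3:
    """Judgement call: is the attached metadata sufficient to rerun/reuse it?"""
    metadata_sufficient: bool
    notes: str = ""

@dataclass
class WorkflowRecord:
    tier1: Tier1
    tier2: Optional[Tier2] = None     # tiers 2/3 are manual passes, so may be absent
    tier3: Optional[Tier3] = None
```

Tier 1 could be scraped from listing pages; tiers 2 and 3 are the manual 5-10 minute passes, so they stay optional until done.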
Notes, Hypotheses and Questions:
What gap are we filling? Some open questions about this:
- Scaling > scaling ones are assumed to be more popular (Pegasus, caGrid)
- automation > automating ones are more downloaded
- provenance > More downloaded, more cited.
- Visualisations > easier to follow (How does one test this?)
- reproducible > Are myExperiment workflows cited more? Is myExperiment?
- (Null hypothesis, tiers)
- Which users use these? Are they mostly project devs? What works?
- Are we dependent only on downloads?
- What about closed systems? SciencePipes? How are these different?
- Applications > Is this important? What field is it in?
- Template > Or do they constantly reinvent the wheel? What about one offs?
- Are they stand alone? Or are they only plumbing, not the whole system?
- Less input over time > does more complexity lend itself to more complexity in the input?
- What are the weak points > reuse? Training?
- Example ones - are these copied? Are templates carried through culturally?
- Sharing issues: what about privacy concerns? IRBs?
- What's out there? Is myExperiment a good enough show of the state of the art? Or are people using shell scripts and R scripts? What do these suggest?
- What about non-published work?
- What's with subsetting, translation, transformation?
- Does the information flowing through the workflow change anything? The input versus the output? Are any of these able to be linked easily?
- Live constant visualisation should be more common as time goes on.
- Human interactive components should be becoming more common. At the same time, workflows should be more able to have detached execution.
- Embedded workflows should be more common.
- Different workflow systems should be converging over time, both in possible stages, in application, and in natural language descriptions.
- More applications should be considered over time. cf. http://pegasus.isi.edu/applications.php
- There should be more results over time - not just in data, but in purposes.
- Linear workflows should become less linear over time.
- Information retrieval should become less common.
- Workflow use should, in general, be dominated by bioinformatics.
- Downloads will be dependent on the amount of natural-language description.
- There will be a proliferation of shims.
- Most workflows will use the same repositories.
- Most workflows integrate similar QA/QC steps.
- Workflows will have a disparity in their use on public repositories and their function in the literature.
- Workflows will decay with time.
- Knowledge discovery workflows will become more common with the proliferation of databases.
- Composite actors (which collapse details into abstract components) should proliferate.
- More workflows will have smart reruns.
- More workflows will have smart semantic links.
- The content should, over time, more predictably alter the nature, course, shape, and tempo of the workflow.
- Workflows should become reduced in their components - what used to be one workflow should become one segment.
- Workflows will be more cited in publications.
- Workflow visualisations will become more (less?) popular in the literature.
- Parallel execution should become more common.
- Non annotation-based approaches should become more common.
- The type of workflow and application will influence the data provenance.
- "Workflow provenance refers to the record of the history of the derivation of some dataset in a scientific workflow. The amount of information recorded for workflow provenance varies, depending on application needs. For example, workflow provenance of a scientific result may include details about the type and model of external devices used, as well as the versions of softwares used for deriving the result."
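The quoted definition suggests a minimal shape for a provenance record: derivation history plus environment details (device type/model, software versions). A hedged sketch with field names of my own invention, not any system's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SoftwareVersion:
    name: str         # e.g. an analysis package used in the derivation
    version: str

@dataclass
class ExternalDevice:
    device_type: str  # "type and model of external devices used", per the quote
    model: str

@dataclass
class ProvenanceRecord:
    """History of the derivation of one dataset in a scientific workflow."""
    dataset_id: str
    derived_from: List[str] = field(default_factory=list)       # input dataset ids
    steps: List[str] = field(default_factory=list)              # ordered processing steps
    devices: List[ExternalDevice] = field(default_factory=list)
    software: List[SoftwareVersion] = field(default_factory=list)
```

How much of this gets filled in would vary with application needs, exactly as the quote says; a record like this could also feed the tier-3 metadata-sufficiency judgement.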