Richard's Tasks for this Week:
- Finish the write up, or at least make it much stronger than it currently is.
- Email the write up to David de Roure after going over the database with Karthik.
- Digest what each paper says, sort them into subfolders and tags
- Position ourselves in relation to other current works.
- Write up notes for each relevant paper.
- Differentiate methods papers from workflow papers.
- Look at how Kepler is being used outside of myExperiment.
- Do tier 2/3 workflows manually. Should take 5-10 minutes each.
- Do 300+ (as much as possible) Taverna ones. Time yourself.
- Research the differences between T1 and T2.
- Implications for complexity?
- Make slides for Berlin; email them out by Thursday.
Questions:
- Everyone:
- Get subfolders on the new version of Mendeley
- Diachronic study on new workflows - is this possible?
- Karthik:
- Whatever happened to that conference in Tennessee? Eden? Nimbus?
- How is the Methods in Ecology and Evolution paper going?
- What can we wring out of SPARQL? When can we meet?
- Bill:
- What happened to SciencePipes?
- Bertram:
- What about Galaxy? Did they ever get back to us?
- How much time do you have, what do you want to read? Shall I presort the papers for you/us?
Write up sections to be done:
http://dl.dropbox.com/u/4245277/Understanding%20Workflows.docx
- methods, tier 1/2/3, selection processes...
- Organise in terms of the three tiers of information:
1: high level - author, when, how many, nodes, workflow links, goal
2: node level - % of data acquisition, % of data sources, types of nodes, models?
3: metadata sufficiency
- How are we going to visualise this?
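The three tiers above could be captured as one record per workflow, which would also make the visualisation question concrete. A minimal sketch in Python; the field names are illustrative, not an agreed schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Tier1:
    """High-level facts: author, when, how many, nodes, workflow links, goal."""
    author: str
    created: str                      # upload/creation date
    downloads: int                    # "how many"
    node_count: int
    linked_workflows: List[str] = field(default_factory=list)
    goal: str = ""

@dataclass
class Tier2:
    """Node-level composition, filled in by opening the workflow itself."""
    pct_data_acquisition: float       # % of nodes doing data acquisition
    pct_data_sources: float           # % of nodes that are data sources
    node_types: Dict[str, int] = field(default_factory=dict)  # type -> count
    uses_models: bool = False

@dataclass
class Tier3:
    """Judgement call: is the attached metadata sufficient to rerun/reuse it?"""
    metadata_sufficient: bool
    notes: str = ""

@dataclass
class WorkflowRecord:
    tier1: Tier1
    tier2: Optional[Tier2] = None     # tiers 2/3 are manual passes, so may be absent
    tier3: Optional[Tier3] = None
```

Tier 1 could be scraped from listing pages; tiers 2 and 3 are the manual 5-10 minute passes, so they stay optional until done.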
Notes, Hypotheses and Questions:
What gap are we filling? Some open questions about this:
- Scaling > scaling ones are assumed to be more popular (Pegasus, caGrid)
- automation > automating ones are more downloaded
- provenance > More downloaded, more cited.
- Visualisations > easier to follow (How does one test this?)
- reproducible > Are myExperiment workflows cited more? Is myExperiment?
- (Null hypothesis, tiers)
- Which users use these? Are they mostly project devs? What works?
- Are we dependent only on downloads?
- What about closed systems? SciencePipes? How are these different?
- Applications > Is this important? What field is it in?
- Template > Or do they constantly reinvent the wheel? What about one offs?
- Are they stand alone? Or are they only plumbing, not the whole system?
- Less input over time > does more complexity lend itself to more complexity in the input?
- What are the weak points > reuse? Training?
- Example ones - are these copied? Are templates carried through culturally?
- Sharing issues: what about privacy concerns? IRBs?
- What's out there? Is myExperiment a good enough show of the state of the art? Or are people using shell scripts and R scripts? What do these suggest?
- What about non-published work?
- What's with subsetting, translation, transformation?
- Does the information flowing through the workflow change anything? The input versus the output? Are any of these able to be linked easily?
- Live constant visualisation should be more common as time goes on.
- Human interactive components should be becoming more common. At the same time, workflows should be more able to have detached execution.
- Embedded workflows should be more common.
- Different workflow systems should be converging over time, both in possible stages, in application, and in natural language descriptions.
- More applications should be considered over time. cf. http://pegasus.isi.edu/applications.php
- There should be more results over time - not just in data, but in purposes.
- Linear workflows should become less linear over time.
- Information retrieval should become less common.
- Workflow use should, in general, be dominated by bioinformatics.
- Downloads will be dependent on the amount of natural-language description.
- There will be a proliferation of shims.
- Most workflows will use the same repositories.
- Most workflows integrate similar QA/QC steps.
- Workflows will have a disparity in their use on public repositories and their function in the literature.
- Workflows will decay with time.
- Knowledge discovery workflows will become more common with the proliferation of databases.
- Composite actors (which collapse details into abstract components) should proliferate.
- More workflows will have smart reruns.
- More workflows will have smart semantic links.
- The content should, over time, more predictably alter the nature, course, shape, and tempo of the workflow.
- Workflows should become reduced in their components - what used to be one workflow should become one segment.
- Workflows will be more cited in publications.
- Workflow visualisations will become more (less?) popular in the literature.
- Parallel execution should become more common.
- Non annotation-based approaches should become more common.
- The type of workflow and application will influence the data provenance.
- "Workflow provenance refers to the record of the history of the derivation of some dataset in a scientific workflow. The amount of information recorded for workflow provenance varies, depending on application needs. For example, workflow provenance of a scientific result may include details about the type and model of external devices used, as well as the versions of softwares used for deriving the result."
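The quoted definition suggests a minimal shape for a provenance record: derivation history plus environment details (device type/model, software versions). A hedged sketch with field names of my own invention, not any system's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SoftwareVersion:
    name: str         # e.g. an analysis package used in the derivation
    version: str

@dataclass
class ExternalDevice:
    device_type: str  # "type and model of external devices used", per the quote
    model: str

@dataclass
class ProvenanceRecord:
    """History of the derivation of one dataset in a scientific workflow."""
    dataset_id: str
    derived_from: List[str] = field(default_factory=list)       # input dataset ids
    steps: List[str] = field(default_factory=list)              # ordered processing steps
    devices: List[ExternalDevice] = field(default_factory=list)
    software: List[SoftwareVersion] = field(default_factory=list)
```

How much of this gets filled in would vary with application needs, exactly as the quote says; a record like this could also feed the tier-3 metadata-sufficiency judgement.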