Richard's Tasks for this Week:
* Finish the write-up, or at least make it much stronger than it currently is.
* Email the write-up to David de Roure after going over the database with Karthik.
* Digest what each paper says; sort them into subfolders and tags.
* Position ourselves in relation to other current works.
* Write up notes for each relevant paper.
* Differentiate methods papers from workflow papers.
* Look at how Kepler is being used outside of myExperiment.
* Do tier 2/3 workflows manually. Should take 5-10 minutes each.
* Do 300+ (as many as possible) Taverna ones. Time yourself.
* Research the differences between T1 and T2.
  * Implications for complexity?
* Make slides for Berlin; email them out by Thursday.

Questions:
* Everyone:
  * Get subfolders on the new version of Mendeley.
  * Diachronic study on new workflows - is this possible?
* Karthik:
  * Whatever happened to that conference in Tennessee? Eden? Nimbus?
  * How is the Methods in Ecology and Evolution paper going?
  * What can we wring out of SPARQL? When can we meet? (A query sketch follows the write-up section below.)
* Bill:
  * What happened to SciencePipes?
* Bertram:
  * What about Galaxy? Did they ever get back to us?
  * How much time do you have, and what do you want to read? Shall I presort the papers for you/us?

Write-up sections to be done: http://dl.dropbox.com/u/4245277/Understanding%20Workflows.docx
- Methods, tier 1/2/3, selection processes...
- Organise in terms of 2-3 tiers of information:
  1: high level - author, when, how many, nodes, workflow links, goal
  2: # of nodes, % of data acquisition, % of data sources, type of nodes, models?
  3: metadata sufficiency
* How are we going to visualise this?
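A possible starting point for the SPARQL question above, and for pulling the tier-1 fields automatically: a minimal Python sketch against myExperiment's public SPARQL endpoint. The endpoint URL, the mebase:Workflow class, and the dcterms property names are assumptions to be checked against the myExperiment ontology before use.

    # Minimal sketch: pull tier-1 workflow metadata (title, creation date)
    # from myExperiment's SPARQL endpoint. The endpoint URL and the
    # class/property names are assumptions -- verify them against the
    # myExperiment ontology (rdf.myexperiment.org) before relying on this.
    from SPARQLWrapper import SPARQLWrapper, JSON

    ENDPOINT = "http://rdf.myexperiment.org/sparql"  # assumed endpoint URL

    QUERY = """
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX mebase:  <http://rdf.myexperiment.org/ontologies/base/>

    SELECT ?workflow ?title ?created
    WHERE {
      ?workflow a mebase:Workflow ;          # assumed class name
                dcterms:title   ?title ;
                dcterms:created ?created .
    }
    ORDER BY ?created
    LIMIT 50
    """

    def fetch_tier1():
        sparql = SPARQLWrapper(ENDPOINT)
        sparql.setQuery(QUERY)
        sparql.setReturnFormat(JSON)
        results = sparql.query().convert()
        # Flatten the JSON bindings into one plain dict per workflow.
        return [{var: b[var]["value"] for var in b}
                for b in results["results"]["bindings"]]

    if __name__ == "__main__":
        for row in fetch_tier1():
            print(row["created"], row["title"])

If the endpoint pans out, the same query pattern could be extended to downloads and node counts for tiers 2/3, which would also feed the visualisation question above.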
Notes, Hypotheses and Questions:

What gap are we filling? Some open questions about this:
* Scaling > scaling ones are assumed to be more popular (Pegasus, caGrid).
* Automation > automating ones are more downloaded.
* Provenance > more downloaded, more cited.
* Visualisations > easier to follow (how does one test this?).
* Reproducibility > are myExperiment workflows cited more? Is myExperiment itself?
* (Null hypothesis, tiers.)
* Which users use these? Are they mostly project devs? What works?
* Are we dependent only on downloads?
* What about closed systems? SciencePipes? How are these different?
* Applications > is this important? What field is it in?
* Templates > or do they constantly reinvent the wheel? What about one-offs?
* Are they stand-alone? Or are they only plumbing, not the whole system?
* Less input over time > does more complexity lend itself to more complexity in the input?
* What are the weak points > reuse? Training?
* Example ones - are these copied? Are templates carried through culturally?
* Sharing issues: what about privacy concerns? IRBs?
* What's out there? Is myExperiment a good enough show of the state of the art? Or are people using shell scripts and R scripts? What do these suggest?
* What about non-published work?
* What's with subsetting, translation, transformation?
* Does the information flowing through the workflow change anything? The input versus the output? Are any of these able to be linked easily?

Hypotheses:
* Live constant visualisation should be more common as time goes on.
* Human interactive components should be becoming more common. At the same time, workflows should be more able to have detached execution.
* Embedded workflows should be more common.
* Different workflow systems should be converging over time, both in possible stages, in application, and in natural language descriptions.
* More applications should be considered over time; cf. http://pegasus.isi.edu/applications.php
* There should be more results over time - not just in data, but in purposes.
* Linear workflows should become less linear over time.
* Information retrieval should become less common.
* Workflows should, in general, be used mostly in bioinformatics.
* Downloads will be dependent on the amount of natural language description. (A test sketch follows at the end of these notes.)
* There will be a proliferation of shims.
* Most workflows will use the same repositories.
* Most workflows will integrate similar QA/QC steps.
* Workflows will show a disparity between their use on public repositories and their function in the literature.
* Workflows will decay with time.
* Knowledge-discovery workflows will become more common with the proliferation of databases.
* Composite actors (which collapse details into abstract components) should proliferate.
* More workflows will have smart reruns.
* More workflows will have smart semantic links.
* The content should, over time, more predictably alter the nature, course, shape, and tempo of the workflow.
* Workflows should become reduced in their components - what used to be one workflow should become one segment.
* Workflows will be more cited in publications.
* Workflow visualisations will become more (less?) popular in the literature.
* Parallel execution should become more common.
* Non-annotation-based approaches should become more common.
* The type of workflow and application will influence the data provenance.
* "Workflow provenance refers to the record of the history of the derivation of some dataset in a scientific workflow. The amount of information recorded for workflow provenance varies, depending on application needs. For example, workflow provenance of a scientific result may include details about the type and model of external devices used, as well as the versions of softwares used for deriving the result."
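One way to make the downloads hypothesis above testable, as a rough sketch: rank-correlate download counts against description length. The rows below are made-up placeholders; real data would come from the myExperiment database or the SPARQL pull sketched earlier, ideally over the 300+ Taverna workflows.

    # Rough sketch for the hypothesis that downloads track the amount of
    # natural-language description. The rows below are placeholders; real
    # data would come from the myExperiment database or the SPARQL pull
    # sketched earlier in these notes.
    from scipy.stats import spearmanr

    # Hypothetical (workflow id, description, download count) rows.
    workflows = [
        ("wf-1", "Fetches sequences from GenBank and aligns them with ClustalW.", 120),
        ("wf-2", "BLAST search.", 15),
        ("wf-3", "Retrieves gene annotations, filters by GO term, plots counts.", 86),
    ]

    desc_lengths = [len(desc.split()) for _, desc, _ in workflows]
    downloads = [dl for _, _, dl in workflows]

    # Spearman's rho is a rank correlation, so it tolerates the heavy
    # skew we expect in download counts.
    rho, p = spearmanr(desc_lengths, downloads)
    print("rho = %.2f, p = %.3f" % (rho, p))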