
DataONE Research Meeting 2: 05/25/2011

BL: We should focus not just on Kepler and Taverna, but also on other programs.
BM: Interests:
- the what, how, and why of tools for scientists, and how we can develop tools.
- For instance, ecological niche modelling involves lots of QA/QC just to do a single experiment; it would be nice to understand how complex these workflows are in order to develop them further.
- Would like to analyse workflows by:
  - Data Input
  - QA/QC steps
  - external models
  - iterative loops
  - recursion
  - subject matter (bioinformatics? ecology?)
Karthik - Experienced with models in R.
- Would like to develop a laundry list of workflow dimensions, to find weak points across systems.
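One possible way to record the dimensions listed above is one row per workflow, written out as CSV so it can be opened as a spreadsheet. This is only a sketch: the field names, the `WorkflowRecord` class and the example row are illustrative assumptions, not something agreed on the call.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class WorkflowRecord:
    """One row per analysed workflow (hypothetical field names)."""
    name: str
    subject: str          # e.g. "bioinformatics" or "ecology"
    data_inputs: int      # number of data input steps
    qa_qc_steps: int      # number of QA/QC steps
    external_models: int  # calls out to external models or programs
    has_loops: bool       # uses iterative loops
    has_recursion: bool   # uses recursion


def to_csv(records):
    """Serialise records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=[f.name for f in fields(WorkflowRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


# Placeholder row with made-up values, only to show the shape of the data.
example = WorkflowRecord(
    name="niche-model-example",
    subject="ecology",
    data_inputs=2,
    qa_qc_steps=5,
    external_models=1,
    has_loops=True,
    has_recursion=False,
)

if __name__ == "__main__":
    print(to_csv([example]))
```

Keeping the dimensions as flat columns would make it easy to compare workflows across systems and to add further criteria later.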
BL: So, we’re going to look at the workflows out there (such as those on myExperiment). If so, we have to focus on a few aspects:
- Users:
  - What kind of users?
  - What are they trying to do?
  - Do they need hand-holding? How accessible are the systems?
  - Do they include R, or other external programs? How do they do this?
  - Are they using the same workflows, or constantly reinventing the wheel?
- Workflows:
  - plumbing - data management, not the actual science workflows
  - use of shims
  - similar workflows, like the niche modelling types
  - Are the ones which do the ‘real’ work different from the others?
  - Is it 1-of-a-kind?
  - software development (those in production mode at the moment)
- Depository systems:
  - Kepler Depository has some example ones, especially in packages, which could be mined
  - Find more examples
  - library of the Kepler project, which has a limited user interface
  - myExperiment is still the largest.
  - BL: “people might not want to share their workflows”
  - To identify users, it would be worth:
    - working with them directly
    - using the Kepler mailing list (and others) to contact people
    - asking groups at CAMERA in San Diego and at Davis, which have in-house workflows, to share them
Documentation: myExperiment, Kepler site training
- Develop list of criteria (some above.)
- BM has access to 1-2 page descriptions of six workflows, and a summary article on workflow usage.
- Brainstorming workflow usage
- (Email the fellow interns with a status update, as per the mentor plan)
- Heather is making a lab notebook (wordpress)
- Make three excel sheets:
  - Users
  - Workflow languages (share with Karthik)
  - workflows themselves
- Do the relevant reading loaded onto Mendeley
BL: What’s out there? What is the state of the art? R scripts? Shell scripts? (VisTrails?)
- Workflows that are not called workflows? Is that too far out of scope? What is the predicted outcome of this project?
BM: Don’t want to start too broadly. Non-published work - how usable is that?
KR: Identify weak points - some are better, what gaps are there? Examples of non-working ones might be relevant findings?
Richard leads analysis:
- randomly chosen subset?
- x amount of environmental ones, as well (not just bioinformatics)
  - This will be covered more in another call
- what ones are used (how complex they are)
- What would analysis entail?
Karthik:
- draft outline for short synthesis paper?
- higher levels, strengths + weaknesses of programs (…and then suggestions)
  - goal of use, amount of use...
- Note: ‘less formal ones’, two weeks ago in U Tenn at Nimbus - workflow systems in R - can these be gotten?
- useful direction to take for next meeting
Taverna papers - BL’s friend, PhD student (Paolo?) - measurable complexity? Could join us?
- complement the conceptual analysis
BL: Provenance repository internship going on as well - on average, 30% of workflows are shims.
- subsetting, translation, transformation - useful information.
Ultimately, it would be useful to have an annotated bibliography. (Mendeley?)
Write up synopsis for communications to the public
Make a public Dropbox folder
Email out to everyone the hours for next week (16:00 GMT/ 11:00 EST, 31-05-2011)
Fill out the mentor program for the next two weeks
29th-30th at UC Davis (buy flights for the evening of the 28th).
End of call.