DataOne Research Meeting 2: 05/25/2011

* BL: We should not just focus on Kepler, Taverna, but also other programs
* BM: Interests:
  * the what, how, and why of tools for scientists, and how we can develop tools.
  * For instance, ecological niche modelling - lots of QA/QC, just to do a single experiment, and it would be nice to understand how complex these are in order to develop them further.
  * Would like to analyse workflows by:
    * Data Input
    * QA/QC steps
    * external models
    * iterative loops
    * recursion
    * subject matter (bioinformatics? ecology?)
* Karthik - Experienced with models in R.
  * Would like to develop a laundry list of workflow dimensions, to find weak points across systems.
* BL: So, we’re going to look at the workflows out there (such as those on myExperiment). If so, we have to focus on a few aspects:
  * Users:
    * What kind of users?
    * What are they trying to do?
    * Do they need hand-holding? How accessible are the systems?
    * Do they include R, or other external programs? How do they do this?
    * Are they using the same workflows, or constantly reinventing the wheel?
  * Workflows:
    * plumbing - data management, not the actual science workflows
    * use of shims
    * similar workflows, like the niche modelling types
    * Are the ones which do the ‘real’ work different from the others?
    * Is it 1-of-a-kind?
    * software development (those in production mode at the moment)
  * Depository systems:
    * Kepler Depository has some example ones, especially in packages, which could be mined
    * Find more examples
    * library of the Kepler project, which has a limited user interface
    * myExperiment is still the largest.
* BL: “people might not want to share their workflows”
* Useful in identifying users: it would be worth
  * working with them directly
  * using the Kepler mailing list (and others) to contact people
  * asking some groups at Camera in San Diego and at Davis, which have in-house workflows, for examples
* Documentation: myExperiment, Kepler site training
* Develop a list of criteria (some are above).
* BM has access to a 1-2 page description of six workflows, and a summary article on workflow usage.
* Brainstorming workflow usage
* (Email the fellow interns with a status update, as per the mentor plan)
* Heather is making a lab notebook (WordPress)
* Make three Excel sheets:
  * Users
  * Workflow languages (share with Karthik)
  * The workflows themselves
* Do the relevant reading (loaded onto Mendeley)
* BL: What’s out there? What is the state of the art? R-scripts? Shell scripts? (Vistrails?)
  * Workflows that are not called workflows? Is that too far out of scope? What’s the outcome predicted for this project?
* BM: Don’t want to start too broadly. Non-published work - how usable is that?
* KR: Identify weak points - some systems are better than others, what gaps are there? Examples of non-working workflows might be relevant findings.
* Richard leads analysis:
  * randomly chosen subset?
  * x amount of environmental ones, as well (not just bioinformatics)
  * This will be covered more in another call
  * which ones are used (and how complex they are)
  * What would analysis entail?
* Karthik:
  * draft outline for short synthesis paper?
  * higher levels, strengths + weaknesses of programs (…and then suggestions)
  * goal of use, amount of use...
* Note: ‘less formal ones’, two weeks ago in U Tenn at Nimbus - workflow systems in R - can these be obtained?
  * useful direction to take for next meeting
* Taverna papers - BL’s friend, PhD student (Pãolo?) - measurable complexity? Could join us?
  * would complement the conceptual analysis
* BL: Provenance repository internship going on as well - on average, 30% of workflows are shims.
  * subsetting, translation, transformation - useful information.
* Ultimately, it would be useful to have an annotated bibliography. (Mendeley?)
* Write up a synopsis for communications to the public
* Make a public Dropbox folder
* Email out to everyone the hours for next week (16:00 GMT / 11:00 EST, 31-05-2011)
* Fill out the mentor program for the next two weeks
* 29th-30th in UC Davis (buy flights - 28th evening).
* End of call.