DataONE Working Group: Exploration, Visualization, and Analysis (EVA)
Tuesday – Thursday, October 22 – 24, 2013
Tamaya Hotel, Santa Ana Pueblo, NM

Participants: Enrico Bertini, Bob Cook, Aritra Dasgupta, Bill Hargrove, Debbie Huntzinger, Nicolas Molen, Christopher Schwalm, Yaxing Wei, John Cobb, Katherine Chastain (guest), Soren Scott

Meeting Goals:
1. Provide updates on EVA and MsTMIP
2. Solicit feedback on quantitative tools for evaluating similarity
3. Continue discussions on critical evaluation of visualization methods for climate models
4. Discuss how provenance can benefit carbon modelers and solicit their needs for capturing and using provenance
5. Develop plans for IMIF, adding model benchmarking functionality into VisTrails and UV-CDAT for use in MsTMIP, ILAMB, and other modeling activities
6. Develop plans for EVA paper(s), proposals, activities, and the next meeting

Tuesday, October 22

Block 2 12:30 – 3:00 pm

Introductions, Meeting Goals, Agenda, and EVA Update (15 minutes) — Bob Cook
- DataONE support will end, but we should look for additional proposal opportunities to support continuing this effort.
- What are some key activities we could do that would be useful to the community? How can we get those funded?
- It would be good to be able to work within the DataONE umbrella for some of these efforts.
- Possible sponsors: NSF, DOE, NASA -- all have calls for the kinds of cyberinfrastructure work that EVA does.

Next face-to-face meeting (dates/places)
- Would like to meet in the spring; suggestion: May (or April) timeframe, with the second half of May preferred by Enrico.
- The landscape ecology meeting is in Alaska in May: May 18-22 in Anchorage.
- Need to avoid dates late in the summer in order to complete billing before the end of DataONE Phase I.
- Venues: Flagstaff (via Phoenix); Oak Ridge (Knoxville); Asheville, NC; New York (may be expensive; have had it there once before)

Integrated Model Intercomparison Framework (IMIF) — Yaxing Wei and Fei Du (intern) (75 minutes)
- A conceptual framework that links all EVA activities together
- Throw out some thoughts on what to do next
- Discussions: broker concept
- Climate Explorer server-side analysis: http://climexp.knmi.nl/start.cgi?id=someone@somewhere
- Integration with 3rd-party tools: GDAL

Quantitative tools to evaluate similarity of data (models and observations) — Enrico Bertini and Aritra Dasgupta (60 minutes)
Bill:
- Change the data from 20 Junes to all months of selected years; we expect to see orbit/circle patterns
- We can then look at the tracks of the "mass centers" of those orbits/circles
Christopher:
- Looking at 20 Junes still makes sense, since that's similar to looking at annual data
Debbie:
- Enable choice of multiple reference data / multiple model output variables / input vs. output

Aritra is targeting an approach study paper for a conference in the first week of December.
DMESS 2014: http://www.climatemodeling.org/workshops/dmess2014/

Comments from Dan Ricciuto:
I really like the idea of making the similarity tool interactive and letting the user set the variables, weighting, models, timescales, etc. I think it will be very useful in helping to identify outliers, model clusters, and/or trends, and interactivity will help us hone in on the causes of those by looking at specific variables, models, and times -- especially if it can recompute the similarities rapidly. A useful addition might be to add the model ensemble mean as another "model" and use that as the center point. Then we can easily see which models differ from the mean, and knowing when, where, and which variables can help modelers pinpoint the cause. If there is a way to connect this dissimilarity from the ensemble mean back to the model survey somehow, this would be even more useful.

Block 3 3:30 - 5:30 pm

Carbon Flux Visualization Project — Nicolas Molen (20 minutes)
- MongoDB for backend data management
- GeoJSON for facet data access
- D3 for client visualization
- WebGL
- forecast.io
- Demo: spatial.mtri.org/flux

Quantitative tools to evaluate similarity of data (continued) — Enrico Bertini and Aritra Dasgupta
· Status and next steps for development
· Ideas for papers and proposals
- Flexibility vs. speed: balance between pre-processing data and on-the-fly data processing
- Ideally, people should be able to apply their own cookie-cutter (region) to the data
- Should be able to select different temporal periods
- Apply RMSD/RMSE in addition to correlation
- Prioritize: exploration > benchmarking
- Target-based exploration or free-style exploration

Suggestions on exploration: need to focus on a series of questions that are relevant for a 20-model ensemble
- Each time, we only look at one variable
- Do these models spatially agree with each other, and where?
- Do these models temporally agree with each other, and when?
- Compare models with the ensemble mean
- Then ask why; need to be able to look at a series of variables (GPP, NPP, NEP, respiration), then regions, then over time
- Free-style exploration, rule-based
- Use the tool to perform a series of experiments, e.g., I observe something odd in year 2004, which is an El Niño year -- is the difference because of that?

Targets at AGU visualization:
- Within 2 weeks, we can have a non-interactive tool ready for review.
- We shall have a tool ready by AGU.

Name for the tool:
- ModelSimVis?
- Need beer

Plans for Wednesday Poster Session and Reception

Wednesday, October 23

Block 4 8:30 - 10:00 am

Critical analysis of climate visualization methods and best practices for complex visualization — Aritra Dasgupta
· Examples of maps, line charts, and scatter plots
· Paper
- Color advice for maps: http://colorbrewer2.org/
- Visualization usage scenario: Analysis --> Exploratory Analysis
- Design Consequences
- Misinterpretation -> Interpretation

10:00 – 10:30 am Break

Block 5 10:30 am - noon

Critical analysis of climate visualization methods and best practices for complex visualization — Aritra Dasgupta (continued)
- colorbrewer2.org

MsTMIP Benchmarking Approach and Methods — Christopher Schwalm (NAU) (60 minutes)

Block 6 1:00 - 3:00 pm, Joint with Provenance Working Group

· Update on Provenance WG's status and plans — Bertram, Paolo, David (30 minutes)
- Bertram: collaborative provenance use case, facilitated through DataONE and D-PROV
- Paolo: PBase (provenance repository, querying, provenance analysis), provenance architecture
- David: iPython Notebook demo, RCloud

Bill:
- What data is tainted because of an earlier error?
- What do I need to re-run, or create a "corrected" version?
- Genealogy (lineage) rather than provenance -- can we move toward antecedents vs. descendants?
- Borrow ideas and tools from genealogy software
- Comparing multiple branches
- Named relationships

John:
- Reproducibility is hard; big problems can't be easily repeated
- Instead of identity, rather homomorphism (the issue of mutable/immutable objects)

Debbie:
- How to capture provenance in Matlab?
- Also interested in both the Notebook framework and "rerun," as it is difficult to remember where to "resume" after a few months of break on a study.

· EVA Introduction — Bob Cook (15 minutes)
- MsTMIP overview
- Inputs: weather, soil, phenology, land cover, land use, nitrogen, disturbance (all in one "model" / format); fairly homogeneous data; run through 20+ models
- MsTMIP defined a protocol to tell modelers how to run simulations.
But models are different (in structure and implementation), and they are run on different machines.
- Different people use different tools to perform model evaluation activities; it's hard to switch from one tool to another.
- Provenance Representation for the National Climate Assessment in the Global Change Information System:
  http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6558476
  http://tw.rpi.edu/web/doc/tgrsgcis2013

Yaxing:
- Model runs go into a local data repository
- Broker from EarthCube, then into UV-CDAT, VisTrails (Analysis Module Library, Visualization Library)
- To the side: PBase/DataONE and ESGF

Discussion (60 minutes)
· Carbon cycle modelers' needs for capturing and using provenance
· Integration of EVA and provenance tools to better meet the user community's needs
- Example of a data-tracing effort for data publication: Bob mentions a new Nature journal, Scientific Data: www.nature.com/scientificdata/
- From Steve Aulenbach: http://www.slideshare.net/pskomoroch/distilling-data-exhaust

Action Items:
- Share provenance query use cases
- Arrange a joint telecon

Block 7 3:30 - 5:30 pm

MsTMIP Benchmarking Approach and Methods — Christopher Schwalm (NAU) (continued)

Quantitative tools for benchmarking models — Aritra Dasgupta (30 minutes)

Critical analysis of climate visualization methods and best practices for complex visualization — Aritra Dasgupta
· Examples of maps, line charts, and scatter plots
· Paper example: www.highcharts.com/demo/area-stacked

Discussion

5:30 pm Meeting adjourns for the day

Thursday, October 24

Block 8 8:30 - 10:00 am

Advanced Visualizations of Model-Model Intercomparisons, using MsTMIP data — Debbie (30 minutes)
Discussion
· Static visualizations for papers
· Dynamic visualizations for exploration and oral presentations
- ANOVA: http://en.wikipedia.org/wiki/Analysis_of_variance
- Access to MsTMIP driver data: ftp://nacp.ornl.gov/synthesis/2009/frescati/

Debbie (NEE analysis):
* ANOVA using TransCom regions
* Factor space -- changes in NEE due to each factor (incrementally) and plot in phase space

Aritra:
* Will try changes in NEE (factor space) in parallel coordinates

Block 9 10:30 am - noon

Discussion and Next Steps:
· Development of proposals and papers

Papers:
1. Visualization critique
   - Finalize taxonomy / review (15 figures): next week
   - First draft paper: 15 November or earlier
   - Targeted for TVCG
2. Design study -- EuroVis
   - Time/space similarity analysis tool
   - Due: 6 December
   - Prototype: mockup teleconference -- 30 October
   - Functioning prototype: 8 November telecon
   - Feedback from scientists: iterations beginning on 8 November
3. DMESS paper
   - DMESS 2014: http://www.climatemodeling.org/workshops/dmess2014/
   - On tool 1: multi-projection
   - Telecon to view the prototype: early November
   - Draft: November 22; final: December 15
4. Aritra to India 7 December

MsTMIP visualizations: have bi-weekly phone calls

Sources for solicitation:
- DOE, NASA, NSF (cyber and vis (???)) solicitations for vis science (check with Enrico)
- Google maybe
- NSF
· Others

Prepare for final reporting during the AHM closing session
· Adjourn

Block 10 1:00 – 3:00 pm

Continued Informal Discussions as needed
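The similarity metrics discussed for the model-intercomparison tool (RMSD/RMSE in addition to correlation, with the multi-model ensemble mean treated as an extra "model" and used as the center point) can be sketched in a few lines. This is a minimal illustration, not part of any existing EVA tool: the model names and the synthetic gridded data below are made up for the example, and real MsTMIP fields would of course be loaded from the driver/output archives instead.

```python
# Sketch of the similarity metrics discussed: RMSE and Pearson correlation
# of each model against the ensemble mean. Model names and data are
# illustrative placeholders, not real MsTMIP output.
import numpy as np

def rmse(a, b):
    """Root-mean-square error between two fields of equal shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def pearson(a, b):
    """Pearson correlation between two flattened fields."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

def similarity_table(models):
    """Compare each model to the ensemble mean of all models.

    models: dict mapping model name -> gridded numpy array (e.g., an
    annual-mean flux field). Returns {name: (rmse, correlation)}.
    """
    ensemble_mean = np.mean(list(models.values()), axis=0)
    return {name: (rmse(field, ensemble_mean), pearson(field, ensemble_mean))
            for name, field in models.items()}

# Illustrative use with synthetic "model output" grids.
rng = np.random.default_rng(0)
base = rng.normal(size=(10, 10))
models = {f"model_{i}": base + rng.normal(scale=0.1 * (i + 1), size=(10, 10))
          for i in range(3)}
table = similarity_table(models)
```

An interactive version, as Dan Ricciuto suggested, would recompute this table as the user changes variables, weighting, models, and timescales; the models with the largest RMSE or lowest correlation against the ensemble mean are the outlier candidates to drill into.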