DataONE Working Group:
Exploration, Visualization, and Analysis (EVA)
Tuesday – Thursday
October 22 – 24, 2013
Tamaya Hotel
Santa Ana Pueblo, NM
Participants: Enrico Bertini, Bob Cook, Aritra Dasgupta, Bill Hargrove, Debbie Huntzinger, Nicolas Molen, Christopher Schwalm, Yaxing Wei, John Cobb, Katherine Chastain (guest), Soren Scott
Meeting Goals:
1. Provide updates on EVA and MsTMIP
2. Solicit feedback on quantitative tools for evaluating similarity
3. Continue discussions on critical evaluation of visualization methods for climate models
4. Discuss how provenance can benefit carbon modelers and solicit their needs for capturing and using provenance
5. Develop plans for IMIF, adding model benchmarking functionality into VisTrails and UV-CDAT for use in MsTMIP, ILAMB, and other modeling activities
6. Develop plans for EVA paper(s), Proposals, activities, and next meeting
Tuesday, October 22
Block 2
12:30 – 3:00 pm
Introductions, Meeting Goals, Agenda, and EVA Update (15 minutes) — Bob Cook
DataONE support will end but we should look for additional proposal opportunities to support continuing this effort.
What are some key activities that we could do that would be useful to the community?
How can we get those funded?
It would be good to be able to work within the DataONE umbrella for some of these efforts.
possible sponsors: NSF, DOE, NASA -- all have calls for the kinds of cyberinfrastructure that EVA does
Next face-to-face meeting (dates/places)
Would like to meet in the Spring.
Suggestion: May (or April) timeframe; second half of May preferred by Enrico.
Landscape ecology meeting is in Alaska in May. May 18-22 in Anchorage.
Need to avoid dates late in the Summer in order to complete billing before the end of DataONE Phase I
Venues:
Flagstaff (via Phoenix)
Oak Ridge (Knoxville)
Asheville, NC
New York (may be expensive, have had it there once before)
Integrated Model Intercomparison Framework (IMIF) —Yaxing Wei and Fei Du (Intern) (75 minutes)
A conceptual framework that links all EVA activities together
Throw out some thoughts on what to do next
Discussions
Broker concept
Climate Explorer
server-side analysis
http://climexp.knmi.nl/start.cgi?id=someone@somewhere
Integration with 3rd party tools: GDAL
Quantitative tools to evaluate similarity of data (models and observations) — Enrico Bertini and Aritra Dasgupta (60 minutes)
Bill:
- Change data from 20 Junes to all months of selected years; we expect to see orbit/circle patterns
- We can then look at the tracks of "mass centers" of those orbits/circles
Christopher:
- Looking at 20 Junes still makes sense, since that's similar to looking at annual data
Debbie:
- Enable choice of multiple reference data / multiple model output variables / input vs. output
Aritra is targeting an approach study paper for a conference in the first week of December
DMESS 2014: http://www.climatemodeling.org/workshops/dmess2014/
Comments from Dan Ricciuto
I really like the idea of making the similarity tool interactive and letting the user set the variables, weighting, models, timescales, etc. I think it will be very useful in helping to identify outliers, model clusters, and/or trends, and interactivity will help us hone in on the causes of those by looking at specific variables, models and times. Especially if it can recompute the similarities rapidly. A useful addition might be to add the model ensemble mean as another “model” and use that as the center point. Then we can easily see which models differ from the mean, and knowing when, where and which variables can help modelers pinpoint the cause. If there is a way to connect this dissimilarity from the ensemble mean back to the model survey somehow, this would even be more useful.
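The ensemble-mean suggestion above can be sketched roughly as follows. This is a minimal illustration, not the tool's implementation: it assumes each model is a flat list of values for one variable on a common grid and time axis, and the names `ensemble_mean`, `rmsd`, `distance_from_mean` and the sample values are hypothetical.

```python
import math

def ensemble_mean(models):
    """Element-wise mean across models (each model: equal-length list of values)."""
    series = list(models.values())
    return [sum(values) / len(series) for values in zip(*series)]

def rmsd(a, b):
    """Root-mean-square difference between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def distance_from_mean(models):
    """Distance of every model from the ensemble mean, used as the center point."""
    center = ensemble_mean(models)
    return {name: rmsd(values, center) for name, values in models.items()}

# Hypothetical values for three models (not MsTMIP output).
models = {
    "model_a": [1.0, 2.0, 3.0],
    "model_b": [1.0, 2.0, 3.0],
    "model_c": [4.0, 5.0, 6.0],
}
distances = distance_from_mean(models)
outlier = max(distances, key=distances.get)  # model farthest from the mean
```

Restricting the input lists to one variable, region, or period before calling `distance_from_mean` would give the "when, where and which variables" breakdown the comment asks for.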
Block 3 3:30 - 5:30
Carbon Flux Visualization Project — Nicolas Molen (20 minutes)
MongoDB for backend data management
GeoJSON for facet data access
D3 for client visualization
WebGL
forecast.io
Demo: spatial.mtri.org/flux
Quantitative tools to evaluate similarity of data (continued) — Enrico Bertini and Aritra Dasgupta
· Status and next steps for development
· Ideas for papers and proposals
Flexibility vs. Speed
Balance between pre-processing data and on-the-fly data processing
Ideally, people shall apply their own cookie-cutter (region) to the data
Shall be able to select different temporal periods
Apply RMSD/RMSE besides Correlation
Prioritize: Exploration > Benchmarking
Target-based Exploration or Free-style Exploration
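As a rough sketch of why RMSD/RMSE is worth reporting besides correlation, and of applying a user-supplied "cookie-cutter" region, the following assumes flat lists of values and a boolean region mask. The function names, the mask, and the numbers are hypothetical and only illustrate the idea.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def rmse(a, b):
    """Root-mean-square error between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def apply_mask(values, mask):
    """Keep only the cells inside the user's region (mask entry is True)."""
    return [v for v, keep in zip(values, mask) if keep]

# Hypothetical data: the model tracks the reference but with a constant bias.
reference = [1.0, 2.0, 3.0, 4.0]
model = [3.0, 4.0, 5.0, 6.0]
region_mask = [True, True, True, False]

ref_region = apply_mask(reference, region_mask)
model_region = apply_mask(model, region_mask)
r = pearson(ref_region, model_region)  # pattern agreement
e = rmse(ref_region, model_region)     # magnitude of the mismatch
```

Here the correlation is perfect while the RMSE is not zero, so correlation alone would hide the bias. The same masking step could select a temporal period instead of a region.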
Suggestions on Exploration:
need to focus on a series of questions that are relevant for a 20 model ensemble
- Each time, we only look at one variable
- Do these models spatially agree with each other and where?
- Do these models temporally agree with each other and when?
- Compare models with ensemble mean
- Then ask why; need to be able to look at a series of variables (GPP, NPP, NEP, respiration), then regions, then over time
- freestyle exploration, rule-based
- use the tool to perform a series of experiments, e.g. I observe some weird things in year 2004, which is an El Nino year; is the difference due to that?
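The "where do models agree" and "when do models agree" questions above could be approached, for one variable at a time, with a simple spread measure across models. The sketch below is one possible reading, not the group's method: it assumes each model is a list of time steps, each a list of cell values on a shared grid, and uses population standard deviation as the (assumed) disagreement measure. All names and values are hypothetical.

```python
from statistics import mean, pstdev

def spatial_spread(models):
    """Per-cell spread across models, after averaging each model over time."""
    time_means = [
        [mean(cell_series) for cell_series in zip(*field)]
        for field in models.values()
    ]
    return [pstdev(cell_values) for cell_values in zip(*time_means)]

def temporal_spread(models):
    """Per-time-step spread across models, after averaging each model over cells."""
    space_means = [[mean(step) for step in field] for field in models.values()]
    return [pstdev(step_values) for step_values in zip(*space_means)]

# Hypothetical fields: 2 time steps x 2 cells per model.
models = {
    "model_a": [[1.0, 1.0], [1.0, 1.0]],
    "model_b": [[1.0, 3.0], [1.0, 3.0]],
}
where = spatial_spread(models)   # low values: cells where models agree
when = temporal_spread(models)   # low values: time steps when models agree
```

In this toy case the models agree in the first cell and disagree in the second, which is the kind of pattern that would prompt the follow-up "why" questions across variables and regions.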
Targets at AGU visualization
Within 2 weeks, we can have a non-interactive tool ready for review. We shall have a tool ready by AGU.
Name for the tool:
- ModelSimVis?
- Need beer
Plans for Wednesday
Poster Session and Reception
Wednesday, October 23
Block 4
8:30 -10:00 am
Critical analysis of climate visualization methods and Best Practices for complex visualization (Aritra Dasgupta)
· Examples of maps, line charts and scatter plots
· Paper
Color Advice for Maps: http://colorbrewer2.org/
Visualization usage scenario:
Analysis --> Exploratory Analysis
Design Consequences
Misinterpretation -> Interpretation
10:00 – 10:30 am Break
Block 5 10:30 - noon
Critical analysis of climate visualization methods and Best Practices for complex visualization—Aritra Dasgupta (continued)
colorbrewer2.org
MsTMIP Benchmarking Approach and Methods
— Christopher Schwalm (NAU) (60 minutes)
Block 6 1:00- 3:00 pm
Joint with Provenance Working Group
· Update on Provenance WG’s status and plans — Bertram, Paolo, David (30 minutes)
Bertram: collaborative provenance use case, facilitated through DataONE and D-PROV
Paolo: PBase (provenance repository, querying, prov analysis), provarchitecture
David: iPython Notebook demo, RCloud
Bill:
- what data is tainted b/c of an earlier error
- what do I need to re-run, or create a "corrected" version
- genealogy (lineage) rather than provenance
- can we go towards antecedents vs descendants
- borrow ideas, tools, from genealogy software
- comparing multiple branches
- named relationships
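Bill's "tainted data" and "what do I need to re-run" questions amount to walking a lineage graph in both directions. A minimal sketch, assuming provenance is available as a simple mapping from each item to the items it was derived from (this mapping and all item names are hypothetical, not the D-PROV or PBase data model):

```python
from collections import deque

def _walk(edges, start):
    """Breadth-first traversal returning every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def ancestors(derived_from, item):
    """Everything the item depends on, directly or indirectly (antecedents)."""
    return _walk(derived_from, item)

def descendants(derived_from, item):
    """Everything derived from the item, i.e. what is tainted if it is wrong."""
    children = {}
    for child, parents in derived_from.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    return _walk(children, item)

# Hypothetical lineage: item -> list of items it was derived from.
derived_from = {
    "run_a": ["driver_v1"],
    "run_b": ["driver_v1", "soil_v1"],
    "fig_1": ["run_a"],
}
tainted = descendants(derived_from, "driver_v1")  # must be re-run or corrected
lineage = ancestors(derived_from, "fig_1")        # what fig_1 was built from
```

Comparing the `ancestors` sets of two products would be one simple way to look at multiple branches side by side.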
John:
- reproducibility is hard. Big problems can't be easily repeated
- instead of identity, rather homomorphism (issue of mutable/immutable object)
Debbie:
- How to capture provenance in Matlab
- also interest in both the Notebook framework and "rerun", as it is difficult to remember where to "resume" after a break of a few months from the study.
· EVA Introduction — Bob Cook (15 minutes)
- MsTMIP overview
- inputs: weather, soil, phenology, land-cover, use, nitrogen, disturbance (all in one "model" / format)
- fairly homogeneous data
- run through 20+ models
- MsTMIP defined a protocol to tell modelers how to run simulations. But models are different (in structure and implementation) and they are run on different machines
- Different people are using different tools to perform model evaluation activities; it's hard to switch from one tool to another
Provenance Representation for the National Climate Assessment in the Global Change Information System
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6558476#!
http://tw.rpi.edu/web/doc/tgrsgcis2013
Yaxing:
- Model runs go into a Local Data Repository
- Broker from EarthCube, then into ..
.. UV-CDAT, VT (Analysis Module Library, Visualization Library)
- To the side: PBase/DataONE and ESGF
Discussion (60 minutes)
· Carbon Cycle modelers needs on capturing and using provenance
· Integration of EVA and Provenance tools to better meet user community’s needs
example of a data tracing effort for data publication: Bob mentions a new Nature journal, Scientific Data. www.nature.com/scientificdata/
From Steve Aulenbach
http://www.slideshare.net/pskomoroch/distilling-data-exhaust
Action Items:
- Share provenance query use cases
- Arrange a joint telecon
Block 7 3:30 -5:30 pm
MsTMIP Benchmarking Approach and Methods
— Christopher Schwalm (NAU) (continued)
Quantitative tools for benchmarking models — Aritra Dasgupta (30 minutes)
Critical analysis of climate visualization methods and Best Practices for complex visualization (Aritra Dasgupta)
· Examples of maps, line charts and scatter plots
· Paper
example: www.highcharts.com/demo/area-stacked
Discussion
5:30 pm Meeting Adjourns for the day
Thursday, October 24
Block 8
8:30- 10:00 am
Advanced Visualizations of Model-Model Intercomparisons, using MsTMIP data — Debbie (30 minutes)
Discussion
· Static Visualization for papers
· Dynamic Visualizations for exploration and oral presentations
ANOVA
http://en.wikipedia.org/wiki/Analysis_of_variance
Access to MsTMIP Driver Data:
ftp://nacp.ornl.gov/synthesis/2009/frescati/
Debbie (NEE analysis)
- anova using Transcom regions
- factor space-- changes in NEE due to each factor (incrementally) and plot in phase space
Aritra
- will try changes in NEE (factor space) in parallel coordinates
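For reference, a one-way ANOVA of the kind mentioned above (NEE grouped by region) can be sketched as below. This is a generic textbook F statistic, not Debbie's actual analysis; the grouping, region names, and NEE values are hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA; groups maps a label to a list of values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)

    # Between-group and within-group sums of squares.
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical NEE values per region (arbitrary units).
nee_by_region = {
    "region_1": [1.0, 2.0, 3.0],
    "region_2": [2.0, 3.0, 4.0],
    "region_3": [6.0, 7.0, 8.0],
}
f_stat = one_way_anova_f(nee_by_region)
```

A large F value indicates that differences between regions dominate the variation within regions. A significance test would additionally need the F distribution, which is left out here.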
Block 9
10:30 - noon
Discussion and Next Steps:
· Development of Proposals and Papers
papers
1. visualization critique
finalize taxonomy / review (15 figures): next week
first draft paper: 15 November or earlier
targeted for TVCG
2. Design Study--EuroVis
time/space similarity analysis tool
due: 6 December
prototype:
mockup teleconference-- 30 October
functioning prototype - 8 November telecon
feedback from scientists: iterations beginning on 8 November
3. DMESS paper
DMESS 2014: http://www.climatemodeling.org/workshops/dmess2014/
on tool 1: multi-projection
telecon to view the prototype: early November
draft November 22
final December 15
4. Aritra to India
7 December
MsTMIP visualizations
have bi-weekly phone calls
Sources for solicitation
DOE, NASA, NSF (cyber and vis(???))
solicitations for Vis science (check with Enrico)
Google
maybe NSF
· others
Prepare for final reporting during AHM closing session
· Adjourn
Block 10
1:00 – 3:00 pm Continued Informal Discussions as needed