August 14, 2014 Teleconference
 Provenance and Semantics Discussion with MsTMIP Team
 
Debbie Huntzinger, Christopher Schwalm, Josh Fisher, Steve Aulenbach, Yuanyuan Fang, Bertram Ludaescher, Paolo Missier, Dave Vieglais, Matt Jones, Chris Jones, Dan Ricciuto, Xixi Luo, Deborah McGuinness, Yaxing Wei, Santonu Goswami, Ben Leinfelder

not available:  Mark Schildhauer
 
Agenda
 
1.   Status:  Bob
    Released MsTMIP model output:
    
Data repository:
http://nacp.ornl.gov/mstmipdata/
           
2.  Semantics:  Yaxing and Christopher
 
Required variables
http://nacp.ornl.gov/MsTMIP_variables.shtml
 
Harmonize data files
Mapping showing the relationship between the provided variables and the requested/required data files:
            https://docs.google.com/spreadsheets/d/17YAXpj1gu0g8Wi2SyNu90bUgy9OFALLQMlnK9DmE-BI/edit#gid=0

Use Cases:

A. Given a concept defined in an ontology, identify datasets/granules that contain variables that match that concept.
    -- exact match?
    -- partial match?
    -- goal is for people unfamiliar with MsTMIP to locate measurements from a larger corpus of DataONE data that has overlapping, but not exactly the same, types of data
    -- I would like to see two grounded examples, e.g., "for measurement xx (fill in the xx), find granule yy" (and then it would be nice to know why dataset yy might be hard to find today)
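
A minimal sketch of how exact vs. partial matching in use case A could work, assuming each granule's variable has already been mapped to one ontology concept. The concept names (`ex:...`), the small "broader" hierarchy, and the granule ids are hypothetical placeholders, not taken from any MsTMIP or DataONE ontology. Here "partial" is read as "the query concept is broader or narrower than the granule's concept"; other readings are possible.

```python
from dataclasses import dataclass

# Hypothetical concept hierarchy: narrower concept -> broader concept.
BROADER = {
    "ex:NetEcosystemExchange": "ex:CarbonFlux",
    "ex:GrossPrimaryProductivity": "ex:CarbonFlux",
    "ex:CarbonFlux": "ex:Flux",
}


@dataclass(frozen=True)
class Granule:
    granule_id: str  # hypothetical identifier
    variable: str    # variable name as it appears in the file
    concept: str     # ontology concept the variable was mapped to


def ancestors(concept):
    """Return all broader concepts of `concept`, nearest first."""
    out = []
    while concept in BROADER:
        concept = BROADER[concept]
        out.append(concept)
    return out


def match_kind(query, concept):
    """Classify a match as 'exact', 'partial', or None (no match)."""
    if concept == query:
        return "exact"
    # Partial: one concept is a broader term of the other.
    if query in ancestors(concept) or concept in ancestors(query):
        return "partial"
    return None


def find_granules(query, granules):
    """Return (granule_id, match kind) for every granule matching `query`."""
    hits = []
    for g in granules:
        kind = match_kind(query, g.concept)
        if kind is not None:
            hits.append((g.granule_id, kind))
    return hits
```

With this reading, a search for the broader carbon-flux concept returns NEE and GPP granules as partial matches, while sibling concepts (NEE vs. GPP) do not match each other.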

B. Given a variable that appears in a granule, find other datasets that contain conceptually equivalent granules, where equivalence is defined by alignment with ontologies

C. Classify model output variables according to the models (i.e., their embedded assumptions on real-world processes) used to produce them, and be able to differentiate the structural characteristics of the models used to output the variables (e.g., which components of NEE constitute any particular model output)

Deborah -- also need driving evaluation questions to determine if use cases are met
 
Clarification about some concepts
- data set and granule: In MsTMIP, each variable produced in a simulation by a model is put into one netCDF file. We call this file a granule. A data set is a set of granules. We call the whole MsTMIP output data repository a data set.
- simulations: MsTMIP defined 5 global simulations (RG1, BG1, SG1, SG2, and SG3). These simulations are different configurations of model runs. For example, BG1 needs a changing N-deposition input and SG3 uses a constant N-deposition input. See http://nacp.ornl.gov/MsTMIP_simulations.shtml for details.
Open question: can we characterize MsTMIP simulations and simulations defined by other MIPs so that their definitions can be represented in a consistent framework?
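
A minimal sketch of the granule / data set / simulation concepts above as a data structure. It assumes only what the notes state: one granule per (model, simulation, variable), and five global simulations. The class name, the helper function, and any model names used with it are hypothetical, and nothing here reflects the repository's actual file naming.

```python
from collections import defaultdict
from dataclasses import dataclass

# The five global simulations named in the notes.
SIMULATIONS = ("RG1", "BG1", "SG1", "SG2", "SG3")


@dataclass(frozen=True)
class Granule:
    """One netCDF file: one variable from one model in one simulation."""
    model: str
    simulation: str
    variable: str

    def __post_init__(self):
        # Reject simulations outside the five defined ones.
        if self.simulation not in SIMULATIONS:
            raise ValueError(f"unknown simulation: {self.simulation}")


def index_by_simulation(data_set):
    """Group the granules of a data set (any iterable) by simulation."""
    index = defaultdict(list)
    for granule in data_set:
        index[granule.simulation].append(granule)
    return dict(index)
```

Representing a simulation as a plain label is the simplest choice; addressing the open question would likely mean replacing the label with a structured description of the run configuration (e.g., which inputs vary and which are held constant).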

D. Looking for data that can be used to address the question: How did (or would) above-ground biomass change from the 1950s to the 1990s in response to changes in environmental drivers (N-deposition, fires, land use change)?
    -- Slightly less ambitious: Which data sets have a measure of above-ground biomass change along with at least one of several drivers (N-deposition, fires, land use change)?
    -- Depending on the scope in which it is posed, this question can be very different.
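
The "slightly less ambitious" form of use case D can be sketched as a set query, assuming a catalog that already lists, per data set, the concepts its variables map to. The concept labels and the catalog layout below are hypothetical placeholders.

```python
# Hypothetical concept labels; a real catalog would use ontology terms.
AGB = "above_ground_biomass"
DRIVERS = frozenset({"n_deposition", "fire", "land_use_change"})


def datasets_with_agb_and_driver(catalog):
    """Map each qualifying data set name to the sorted drivers it contains.

    `catalog` maps a data set name to the collection of concepts found in it.
    A data set qualifies if it has above-ground biomass and at least one driver.
    """
    result = {}
    for name, concepts in catalog.items():
        found = DRIVERS & set(concepts)
        if AGB in concepts and found:
            result[name] = sorted(found)
    return result
```

The temporal scope (1950s to 1990s) is left out of this sketch; it would be an additional filter on each data set's time coverage.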
   
   
3.  Provenance

Types of use cases related to provenance (maybe next call?)
-- "outward facing": how do I document (e.g. in my papers, or data products), what I did?
=> show data lineage, workflow provenance etc.
-- "inward facing": I can't figure out what combination of inputs & params I used to generate this plot!@^%$  Help!
=> organize my "project histories"

Other opportunities: provenance/history can refine the semantics of a term: e.g., the "NEE that was produced according to workflow W1", ... 
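
A minimal sketch of the "inward facing" case, assuming each processing step is logged with its workflow name, inputs, and parameters. The `History` class and its methods are hypothetical and do not correspond to any existing provenance tool; "W1" is the workflow label used in the note above.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    workflow: str
    inputs: tuple
    params: dict = field(default_factory=dict)


class History:
    """Project history: remembers how each output was produced."""

    def __init__(self):
        self._runs = {}  # output name -> Run that produced it

    def record(self, output, workflow, inputs, **params):
        """Log that `output` was produced by `workflow` from `inputs`."""
        self._runs[output] = Run(workflow, tuple(inputs), params)

    def how(self, output):
        """Return the Run that produced `output`, or None if unknown."""
        return self._runs.get(output)

    def lineage(self, output):
        """Return all upstream artifacts of `output`, depth-first, no repeats."""
        seen = []

        def visit(name):
            run = self._runs.get(name)
            if run is None:  # raw input or unknown artifact
                return
            for inp in run.inputs:
                if inp not in seen:
                    seen.append(inp)
                    visit(inp)

        visit(output)
        return seen
```

For example, after `h.record("nee_regional.nc", "W1", ["nee_global.nc", "mask.nc"], region="NA")` and `h.record("fig5.png", "plot", ["nee_regional.nc"])` (file names made up for illustration), `h.how("fig5.png")` answers "which inputs and params made this plot?", and `h.lineage("fig5.png")` gives the data lineage for the "outward facing" case. The same record also lets a term be qualified by its history, e.g., "the NEE produced according to workflow W1".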

Based on analysis of model outputs
    
Example analyses of MsTMIP model output (N.B., albeit with an earlier version):

Huntzinger, D. N. et al., 2013. The North American Carbon Program Multi-scale Synthesis and Terrestrial Model Intercomparison Project - Part 1: Overview and experimental design. Geosci. Model Dev., 6, 2121-2133. doi: 10.5194/gmd-6-2121-2013
http://dx.doi.org/10.5194/gmd-6-2121-2013    see Fig 5 and Fig. 6

Zscheischler et al., 2014. Impact of Large-Scale Climate Extremes on Biospheric Carbon Fluxes: An Intercomparison Based on MsTMIP Data. Global Biogeochemical Cycles, 28, 585-600. doi: 10.1002/2014GB004826
http://dx.doi.org/10.1002/2014GB004826


Misc Notes:
Email from Deborah:
Xixi has made an attempt to map the 45 variables in a provided spreadsheet. Someone with domain knowledge will need to check it.
I think we need to have some driving evaluation questions that for us exercise the semantics and for Bertram exercise the provenance. Do we have those?
Thx


Goal: 
To have well-elaborated use cases (ideally covering both semantics and provenance) before the upcoming AHM.