
Notes from November 2011 EVA Working Group:

Slides and documents to be shared:
    -- https://repository.dataone.org/documents/Meetings/20120403_DataONE_EVA_Boulder/
Tuesday, April 3, 2012
Jeff Morisette, Rosie Fisher, Anna Michalak, Yaxing, Christopher, Steve, Shishi, Dave Lawrence, Vineet, Bertram, Suresh, Jorge, Claudio, Bob, and Matt

Introduction & Meeting Goals by Bob Cook
MsTMIP & Benchmarking by Christopher Schwalm
Q: Has anyone used continuous integration testing frameworks for driving these evaluation benchmarks?

Model Intercomparison Framework by Shishi Liu
ILAMB overview by Dave Lawrence
Afternoon Breakout Session:
Two groups, overall process of Model Intercomparison Infrastructure

1. Select output to analyze (Observation/Model Output)
2. Convert to common representation
3. Intercomparison
4. Scoring System
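
A minimal sketch of the four steps above, in Python, to make the hand-offs between steps concrete. Everything in it is an assumption made for illustration: the function names, the made-up temperature values, the use of RMSE as the intercomparison metric, and the 1 / (1 + error / scale) scoring rule. None of these were decided at the meeting.

```python
from math import sqrt

def select_output(store, variable):
    """Step 1: pick the model outputs and the observation for one variable."""
    return store["models"][variable], store["observations"][variable]

def to_common(values, scale=1.0, offset=0.0):
    """Step 2: convert to a common representation (here only a unit conversion)."""
    return [v * scale + offset for v in values]

def rmse(model, obs):
    """Step 3: intercomparison metric, root-mean-square error against observations."""
    return sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))

def score(error, scale):
    """Step 4: map an error to a score in (0, 1]; 1 means a perfect match."""
    return 1.0 / (1.0 + error / scale)

# Made-up mean temperatures; model_b reports Kelvin, the others Celsius.
store = {
    "observations": {"tavg": [1.0, 3.0, 8.0, 13.0]},
    "models": {"tavg": {
        "model_a": {"values": [2.0, 4.0, 9.0, 14.0], "offset": 0.0},
        "model_b": {"values": [274.15, 276.15, 281.15, 287.15], "offset": -273.15},
    }},
}

models, obs = select_output(store, "tavg")
scores = {}
for name, out in models.items():
    common = to_common(out["values"], offset=out["offset"])
    scores[name] = score(rmse(common, obs), scale=5.0)

# Model names ordered from best to worst score.
ranking = sorted(scores, key=scores.get, reverse=True)
```

In a real workflow step 2 would also cover regridding and temporal alignment, not only units.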

Vegetation (Rosie, Suresh, Christopher, Steve, Bertram, Anna, Jeff, Jorge)


Wednesday, April 4, 2012
participating:  Yaxing, Christopher, Steve, Shishi, Dave Lawrence, Vineet, Bertram, Suresh, Jorge, Claudio, Bob, and Matt

DataONE linkages with other activities
NSF OCI:  other DataNets; goal is to link each of the DataNets
NSF-Geo:  EarthCube--in planning phase
NASA:  Earth systems DAACs (similar to ORNL DAAC)
DOE:  Earth Systems Grid Federation is being constructed; DataONE may want to contact ESGF in a year or two

European Union
    many efforts funded by EU to link across the Atlantic (with NSF)
    CReATIVE-B (a biodiversity network)
Global Efforts by the EU

Pilot Workflow
based on discussion from yesterday, providing more detail for each of the workflow steps
1.  PARC:  First element of the Workflow

PARC = Project, Aggregate, Reproject, and Clip

Read Metadata:  temporal coverage, time step, units, spatial metadata
Regrid:  extract data values; spatial and temporal regridding
Aggregate:  average, single pixel extraction
    has inputs in addition to those for PARC
        inputs to Clip:  land cover, ecoregions, climate regimes
            need to evaluate function and define thresholds
    outputs from Clip would be by region (e.g., North America), subregion, 

Post processing / another element of the workflow
    zonal values (averages)
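
A rough Python sketch of the Aggregate, Clip, and zonal-average ideas above, using plain nested lists so it runs without extra libraries. The function names, the sample grid, and the zone labels are hypothetical. Averages here are unweighted; on a latitude/longitude grid they would need to be weighted by cell area.

```python
def block_average(grid, factor):
    """Aggregate a 2-D grid by averaging factor x factor blocks.

    Cells set to None (missing data) are skipped.
    """
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, factor):
        row = []
        for c in range(0, cols, factor):
            cells = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))
                     if grid[i][j] is not None]
            row.append(sum(cells) / len(cells) if cells else None)
        coarse.append(row)
    return coarse

def zonal_means(grid, zones):
    """Average grid values per zone label; zones has the same shape as grid.

    A zone of None clips the cell out of the result.
    """
    totals, counts = {}, {}
    for grid_row, zone_row in zip(grid, zones):
        for value, zone in zip(grid_row, zone_row):
            if value is None or zone is None:
                continue
            totals[zone] = totals.get(zone, 0.0) + value
            counts[zone] = counts.get(zone, 0) + 1
    return {zone: totals[zone] / counts[zone] for zone in totals}

# Made-up 4 x 4 field with one missing cell.
fine = [
    [1.0, 3.0, 5.0, 7.0],
    [1.0, 3.0, 5.0, None],
    [2.0, 2.0, 8.0, 8.0],
    [2.0, 2.0, 8.0, 8.0],
]
coarse = block_average(fine, 2)

# Hypothetical land-cover mask on the coarse grid; None lies outside the clip region.
zones = [["forest", "grassland"], ["forest", None]]
means = zonal_means(coarse, zones)
```

The same mask approach would apply to ecoregions or climate regimes as Clip inputs.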

PARC Inputs
Temporal & Spatial metadata
Observation of interest:  T-avg or ET
Spatial Characteristics
Derived data

PARC Outputs:  

User interface
    diagnostic package
    scripting language (python, R for metrics)
    ILAMB -- decided that metrics would be in R
Visualization by Claudio Silva
Overview talk of visualization capabilities
Note from Matt: also of interest is how to go up and down levels of abstraction in trying to both visualize and understand phenomena; see in particular:
    -- http://worrydream.com/LadderOfAbstraction/
a brief history of visualization:

Actions/papers / proposals
1.  DataONE to merge UV-CDAT visualization, exploration, and analysis functionality into its data access (Claudio (lead), Matt, Suresh, Giri)
    Pilot:  use DataONE data access linked to UV-CDAT functionality
        take advantage of the ORNL DAAC's collection of netCDF data files in a THREDDS data server / data catalog
        Daymet:  http://daymet.ornl.gov/thredds/catalog/allcf/catalog.html
        ORNL DAAC:  http://thredds.daac.ornl.gov/thredds/ornl_catalog/daac.html

2.  Pilot project:  use DataONE / ILAMB / MsTMIP resources to build a pilot, based on the Tuesday afternoon exemplars  (Matt (co-lead), Bob (co-lead), Suresh, and others)    

3.  Paper:  using UV-CDAT to visualize / explore / analyze climate or land model data (Claudio (co-lead), Debbie, Christopher (co-lead), Anna, Forrest, Bob, and others)

4.  Paper:  application of workflow technology, archival and  provenance for the ILAMB metrics (based on #2)  (are there other activities that could also be used here?) (many, including Bertram (lead), Dave, Matt, Christopher, Claudio (co-lead), Vineet, Steve, and Bob)
    4.a.  maybe a related paper devoted to provenance (Bertram and Yaxing)

5.  Paper:  regridding and its uncertainty (Jorge (lead), Shishi, Yaxing, Suresh, and Christopher (potentially Anna))

6.  Proposal:  build the pilot ILAMB workflow and determine how to construct a proposal to build the pilot to a fully functioning product / tool (TBD)
    -possibly including sensitivity of metrics:  how sensitive the metrics are to the way they are calculated in the workflow (more than a single number for the ranking / score; an uncertainty on the ranking; how scores for individual outputs are weighted to come up with the total score)

7.  Summer Intern Projects (at Stanford and at ORNL) (summer 2012) (TBD)

8.  Integration and leveraging of Anna's and Vineet's SI2 data assimilation project with DataONE EVA's activities (perhaps via VisTrails).  Need to work out some links between code and ftp / HPC resources.
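
The weighting question raised under item 6 can be illustrated with a small Python sketch: the same per-variable scores give a different ranking once the weights change. The scores, weights, and function names are hypothetical and are not ILAMB values.

```python
def total_score(metric_scores, weights):
    """Weighted mean of per-variable scores."""
    weight_sum = sum(weights.values())
    return sum(metric_scores[k] * w for k, w in weights.items()) / weight_sum

def rank(models, weights):
    """Model names ordered from best to worst total score."""
    return sorted(models, key=lambda m: total_score(models[m], weights), reverse=True)

# Hypothetical per-variable scores in [0, 1].
models = {
    "model_a": {"tavg": 0.9, "et": 0.5},
    "model_b": {"tavg": 0.6, "et": 0.7},
}

equal = {"tavg": 1.0, "et": 1.0}      # both variables count the same
et_heavy = {"tavg": 1.0, "et": 3.0}   # ET counts three times as much

rank_equal = rank(models, equal)
rank_et_heavy = rank(models, et_heavy)
```

With equal weights model_a ranks first; with ET weighted more heavily model_b does. Repeating the ranking over a range of plausible weights is one simple way to put an uncertainty on it.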


Thursday, April 5, 2012

Bertram's talk
    developing DataONE open provenance model (D-OPM)

Example Use Cases from around the room

Kepler workflow that uses a Web service to generate MODIS subsets--subsets are based on a time variable source 

- Yaxing:  Land cover developed by Martin Jung is based on an analysis of multiple input land cover data sets

Multiple high-level steps were involved:
Step 1: Original SYNMAP data (1km, global, one year ~2000) was derived from MODIS land cover, GLC2000, and GLCC. In this step, leaf type and leaf longevity information from AVHRR-CFTC are major auxiliary data. The detailed workflow of step 1 is described in text format in a paper published by Martin Jung.
Step 2: Original SYNMAP data was reprojected, reformatted, and regridded into Analyzed SYNMAP data (0.5 degree, global, one year ~2000). The detailed workflow of step 2 is organized as a workflow model in ESRI ArcGIS.
Step 3: Analyzed SYNMAP data was harmonized with Hurtt's land use change data to derive global Land Use Change data based on SYNMAP PFT types for MsTMIP project (0.5 degree, global, yearly 1700-2010). Step 3 was done in a Matlab program.

Challenges for this use case: drill into each step to get the sub-workflow, then integrate these heterogeneous sub-workflows.

MODIS data from an early version (Version 4) were used to draw conclusions about the relationship between greening and drought; other workers using a later version of MODIS data (Version 5) were unable to reproduce those results.  Part of the problem was that the ways in which the data were processed / aggregated in the earlier study were not documented, so the later workers could not redo the analysis.
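
The MODIS case above is the kind of gap a provenance record is meant to close. A minimal Python sketch of recording each processing step with its input version and parameters follows. It is not the D-OPM model; the field names, the step name, and the `window_days` parameter are hypothetical.

```python
import hashlib
import json

def record_step(log, name, inputs, parameters, output_label):
    """Append one processing step to a provenance log and return its entry."""
    entry = {
        "step": name,
        "inputs": inputs,          # dataset labels, including version
        "parameters": parameters,  # everything needed to rerun the step
        "output": output_label,
    }
    # A digest of the canonical JSON lets two runs be compared quickly.
    canonical = json.dumps(entry, sort_keys=True)
    entry["digest"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    log.append(entry)
    return entry

log_v4, log_v5 = [], []
params = {"method": "mean", "window_days": 16}  # hypothetical settings
record_step(log_v4, "aggregate", ["MODIS Version 4"], params, "greenness_v4")
record_step(log_v5, "aggregate", ["MODIS Version 5"], params, "greenness_v5")

# Same processing, different input version: the parameters match, the digests do not.
same_processing = log_v4[0]["parameters"] == log_v5[0]["parameters"]
same_run = log_v4[0]["digest"] == log_v5[0]["digest"]
```

With a log like this, later workers could at least see which processing choices to repeat and which input version differed.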

Steve Aulenbach
-IGBP 59

Major use case: