SONet Meeting, Santa Barbara, 2011-04-18 - 2011-04-20
=====================================================


Introduction from Mark Schildhauer
----------------------------------

    
Agenda: https://sonet.ecoinformatics.org/workshops/jwg-meeting-2011-04
    
        Deborah votes for trying to do this!
I. Work to Date: Demonstrations
===============================

-----------------
Andy Maffei
-----------
Deborah McGuinness
------------------
            - reduced queries from 8 to 3 steps on average
            - expose data that wasn't available prior
            - validate and augment data
        - Challenges:
            - Expose provenance (SPCDIS, PML)
            - encourage reusability, what is needed?
            - support for modularity (Semantic eScience Framework)
        - VSTO ontology
            - areas for reuse: DataProduct, Parameter, Instrument 
            classes
        - VSTO infrastructure revolves around queries based on 
        Instrument, Dates, and Parameters supported by 
        ontology classes and semantic filters, metadata 
        services, and data services; now has triple store-based 
        implementation
            - Ben: How do you select class names? 
            - Deborah: Domain scientists provide them, and they 
            are accurate to greater and lesser degrees. Create 
            class labels based on what the scientists use in the 
            domain.
                - Our work in shipboard event logging has shown us
                that it is VERY important to record both the local
                name for an instrument and the mapping to a
                controlled vocabulary for instruments. (Maffei)
        - VSTO needed to capture provenance information better
        in order to be truly useful to the researchers. 
        Follow-on project (SPCDIS) is addressing this. 
        - VSTO Web portal: how to present information. Starts with 
        an inferred plot type initially, based on options and 
        suggestions encoded into the ontology
        - Ongoing work: 
            - Semantic eScience framework (SeSF)
            - Semantic Provenance Capture in data Ingest Systems 
            (SPCDIS)
            - Proof Markup Language is now Provenance Markup 
            Language
            Inference Web - Provenance Infrastructure effort:  
            http://inference-web.org/
            The PML documentation and ontologies are reachable from this site; the documentation page at http://inference-web.org/wiki/Documentation
            has previous and current versions. PML 2 is the modularized version.
            - W3C Working Group has started and work will likely 
            focus on a solution based on concepts found in both 
            PML and Open Provenance Model (OPM), see OPMV also.
            - InferenceWeb toolkit: testing the accuracy of 
            results using inferencing, evaluating the rules used 
            to come to a particular outcome. Highlighting 
            inconsistencies provides information on the 
            trustworthiness of sources (instruments, informants, 
            etc.)
            - AOWG: Annotation Ontology Working Group has 
            recently started at the W3C from an incubator
            - Semantically-enhanced Linked Data
            - Data Quality Screening System ??
        - Discussion of other work
            - Ontolog - upper level, foundationally oriented 
            ontology work, Peter Yim, working on continued funding
            - BioPortal, bio-medical ontologies
            - Challenge: DataNet-type projects include only 
            small budgets for semantic work. Shawn: SONet is here 
            to facilitate integration across knowledge 
            representation efforts.
            
            Provenance working group http://www.w3.org/2011/prov/wiki/Main_Page
            

Hilmar Lapp
-----------
Ruth Duerr
----------

Semantic Sea Ice Interoperability Initiative
Libre: Freeing your data: free to share, discover and use
Deborah: add Technology Infusion link from NASA
http://www.esdswg.org/techinfusion/about/


Jeff Horsburgh
--------------
Carl Lagoze
-----------
II. Session on a Comparative Ontology Review
==============================================

Shawn: It would be useful to coordinate efforts on using domain extensions to ontologies (units being one of the more important areas). OBO Foundry provides a central place for ontologies, but also for units, promoting adoption and not reinventing components.

Hilmar: O&M is a 'common model' that is now a standard. 
Others disagree that it is 'common' because it may be deficient in certain use cases.
Matt: O&M is a small part of the model, i.e. GML dependencies, SensorML serialization, etc.
Jeff: O&M 2.0 is reducing those dependencies.

O&M Model - Philip Dibner
-------------------------
Discussion: Mark: Temporal Period and Instant are explicit in the O&M model: what about spatial context? Features may have a location. Units are in the Value class. (Flip will confirm this ...)

O&M Feature maps to OBOE Entity
O&M Process maps to OBOE Protocol

Deborah: Wolfram Alpha is a commercial, semantically-enabled search offering. It would be worth comparing it with other efforts. See Units, Measures, and Physical Quantities on the Wolfram site.
http://www.wolframalpha.com/examples/PhysicalQuantities.html
A talk on this will also happen next week:
http://cirss.lis.illinois.edu/Rtable/errt.html
Deborah and/or one or more of her team will attend virtually

Matt: Capturing complex units of measure is fundamental, especially for units that end up being dimensionless due to the ratio nature of certain observations, e.g. ppm (parts per million of what?). Knowing the units of the medium is critical. Ratio units must not lose the context of both the numerator and the denominator.
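
A minimal sketch of Matt's point, assuming a hypothetical `RatioUnit` structure (none of these names come from OBOE, EML, or any other unit dictionary discussed here). Two "ppm" values are both dimensionless, yet they are different units once the numerator and denominator context is kept:

```python
from dataclasses import dataclass

# Hypothetical lookup of the physical dimension behind each unit symbol.
DIMENSION = {"mg": "mass", "kg": "mass", "uL": "volume", "L": "volume"}


@dataclass(frozen=True)
class Quantity:
    unit: str       # unit symbol, e.g. "mg"
    substance: str  # what is being measured, e.g. "nitrate"


@dataclass(frozen=True)
class RatioUnit:
    numerator: Quantity
    denominator: Quantity

    def is_dimensionless(self) -> bool:
        # The dimensions cancel when both sides share the same dimension.
        return DIMENSION[self.numerator.unit] == DIMENSION[self.denominator.unit]

    def label(self) -> str:
        n, d = self.numerator, self.denominator
        return f"{n.unit} {n.substance} per {d.unit} {d.substance}"


# Both of these would commonly be reported simply as "ppm".
by_mass = RatioUnit(Quantity("mg", "nitrate"), Quantity("kg", "seawater"))
by_volume = RatioUnit(Quantity("uL", "methane"), Quantity("L", "air"))

print(by_mass.label())
print(by_volume.label())
print(by_mass == by_volume)
```

Keeping the substance alongside the unit on each side is one possible design; a real unit ontology would more likely reference controlled terms than free-text strings.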

What units are being used? 
Shawn: OBOE has unit classes, some of which come from the LTER unit dictionary, which derives from the EML unit dictionary.
Mark: Highly composited indices are being calculated in the ecology community that have complex units hidden behind the index value.
Andy: BCO-DMO uses units described here -- http://physics.nist.gov/cuu/Units/

VSTO (Virtual Solar-Terrestrial Observatory)
--------------------------------------------
Matt: Note on raster image use case - pulling pixel values from all MODIS data scenes for analysis of bird and habitat correlation (a DataONE use case) overwhelmed the MODIS data service. This highlights that the underlying data model may affect the usability of the data (or lack thereof) in repurposed applications that differ from the original intent of the service.

A VSTO Parameter 
  equates to 
an OBOE Characteristic 
  equates to 
an ODM Variable
  equates to
an O&M Property
  equates to
an EQ Quality
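
The equivalences noted in this session can be sketched as a small lookup table. This is only an illustration: the row keys (`property`, `thing_observed`, `procedure`) and the `translate` helper are hypothetical names, and the "equates to" relationships are the informal ones recorded in these notes, not verified mappings.

```python
from typing import Optional

# Each row groups terms that the notes treat as "same-as" across models.
CROSSWALK = {
    "property": {
        "VSTO": "Parameter",
        "OBOE": "Characteristic",
        "ODM": "Variable",
        "O&M": "Property",
        "EQ": "Quality",
    },
    "thing_observed": {"O&M": "Feature", "OBOE": "Entity"},
    "procedure": {"O&M": "Process", "OBOE": "Protocol"},
}


def translate(term: str, source: str, target: str) -> Optional[str]:
    """Return the target model's term for a source term, or None if unmapped."""
    for row in CROSSWALK.values():
        if row.get(source) == term:
            return row.get(target)
    return None


print(translate("Parameter", "VSTO", "O&M"))
print(translate("Feature", "O&M", "OBOE"))
print(translate("Feature", "O&M", "VSTO"))
```

A `None` result marks a gap where no equivalence has been recorded yet, which is the kind of hole a comparative review would need to fill.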

ODM Model (CUAHSI Observations Data Model)
------------------------------------------
Note: There is a tradition of focusing on time series information in the community using the ODM, and the model reflects that currently

Note: CUAHSI HIS is being enhanced to implement the DataONE Member Node interface, and the main barrier is deciding how to partition stream data into distinct data objects with byte lengths and checksum values. Further complications occur when researchers change data values in the underlying relational database implementing the HIS. USGS water data at times exhibits this problem (same query at two different dates gives different results). This points to tracking provenance information, and Ruth suggests that these issues be brought up at ESIP meetings, since they are grappling with the issues currently.
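
A rough sketch of the partitioning problem, under loudly stated assumptions: one data object per calendar day, a CSV-like serialization, and SHA-256 as the checksum. None of these choices come from CUAHSI HIS or DataONE; they only illustrate why a later edit to a stored value is a problem once sizes and checksums have been published.

```python
import hashlib
from itertools import groupby


def partition_by_day(records):
    """Split time-sorted (iso_timestamp, value) records into per-day objects."""
    objects = []
    # The first 10 characters of an ISO timestamp are the date (YYYY-MM-DD).
    for day, group in groupby(records, key=lambda r: r[0][:10]):
        payload = "".join(f"{ts},{val}\n" for ts, val in group).encode("utf-8")
        objects.append({
            "id": day,  # hypothetical identifier scheme
            "size": len(payload),
            "checksum": hashlib.sha256(payload).hexdigest(),
        })
    return objects


records = [
    ("2011-04-18T00:00:00", 1.2),
    ("2011-04-18T12:00:00", 1.4),
    ("2011-04-19T00:00:00", 1.1),
]
original = partition_by_day(records)

# A researcher later corrects one value in the underlying database.
edited_records = [records[0], ("2011-04-18T12:00:00", 1.5), records[2]]
edited = partition_by_day(edited_records)

for before, after in zip(original, edited):
    changed = before["checksum"] != after["checksum"]
    print(before["id"], before["size"], "changed" if changed else "unchanged")
```

Only the object containing the edited value gets a new checksum, so the same query run on two dates yields a different object. That is the case where provenance tracking, or a new identifier per revision, becomes necessary.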

OBOE (Extensible Observation Ontology)
--------------------------------------
Shawn: In comparing O&M and OBOE, O&M doesn't have an Observation class as it is defined in OBOE. An O&M Observation equates to an OBOE Measurement. In OBOE, an Observation has one or more Measurements, and may be seen as a collection of Measurements.
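
Shawn's comparison can be sketched as a data structure. This is a simplified illustration, not the OBOE or O&M schema: the class fields and the dictionary keys (`feature`, `property`, `result`, `uom`) are hypothetical, and the flattening leans on the Entity-to-Feature and Characteristic-to-Property equivalences recorded elsewhere in these notes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Measurement:
    characteristic: str  # what was measured, e.g. "height"
    value: float
    unit: str


@dataclass
class Observation:
    """An OBOE-style observation: one entity, one or more measurements."""
    entity: str
    measurements: List[Measurement] = field(default_factory=list)

    def to_om_style(self):
        # Each OBOE Measurement becomes one O&M-style observation record.
        return [
            {
                "feature": self.entity,
                "property": m.characteristic,
                "result": m.value,
                "uom": m.unit,
            }
            for m in self.measurements
        ]


tree = Observation(
    entity="Tree",
    measurements=[
        Measurement("height", 12.5, "m"),
        Measurement("diameter", 0.4, "m"),
    ],
)

for record in tree.to_om_style():
    print(record)
```

One OBOE Observation with two Measurements flattens into two O&M-style records that share the same feature.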


Mark made the point that it would be useful to document the 3 use cases that were mentioned in the morning's discussions:
III. Discussion on Next Steps
=============================
Shawn: One of the commonalities among models is the concept of 'Property' (in O&M parlance) and we should leverage these types of linkages toward integrating data

One thing we might do is take a look at the models and see if we can agree on descriptions for concepts that seem to be "same-as" in the ontologies we have looked at.

        Huiping Cao (former NCEAS postdoc) has already written a report that compares O&M, OBOE, and EQ ontologies. Mark will see if he can make it available to us.

A second step following on from this could be to verify the "same-as" relationships by trying to apply the concepts in all of the reconciled ontologies to instances from the various projects that have employed those ontologies in their ongoing work.

DataONE perspective:
Data Conservancy perspective:
(need summary here)

NEED A URL FOR THE SURVEY RESULTS AVAILABLE PLEASE

The survey (https://spreadsheets.google.com/viewform?formkey=dEQ4YWZxdmtHSF9OSHRrcUVkUUVoamc6MQ#gid=0) should enumerate the most current URLs for each of the observation models being used in SONet participant domains.  This will be a useful resource for comparisons into the future. i.e:
Mark will be hiring a postdoc who will act on the priorities of the JWG. The postdoc may have more of a CS background, or may be an Earth scientist eager to delve into the KR field.

Deborah would like to see an implementation come out of the SONet work, perhaps in the form of a demo that will provide the basis of a SONet paper.

Shawn: An implementation would need to be grounded in a use case.

Hilmar: A bridging ontology to cross domain ontologies may be a reasonable path

SUMMARY OF A POTENTIAL GROUP EFFORT:

Task: Semantically match OBOE --> O&M in || to VSTO --> O&M at a defined detail level TBD
                                      to DwC  --> O&M
                                      to EQ   --> O&M
                       
Outputs: (a) O&M instance data from participating projects, (b) a determination as to whether O&M is adequate to the task, and (c) the possibility of putting up SOS service instances for projects that succeed at mapping to O&M

Limiting Focus: The effort would be limited to the concepts that appear in one or more use cases and that are required across a common axis of concepts (cross-discipline research scenarios).

Candidate Use Cases:
Andy: It's interesting to note that if the OBOE-to-O&M matching is successful: VSTO, DwC, and EQ mapping efforts could opt to map to OBOE instead of directly to O&M.

Mark: The work Huiping has done comparing EQ, O&M, and OBOE may be a starting point, including the conversion among models. 

Deborah: This is a critical component in order to implement a working demo with an output of:
a) O&M instance data
b) determine when O&M might be inadequate

Hilmar: EQ mapping to OWL 2 is in process

Dave: Mapping core Darwin Core fields to ontology concepts would provide a reasonable pathway for crosswalking

An illuminating (albeit contrived) use case may involve a cross-domain data search for catfish occurrence in ice-covered lakes with a specific phenotypic characteristic correlated with solar flare activity.

Cross-discipline research scenarios:
Space Weather! http://www.exploratorium.edu/spaceweather/sway.html

Excerpt:

Climate

It is already known that changes in the energy output of the sun can affect the climate here on earth. For example, the sun undergoes an 11-year cycle of activity, also known as the solar cycle. During solar maximum, the peak of the 11-year cycle, the sun shines a tiny bit brighter (up to one half of a percent). Studies of tree-ring thickness show that plant growth follows the ups and downs of the solar cycle. Another example is a historical event called the Maunder Minimum, a 65-year dip in solar activity that caused a period of global cooling on earth in the late seventeenth century. During this time, known as the Little Ice Age, temperatures plunged and the Baltic Sea froze over regularly.

Scientists are speculating that galactic cosmic rays (high-energy particles from outside our solar system) may also affect the earth’s climate. Some think that cosmic rays are involved with cloud formation in our atmosphere because they create ions (charged particles) in our atmosphere; ions act as “seeds” (or nucleation centers) for clouds.

Solar variation effects on carbon-14 production in the upper atmosphere
http://en.wikipedia.org/wiki/Solar_variation#Carbon-14_production

Catfish have been observed to change behaviours - they jump for example - when there is high geomagnetic activity before earthquakes.  (not sure if they might do the same with the high geomagnetic activity associated with solar flares)
http://books.google.com/books?id=xBGffKNfsq8C&pg=PA138&lpg=PA138&dq=catfish+and+high+geomagnetic+activity&source=bl&ots=5IpH3gLQHV&sig=auciEUBWtg6iejU76VwGaIquQrI&hl=en&ei=DyCuTZfxDorUtQOh_pGSAw&sa=X&oi=book_result&ct=result&resnum=1&ved=0CBQQ6AEwAA#v=onepage&q=catfish%20and%20high%20geomagnetic%20activity&f=false

http://en.wikipedia.org/wiki/Solar_variation#Geomagnetic_effects

The Earth's polar aurorae are visual displays created by interactions between the solar wind, the solar magnetosphere, the Earth's magnetic field, and the Earth's atmosphere. Variations in any of these affect aurora displays.
Sudden changes can cause the intense disturbances in the Earth's magnetic fields which are called geomagnetic storms.

...geomagnetic storms could theoretically disturb catfish, causing them to jump.

more on unusual animal behaviour and solar flares and geomagnetic activity.
http://hubpages.com/hub/Can-we-predict-Earthquakes-Unusual-Animal-and-Ocean-life-Clouds

DAY 3 - Wednesday April 20

Discussion on How Best to Collaborate as a group on Ontology Development
Resources:
Serialization Approaches for transfer formats
---------------------------------------------
Philip: A discussion needs to happen with OGC to make them aware of the issues described here.

Ruth: HDF-Mapping (an XML construct) is being used to annotate HDF4 (and HDF5 in the future) so that the internal physical data structure of the binary files can be read without needing a library. NASA is in the process of using XML-Mapping to describe their entire data archive.

Stephan's three options for OBOE serialization targets:
    Serialization of the observational model
    Serialization of the semantic data model (how to map a variable to a semantic concept, e.g. CF)
    Serialization of the data themselves
    
Activity: Corinna will coordinate with others in the group to pull together the data for the combined use case described above (Catfish/Sun/Ice/Phenotypes/Lakes) ...

Deborah sent out a message to the group on the W3C provenance working group announcement.  Contact her if you have questions and/or interest.

THU:  Deborah, Margaret, Ben, Mark discuss SBC Use Cases

We don't know when both phosphorus and nitrogen were collected in the same sample. A ratio of the two cannot legitimately be calculated unless they come from the same sample, and even then there may be further restrictions on validity. OBOE Context can be used to determine whether measurements refer to the same instance, and thus whether they are candidates for creating a ratio.
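
The same-sample condition can be sketched as follows. This is a plain-Python stand-in, not OBOE: the `sample` key plays the role that OBOE Context would play in identifying the same instance, the field names are hypothetical, and both analytes are assumed to be reported in the same concentration unit.

```python
def n_to_p_ratios(measurements):
    """Compute N:P only for samples where both analytes were measured."""
    by_sample = {}
    for m in measurements:
        by_sample.setdefault(m["sample"], {})[m["analyte"]] = m["value"]

    ratios = {}
    for sample, values in by_sample.items():
        n = values.get("nitrogen")
        p = values.get("phosphorus")
        # Skip samples missing either analyte, and avoid division by zero.
        if n is not None and p:
            ratios[sample] = n / p
    return ratios


measurements = [
    {"sample": "s1", "analyte": "nitrogen", "value": 32.0},
    {"sample": "s1", "analyte": "phosphorus", "value": 2.0},
    {"sample": "s2", "analyte": "nitrogen", "value": 18.0},    # no phosphorus
    {"sample": "s3", "analyte": "phosphorus", "value": 1.5},   # no nitrogen
]

print(n_to_p_ratios(measurements))
```

Samples `s2` and `s3` are excluded because only one analyte is present. Any further validity restrictions mentioned in the discussion would need extra checks beyond this sketch.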

Also have allometric data set.  Use MathML for expressing necessary processing of result-set.

Pulling items out of SWEET: which super- and sub-classes are useful?

on https://sonet.ecoinformatics.org/observational-data-use-cases/use-case-1b-nitrogen-use

Ben: materialize some annotated packages, test use case queries, identify gaps/successes with respect to the OBOE model and its ability to answer query conditions. Special focus on the "same instance as" feature of the observational model.