Interoperability breakout session from DataONE Users Group meeting: 2014-07-07. Felimon Gayanilo facilitating, Bruce Wilson primary note taker.

See slides on the definition of interoperability. Proposed four levels based on discovery, access, and use:

* I can see what you have and you can see my collections
* I can get what I need and you can too
* I understand what I got and got what I want
* I can use what I got and you're welcome to use what you got

Question: Do MNs think that we have achieved interoperability in the context of DataONE?

* Do you have the time and resources needed?
* Do you have the knowledge of what is needed to implement the required structures and standards?
* Do you agree that interoperability is needed to move science forward, or does the implementation of standards inhibit science?
* Is the culture of data sharing and trust established?
* Do you need DataONE to provide more guidance, or a wizard to select suitable technology?

Brian Wee: Technical interoperability is one aspect. Via Mark Parsons: interoperability is also about use. Is there enough information to assess the scientific equivalence of different data sets? Scientific interoperability doesn't always map well to technical interoperability. Example of a time series with some clipping, where the end user wants the unusual points. Can you tell from the data and its documentation that the data were clipped? Can you get access to the unclipped data? If one dataset is clipped and another is not, can you tell this and understand how the data can and cannot be compared?

BEW: Scientific interoperability is an element of the third bullet on interoperability -- I understand what I got and got what I want.

Jim: How do we expand interoperability beyond DataONE? If you pull data from DataONE into a SEAD project space, you pull all of the metadata. If you use the bookmark from DataONE, you get the bytes, but not the metadata; the download link doesn't carry the metadata. Would like a common way to get to the metadata given an ID. Right now, he feels he has to download the metadata in order to retain access to it. Has pushed in the RDA the idea of a UPC symbol for data. Can we standardize on that concept?

BEW: I think the issue is: given the ID for data (the D1 get call), how can I get to the metadata for those bytes? (One possible approach is sketched below, after these notes.)

Steve: Interoperability is almost an acceptance thing, sort of a cultural or social thing (though that carries connotations). If doing work for a science climate center, working with two scientists, trust is an important element of that relationship. That aspect is not captured in the technical discussion.

Jim: If I know who created it, I may not need as much of the provenance data, so the point is related. Knowing someone is a shortcut to getting to the provenance. But a goal of interoperability is to be able to handle and allow for these kinds of shortcuts.

Greg: Can think about interoperability in a couple of different ways. Can think about homogeneous data types, but there's a strong desire to work with heterogeneous data types across disciplines. Then we get into the types of tools and models needed to communicate across data types. A whole different component of infrastructure is needed to deal with this kind of interoperability across heterogeneous datasets.

Kevin: One way we get data into DataONE is by collecting data through an application and porting that data. There are some basic tools to get from one forest of data into DataONE. These tools are necessary, and we need to understand how they are changing and need to change. We also need support (community or otherwise) to handle the cases that don't work.
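A minimal sketch of the "given an ID, find the metadata" question raised by Jim and BEW, assuming the DataONE Coordinating Node Solr query endpoint and an `isDocumentedBy` index field; the base URL, version path, and example identifier are placeholders and may differ from a given deployment.

```python
"""Sketch: given a DataONE data object identifier (PID), look up the
identifiers of the science metadata documents that describe it by querying
the Coordinating Node search index. Endpoint path and index field names
are assumptions about the DataONE Solr index, not a confirmed contract."""
import requests

CN_BASE = "https://cn.dataone.org/cn/v2"  # placeholder Coordinating Node base URL


def metadata_for(pid):
    """Return the list of metadata PIDs that document the given data PID."""
    resp = requests.get(
        f"{CN_BASE}/query/solr/",
        params={
            "q": f'id:"{pid}"',
            "fl": "isDocumentedBy",  # assumed index field linking data to metadata
            "wt": "json",
        },
    )
    resp.raise_for_status()
    docs = resp.json()["response"]["docs"]
    return docs[0].get("isDocumentedBy", []) if docs else []


if __name__ == "__main__":
    # Hypothetical identifier, purely for illustration.
    print(metadata_for("urn:uuid:example-data-object"))
```

If something like this worked uniformly, a download link (or bookmark) carrying only the object ID would still be enough to recover the full metadata record later.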
Dave: D1 MNs expose a common API. To add content to any MN, you need to be able to work to that API. It provides a layer of abstraction so that any tool that speaks the API can work. The tools being used are not specific to KNB; they could work with any MN. (A minimal illustration of the common read calls follows at the end of this segment.)

Chris: Important for community members to be involved with the development of the ITK.

BEW: At the API layer of interoperability, what are the means by which the D1 API can be made interoperable with other standards? D1 enabling of repository tools like GeoNetwork, DSpace, and DAP-compliant tools is one way to do this.

Felimon: The community can come up with a wish list of features you want added, so those can get added.

Dave: Jim mentioned one need, which is a way to find the related information. That's a clear tool need.

Jim: The Zotero stuff was cool. How can I get the citation information if I have the download link?

John K: We have not achieved interoperability. It's a vast landscape; we won't make headway working across the whole thing. One test would be interoperability with SEAD; that would be a big step. It's low level and an enabler of many other things.

Regan: There are other forms of interoperability. Might want to manage the interaction -- some form of management that goes beyond simple interoperability.

Jim: A tool registry is an important aspect of this. SEAD has the concept of an extractor; right now it uses an OSS tool. There's no registry or message bus to get things plugged into different systems. Interoperability is also about what I can do with the data, more than just moving it around.

Dave: At least an aspect of this is in scope for what we're planning to do for Phase II, particularly in the context of registration of services. This can be subsetting, rendering, etc. The challenge is understanding the services: describing them in ways that clients and people can understand what a service can do and what kinds of data it works on (both the types and the specific data accessible to a specific service). Some, like OGC services, are easier to describe. Others, like very custom tools (BEW: like the ones at MPC that work with complex data), are much harder to describe. Start with the ones that are easier to describe.

BEW: There is a difference between a service and a tool.

Mike F: This relates to work they're doing with RPI, especially where USGS has just the metadata and not the data. Need to understand what services and what tools can be used for different types of data.

Jim: If I have a certain type of file, what tools can work with it? But can we go further? If I can get the metadata in addition to the data, that opens up different capabilities, particularly in terms of being able to get the provenance information out to other systems for follow-on use. Standardization is key to help drive the tool community to work with these standards.

Mike: This will improve the downstream metadata. If we can do 3-4 good examples, that can drive the rest of what DataONE and partners do.

Brian: From RDA: if we have a canonical representation of types, then I know what my tools can work with.

Jim: MIME isn't sufficient; we need a richer representation. binary/xlsx tells you something about what tools can open a file, but not that it contains time series data, nor anything about the structure of the data within that tool.

Greg: How much does DataONE see itself, longer term, in the tools and services development business?

Dave: That's an expensive business. About all we can achieve is a solid, reliable CI that tool developers can build on. Can help some with bootstrapping tools, but need to engage the community.
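A minimal sketch of the abstraction Dave describes: any Member Node implementing the Tier-1 read calls can be driven by the same client code. The endpoint paths follow the DataONE MNRead calls (get, getSystemMetadata); the base URL, API version path, and PID below are placeholders and may vary by node.

```python
"""Sketch of driving any DataONE Member Node through the common read API.
Only the base URL changes between nodes; the calls stay the same."""
from urllib.parse import quote

import requests


def get_object(mn_base, pid):
    """Fetch the raw bytes of an object from a Member Node (MNRead.get)."""
    r = requests.get(f"{mn_base}/object/{quote(pid, safe='')}")
    r.raise_for_status()
    return r.content


def get_system_metadata(mn_base, pid):
    """Fetch the system metadata XML for the same object (MNRead.getSystemMetadata)."""
    r = requests.get(f"{mn_base}/meta/{quote(pid, safe='')}")
    r.raise_for_status()
    return r.text


if __name__ == "__main__":
    mn = "https://knb.ecoinformatics.org/knb/d1/mn/v2"  # placeholder base URL
    pid = "urn:uuid:example-object"                      # hypothetical PID
    sysmeta = get_system_metadata(mn, pid)
    data = get_object(mn, pid)
```

The point is the abstraction, not the specific node: a tool written against these calls is not KNB-specific.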
BEW: Catalyst analogy -- lower the activation energy but don't get consumed in creating the outcome.

Karl: The difference between standards and tools is important, but the idea of leveraging existing tools is critical. OGR and GDAL are key tools that can work with OGC services. If there's a way for those to be interoperable with DataONE, using the existing standard services, then that is a relatively low-barrier way to enable tools and interoperability. (A small example of opening an OGC service through OGR follows at the end of this segment.)

Lynn: Technology is changing fast. The challenge she sees is that achieving interoperability requires a lot of people-to-people communication. Where are the places where that communication can be facilitated? Is there something DataONE can do to pull together the standards and practices, for example, for metadata? How can we reduce the number of phone calls and meetings needed to achieve interoperability between systems and across a community?

Lynn: Don't get too far ahead of things. Some of the questions on the slides represent relatively advanced use cases, but there's a lot of much more mundane interoperability that's needed now. How can we get that basic stuff done and not get too distracted by the really cool, advanced cases?

Wendy: If we look at adding other disciplines and other global regions, we have issues of how to search and what metadata to make available. Different disciplines create different systems because they understand how their users want to search for and use that data. Simplification creates a form of interoperability, but may not enable reuse because of potentially subtle domain-specific issues. As you move up to interoperability across more disciplines, the people working in those disciplines lose the tools that help them work in the ways specific to their community of practice.

Margaret: Would tweak the interoperability bullets. Would take the semantics off the second bullet and put it on the third, and move the preservation from the third and put it on the second (cf. the first set of bullets in this document). The key point is that if users don't understand what they're looking at, they'll go away.

Tracy: Building on Wendy's comments: it's not so much that people are particularly wedded to tools, but the language is a challenge. An ecologist won't necessarily understand social sciences data in the ways that the social scientist does.

Jim: Building on this, we need to enable people to use the tools that are specific to their community of practice as well as being able to work across disciplines.

Wendy: Is there an abstract model which is common?

BEW: Also important that if working in a hub-and-spoke model with that common core, we preserve the information which is domain specific.

Steve: Relates to his earlier point. Interoperability is often about a phone call and people-to-people communication. Day-to-day science is down in the trenches, and technology is often not a substitute for interactions. And it's hard to get people to change how they do things -- scientists are often relatively resistant to changes in operating practices.

BEW: And we learn how to work from the community around us and from our professors and older grad students.

Felimon: Are we there yet?

Lynn: There is a continuum of interoperability. DataONE has set the stage for a lot of forms of interoperability. There's never a "there" -- it's an asymptote.

Jim: As long as people are using file systems, where metadata is disconnected, we're stuck. Now we're heading to cases where the metadata can be associated with the data. That's a key enabler. Need to get out of the file system mentality.

BEW: Though the file is often the atomic unit of data.
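A small sketch of Karl's point that GDAL/OGR already speak OGC services, so exposing holdings through standard endpoints makes them reachable from existing tools with no new client code. The WFS URL is a placeholder, and this assumes the GDAL Python bindings (osgeo) are installed with WFS driver support.

```python
"""Sketch: read a (hypothetical) OGC Web Feature Service through OGR.
The same OGR calls work for shapefiles, databases, or any other
OGR-supported source, which is the interoperability payoff."""
from osgeo import ogr

# Open the WFS endpoint through OGR's WFS driver (URL is a placeholder).
ds = ogr.Open("WFS:https://example.org/geoserver/wfs")
if ds is not None:
    for i in range(ds.GetLayerCount()):
        layer = ds.GetLayer(i)
        # Each layer is queryable with the usual OGR layer API.
        print(layer.GetName(), layer.GetFeatureCount())
```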
Andrew: Curious about how to measure interoperability. There are lots of ways, but which are we using? Or are we using any? If you don't know where you're going, you just might get there.

Mike: Wouldn't measure interoperability directly. Would measure outcomes. Interoperability is a means, not an end.

Wendy: There are different aspects of interoperability. Being able to define it is important; we need that to assess gaps. Yes, it's a means, but the end is also vague and amorphous, so understanding what feeds the end results is also important. May be able to find indicators that show interoperability as an enabling technology.

Karl: An approach to consider: be able to search D1 holdings for things that have data attached. If we add options for searching the inventory of MN assets, we can assess what additional information is needed to make data actionable. That gives a means to assess improvement of interoperability as a function of time. We have a decent handle on interoperability for discovery; there is a long way to go for understanding and composability of data. If we can identify technical methods to assess readiness for reusability, that provides a metric for interoperability and a means for the community. Existing R libraries can work with some data that are relatively well self-documenting. (A rough sketch of such a query-based metric follows these notes.)

Greg: Expects that there will be web services that are good measures of interoperability.

Karl: It's not a perfect assessment, but rather a relatively narrow definition. Can also look at the number of workflows that demonstrate interoperability. There won't be many, but they could be relatively high impact.

Greg: We get asked "what new science was enabled?" -- a tough question.

Steve: If it's a tightly constrained definition of interoperability and we can build on it, then we have a good chance of not getting too far ahead of ourselves (Lynn's point from earlier). It will be great to show stories of where DataONE has been used to do science.

Karl: The continuous integration process also allows some filtering of how data are used from DataONE.

Lynn: How can we highlight people who are doing this well and show off exemplars?

Jim: The provenance work helps here. Can think of low-level metrics, analogous to the h-index. Provenance -- number of datasets with a specific provenance side.
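A rough sketch of the kind of query-based metric Karl suggests: count how many metadata records in the search index document at least one data object, approximating "holdings with data attached," and track the ratio over time. The query endpoint and the index field names (formatType, documents, dateUploaded) are assumptions about the DataONE Solr index and may differ in practice.

```python
"""Sketch of a crude, repeatable interoperability indicator: the fraction of
metadata records that document at least one data object in the index."""
import requests

CN_QUERY = "https://cn.dataone.org/cn/v2/query/solr/"  # placeholder endpoint


def count(query):
    """Return the number of index records matching a Solr query."""
    r = requests.get(CN_QUERY, params={"q": query, "rows": 0, "wt": "json"})
    r.raise_for_status()
    return r.json()["response"]["numFound"]


if __name__ == "__main__":
    total_metadata = count("formatType:METADATA")
    with_data = count("formatType:METADATA AND documents:[* TO *]")
    # Adding a dateUploaded range to each query would give the same ratio
    # per time window, i.e. interoperability for discovery/access over time.
    print(with_data, "of", total_metadata, "metadata records document data")
```

This only measures the discovery/access end of the continuum; understanding and composability, as noted above, need different indicators (e.g. workflows or provenance counts).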