DataONE Metadata Working Group Notes
September 18, 2012

Attending: John Kunze, Jane Greenberg, Jim Regetz, Angela Murillo, Greg Janée, Roger Dahl, Mark Servilla (part-time)

Regrets:  Sarah Callaghan, Tim Robertson, Rob Guralnick (in later), Karthik Ram (in cameo)


Sunset: July 2014
Deliverables:
1. Tech work plan and schedule

Vision
"term dictionary"
One online writeable registry/dictionary with these properties:
Prior Art to Imitate, Learn From, or Adopt:
Personas /Audiences
Use cases
challenges
- terms definition scope change
- we are going for cultural change
- ** need examples
- ** 

StackOverflow Model adoption issues
Requirements/Desiderata
principles
- minimality
- convergence
- simplicity
- community platform for vetting
- reduce artificial duplicate work, not necessary...

Common perceptions of what a registry is: (Lesson: be careful how we talk about it)
Should we avoid use of the word "registry" to describe what we want to build?
Perhaps "dictionary" or "concordance"?
Just as a reference point, here is the ISO 11179 standard for metadata registries: http://metadata-standards.org/11179/. It is complex, but a reference point for us.
High level principles
+ public read
+ general purpose, cross-disciplinary
+ public write
+ any parts of speech
+ terms automatically expire
+ blessed terms expire only with manual process
+ tracking term use to understand acceptance
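
The expiry and usage-tracking principles above could be sketched roughly as follows. This is only an illustration: the class name, the field names, and the one-year lifetime are assumptions, since the notes do not say how long an unblessed term should live or how use is counted.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed lifetime for unblessed terms; the notes do not give a number.
DEFAULT_LIFETIME = timedelta(days=365)


@dataclass
class Term:
    label: str
    created: date
    blessed: bool = False   # blessed terms never expire automatically
    retired: bool = False   # set only through a manual process
    use_count: int = 0      # crude signal of community acceptance

    def record_use(self) -> None:
        """Track one observed use of the term."""
        self.use_count += 1

    def is_expired(self, today: date) -> bool:
        """Manual retirement always wins; otherwise only unblessed terms age out."""
        if self.retired:
            return True
        if self.blessed:
            return False
        return today - self.created > DEFAULT_LIFETIME
```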

Agenda:
       
1. Introductions, culture, and logistics (John)
-- Note taker rotation

2. Overriding goals of the meeting (Jane, John, WG members)

4. Review of current task activities

-- Functional requirements, including registry review (Jane, WG members)
Functional Requirements for the PAMWG
(see draft paper sent...)

-- Technical considerations (John, WG members)

BREAK -- 3:00-3:30

BLOCK 4 -- 3:30-5:00

4. Angela Murillo, reporting on DataONE summer internship (Angela)

-- Reporting, Q&A, and discussion

Review of current task activities, continued.

5. Discussion of current task activity (WG members)

- progress, challenges, next steps

6. Discussion on how current activities inter-relate to DataONE and impact beyond DataONE (e.g., DAITF, etc.)

7. Open discussion, reviewing priorities for Day 2 (WG members)
    

Block after lunch on day 2
- how to deal with the idea of one entry for each term
- refinement one example
- qualifications demonstrated via use
- consider optional qualification
- could we have multiple qualifications...
- only one level down, call it refinement, not hierarchy...

SKETCHING the system
- the core: we want a representation of a term editable by a community
- provide structure for term
- we don't want to over structure... we want free space for each field
- how to get the terms in there..
- seed with terms?  should we upload/put in something like the Dublin Core?
- careful, we wouldn't want to offend other communities within DataONE
- start with the unique, the orphan items... 
- forced cross-walking.
- reach out to communities where new things are happening... e.g., MBG.
- ownership (e.g. owning Darwin Core "core" w/in), how can we promote co-ownership
- distributed version control
- people could tag ... 
- rfc, if you must "renew", update ...verify identifier
- w/term-identifier, multiple and yet persistent, potential for conflict
- Stack Overflow: things converge to an answer. Nassib: converge to an attribute on which people have consensus.
- when it's tagged it gets the uri
- From Nassib: 
    1. Community Seeded ---> V0 [MD] = Tag To ---> V1 [MD] 
    2. Proposed ---> V0 [MD] ----> V1 [MD] -----> V2 [MD] = Tag To
    - benevolent leaderships (dictatorships?)
- the creator of a new element always has the authority to tag a version of the element, which automatically assigns it a persistent identifier (question from Jim)
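
A minimal sketch of the version-then-tag flow described above, under stated assumptions: an element accumulates versions of its metadata, and tagging a version assigns it a persistent identifier. The names `Element`, `Version`, `propose`, and `tag` are hypothetical, the `urn:uuid:` identifier scheme is only a stand-in, and restricting tagging to the creator simply adopts one answer to Jim's question.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    number: int
    metadata: dict
    persistent_id: Optional[str] = None  # stays None until the version is tagged


@dataclass
class Element:
    creator: str
    versions: list = field(default_factory=list)

    def propose(self, metadata: dict) -> Version:
        """Append a new version (V0, V1, V2, ...) of the element's metadata."""
        version = Version(len(self.versions), dict(metadata))
        self.versions.append(version)
        return version

    def tag(self, number: int, user: str) -> str:
        """Tag a version; in this sketch only the creator may do so."""
        if user != self.creator:
            raise PermissionError("only the creator may tag in this sketch")
        version = self.versions[number]
        if version.persistent_id is None:
            # Stand-in identifier scheme, not one chosen by the group.
            version.persistent_id = f"urn:uuid:{uuid.uuid4()}"
        return version.persistent_id
```

Tagging the same version twice returns the same identifier, which is the persistence property the notes ask for.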
 
Notes from Semantics Group Discussion 3:30 - 5:00pm:
- We're talking about properties (predicates) and values
- We want everything open
- LOD would anticipate relationships
- Think of it as a menu 
- Vocabularies that are useful
- check out BioPortal  (http://bioportal.bioontology.org/) for organic ontology http://sswapmeet.sswap.info/jit/
- would want some organization/modularization to find information, some detail, some organization
- will we prepopulate with anything, such as SSWAP (http://sswap.info/)?
- Damien: ontology alignment (this term in this ontology is the same as that term in another) is very difficult

Suggestion:
- significant namespace problem: they have put in a huge amount of effort, for example the Gene Ontology; there are areas that will be out of bounds
- there is a significant amount of annotation and documented terms already in use, e.g. WaterML, SWEET (http://sweet.jpl.nasa.gov/) adopted by ESIP (Earth Science Information Partners, http://www.esipfed.org/)
- establish a clear connection between certain ontologies, if there's clear alignment that there's some value

What is the semantics group working on and where do the intersections lie (question from Rob):
- big vision: help DataONE with semantics, making the services smarter
- taking a small set of use cases and trying to provide some semantic enablement to make the use cases work better
- Hydro-eco use cases: get water data looking at chemical concentrations for populations; if we looked at ONEDrive, what would be the starting point?
- metadata flattened
- right now people are not getting back the data they want when they search

Action Items:
- They will send us the current use cases
- They asked us if we're going to write documentation for new member nodes

Day Three:
1. Debrief
- Thinking about concrete examples,
Jim: 
Concrete example from KNB: temperature. In tabular data, find attribute names containing "temperature". There are many unique definitions (temperature in units, protocol), e.g. "temperature in degrees C," "Water Temperatures", "Ambient water temperature".
Terms + qualifiers (refinements)
Greg: What, where, how, what units?
Standard Facets: context meets facets
- if we use something that has standard qualifiers or facets 
- beauty of facets: we don't have to dig down as much; facets are mutually exclusive things that you can put together
John:
things that can jump across the equal sign
we can allow a building up
Keywords: context free, they're labels applied

From Greg:
What if the system asks qualifying questions, ex temperature, the system asks "temperature of what"... etc
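
Greg's qualifying-question idea, combined with the "what, where, how, what units?" facets, might look roughly like this. It is a sketch only: the facet keys, the prompt wording, and the function names are assumptions made for illustration.

```python
# Facets follow Greg's "what, where, how, what units?"; the prompt wording is assumed.
FACETS = ("what", "where", "how", "units")
PROMPTS = {
    "what": "{term} of what?",
    "where": "{term} measured where?",
    "how": "{term} measured how?",
    "units": "{term} in what units?",
}


def missing_facets(qualifiers: dict) -> list:
    """Return the facets the user has not filled in yet, in a fixed order."""
    return [facet for facet in FACETS if not qualifiers.get(facet)]


def next_question(term: str, qualifiers: dict):
    """Return the next qualifying question, or None once every facet is set."""
    missing = missing_facets(qualifiers)
    return PROMPTS[missing[0]].format(term=term) if missing else None


def qualified_label(term: str, qualifiers: dict) -> str:
    """Combine a bare term with whichever facets have been supplied."""
    parts = [term] + [f"{f}={qualifiers[f]}" for f in FACETS if qualifiers.get(f)]
    return "; ".join(parts)
```

Because every facet is optional, a bare term such as "temperature" remains valid, and qualifiers can be built up later.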
From Jane: 
Whatever we do is not going to answer every question.
From John:
We are regularly going to visit some of the scary areas.  We want to carve off the easy parts and remember what the hard parts look like and 

Scope:
1. Use cases 
- A user comes in and wants to build their other spreadsheets: column heading meanings to build out a dataset
2. Requirements

Break Out Session (Jane, Nassib, Rob, Angela)
- A user comes in and wants to build their other spreadsheets: column heading meanings to build out a dataset
- you want to come to DataONE to make the spreadsheet most compatible with existing terms/concepts
Alpine Pikas
Measure: temp talus (under), temp talus (surface), slope and aspect of talus (slope and direction), length and width of talus, vegetation (grams and forbs), depth of talus, presence or absence of pikas
first order: temp (inference, parameter, observation, computation, measure), slope, aspect, length, width, depth, talus

** should be able to buy into a term at different levels

Units: 
nanometer, kilometer, etc... at some point they have kilometer per second, etc.
per is a term and a part of speech a conjunction, 
temperature: The degree or intensity of heat present in a substance or object, esp.  as expressed according to a comparative scale (definition from google).

Stand alone terms might be the easiest ones
- the low hanging fruit 
- we might want to build one level out or give some guidance

We're going for universality and the unique one-offs (ex. status)

Are scientists really using metadata and metadata standards?
- comes down to "it depends"

What level of interoperability do we want to provide?

Simple use cases:
- I'm creating metadata and I want to share with my colleagues
- xyzt: biological objects are located somewhere (suggested use case)

Benefits:
1. reducing duplication or redundancy
2. helping people not recreate even complicated vocabularies or phrases 

Ex. if we could query temp_under_talus, it would be good if we could query to that level
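
One way to picture querying "to that level", assuming a term carries at most one optional refinement (one level down, not a hierarchy, as discussed on day 2). The `RefinedTerm` type, the `matches` helper, and the sample column list are hypothetical.

```python
from typing import NamedTuple, Optional


class RefinedTerm(NamedTuple):
    base: str
    refinement: Optional[str] = None  # one level only, no deeper hierarchy


def matches(entry: RefinedTerm, base: str, refinement: Optional[str] = None) -> bool:
    """Match on the base term alone, or on base plus refinement when one is given."""
    if entry.base != base:
        return False
    return refinement is None or entry.refinement == refinement


# Sample column headings taken from the alpine pika example.
columns = [
    RefinedTerm("temp", "under_talus"),
    RefinedTerm("temp", "surface_talus"),
    RefinedTerm("slope"),
]

all_temps = [c for c in columns if matches(c, "temp")]
under_talus = [c for c in columns if matches(c, "temp", "under_talus")]
```

A query on "temp" finds both temperature columns, while a query on "temp" plus "under_talus" finds only the one, so a user can buy into the term at either level.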

Proof of concept: 

Day 3: 1-3pm
From Greg:
- Not always called standards
- example like temperature is an attribute
- the dictionary we're proposing might be able to focus on attribute terms: a quasi-ontology
- might be beneficial down the road once we got this populated
- focusing on attributes a way to dodge 
- opportunity because they're not being defined by the metadata standard
** looking at stuff that's kind of scary, like attributes: maybe this is an opportunity. These labels that go in attributes are a little bit like orphans; for example, EML/Open data doesn't define the attributes
- this is an interesting opportunity because it reduces the burden
- we're providing framework and software infrastructure to apply framework
- recall the list of CF terms is an attempt, but it's a big long list, kind of fixed
- we're proposing a collaborative community driven model

From Jim:
- through community discussion, we can find levels of consensus

From John:
- we probably want to pick a term 

From this meeting:
Priority Use Case:
Priority Audience

- if you can provide a user or provider with some way of quantifying effort it is useful

Before we meet next time:
- A visualization
- hacking something together

Some participation models:
Encyclopedia of Life: http://eol.org/
- it has some participation items


List from Rob:
http://community.gbif.org/pg/photos/view/21610/gbif-vocabulary-server-scratchpads-drupal
http://vocabularies.gbif.org/
http://www.hyam.net/blog/archives/643
http://scratchpads.eu/

Platform choices to hack a demo
- Stack Overflow, Drupal

What to try demo'ing
- atomic terms
- ability to vote

audience/user cases (digging deeper)
- Sally scientist entering column headings, wondering what labels to use for sharing with a colleague
- Charlie curator is a metadata expert interested in the dictionary and in improving it, in the same way that someone interested in Wikipedia improves it
- Charlie curator helping the client (more of an expert use)
- Sheldon the scientist : a data collection for a long term study
- Doug wants to reuse data
- Any agent extends a scheme or has a new need not in existing scheme
- address metadata boundary cases: support better articulation, mediation

- Could have a long term impact in the way people communicate about terms

For DataONE:
- it becomes a community resource
- help set a DataONE best practice to integrate into the data lifecycle
- can use the use cases to show specific impact to DataONE
- have use cases be part of data lifecycle


Action Items:
Rob & Angela: Drupal + curatorial module?

Wrap up/Hack #1
- We will use the Sally 
- Phone call in about a month and 
Platform choices to hack a demo
- Stack Overflow, Drupal
What to try demo'ing (What more do we want to demo)
- atomic terms
- ability to vote
- label (terms)
- Unique ID
- Definition
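
The demo items listed above (an atomic term with a label, unique ID, and definition, plus the ability to vote) could be prototyped with a record like the following. This is a sketch and not the chosen platform: `TermEntry`, its fields, and the one-vote-per-person rule are assumptions.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    label: str
    definition: str
    # Assumed ID scheme: a random UUID generated when the entry is created.
    term_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    votes: dict = field(default_factory=dict)  # voter name -> +1 or -1

    def vote(self, voter: str, value: int) -> None:
        """Record one vote per voter; voting again replaces the earlier vote."""
        if value not in (1, -1):
            raise ValueError("a vote must be +1 or -1")
        self.votes[voter] = value

    @property
    def score(self) -> int:
        """Net votes, usable as a rough measure of community acceptance."""
        return sum(self.votes.values())
```

Keeping votes keyed by voter means a curator such as Charlie can change their mind without inflating the score.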
audience/user cases (digging deeper)
- Sally scientist entering column headings, wondering what labels to use for sharing with a colleague
Action Items:
Rob & Angela: Drupal + curatorial module?
Jim: volunteering to populate "it"
Jane and Angela: get some students to test the system
Nassib: will look at Stack Overflow and open source versions, and BioPortal, OBOE
Greg: look at the structure of the definitions and a few proposed terms
Jane:


John: Will do doodle for week of Oct 22, for monthly meeting