Discussion Notes for DUG - 2012-07-16
=====================================


References
----------


DataONE search index term documentation:





Add any new formats that should be supported with reference and brief description.




Currently proposed metadata formats (http://mule1.dataone.org/ArchitectureDocs-current/design/WhatIsData.html):

Dublin Core
~~~~~~~~~~~

- http://dublincore.org/documents/dces/

The Dublin Core Metadata Element Set is a vocabulary of fifteen properties for
use in resource description.
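The following minimal Python sketch shows how the fifteen elements are used in practice. It builds a flat Dublin Core record with the standard library's `xml.etree.ElementTree`, assuming the DCMES 1.1 namespace URI `http://purl.org/dc/elements/1.1/`. The wrapper element name `record`, the helper `make_dc_record`, and the sample values are hypothetical and are not part of any DataONE or DCMI specification.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen DCMES 1.1 elements, listed alphabetically.
DC_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)


def make_dc_record(values):
    """Build a flat XML record from a dict mapping DC element name -> list of strings.

    The wrapper element name "record" is a hypothetical choice, not part of DCMES.
    """
    ET.register_namespace("dc", DC_NS)  # serialize elements with the "dc:" prefix
    root = ET.Element("record")
    for name, items in values.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a DCMES 1.1 element: {name}")
        for item in items:  # every DCMES element may be repeated
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = item
    return root


if __name__ == "__main__":
    record = make_dc_record({
        "title": ["Example stream temperature data"],  # hypothetical values
        "creator": ["Doe, Jane"],
        "date": ["2012-07-16"],
        "type": ["Dataset"],
    })
    print(ET.tostring(record, encoding="unicode"))
```

Every DCMES element is optional and repeatable, so the helper takes a list of values per element rather than a single string.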


Darwin Core
~~~~~~~~~~~

- http://rs.tdwg.org/dwc/index.htm 

The Darwin Core is a body of standards. It includes a glossary of terms (in
other contexts these might be called properties, elements, fields, columns,
attributes, or concepts) intended to facilitate the sharing of information
about biological diversity by providing reference definitions, examples, and
commentaries. The Darwin Core is primarily based on taxa, their occurrence in
nature as documented by observations, specimens, and samples, and related
information. Included are documents describing how these terms are managed,
how the set of terms can be extended for new purposes, and how the terms can
be used. The Simple Darwin Core [SIMPLEDWC] is a specification for one
particular way to use the terms - to share data about taxa and their
occurrences in a simply structured way - and is probably what is meant if
someone suggests to "format your data according to the Darwin Core".
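Simple Darwin Core records are flat, with one row per record and one column per term. The following minimal Python sketch makes that assumption and uses the standard `csv` module. The subset of terms chosen, the helper `to_simple_dwc_csv`, and the sample occurrence are illustrative assumptions. They are not a prescribed DataONE or TDWG profile.

```python
import csv
import io

# A small, illustrative subset of Darwin Core term names, used as column headers.
FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
]


def to_simple_dwc_csv(records):
    """Serialize occurrence records (dicts keyed by term name) as flat CSV.

    Terms missing from a record are written as empty cells; keys that are
    not in FIELDS make csv.DictWriter raise ValueError.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [{
        "occurrenceID": "urn:example:occ:1",  # hypothetical identifier
        "basisOfRecord": "HumanObservation",
        "scientificName": "Quercus alba",
        "eventDate": "2012-07-16",
        "decimalLatitude": "35.0",
        "decimalLongitude": "-106.6",
    }]
    print(to_simple_dwc_csv(rows), end="")
```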


EML
~~~

- http://knb.ecoinformatics.org/software/eml

The Ecological Metadata Language (EML) is a metadata specification developed
by the ecology discipline and for the ecology discipline. It is based on prior
work done by the Ecological Society of America and associated efforts
(Michener et al., 1997, Ecological Applications). EML is implemented as a
series of XML document types that can be used in a modular and extensible
manner to document ecological data. Each EML module is designed to describe
one logical part of the total metadata that should be included with any
ecological dataset.


FGDC CSDGM
~~~~~~~~~~

- http://www.fgdc.gov/metadata/geospatial-metadata-standards

The Content Standard for Digital Geospatial Metadata (CSDGM), Vers. 2
(FGDC-STD-001-1998) is the US Federal Metadata standard. The Federal
Geographic Data Committee (FGDC) originally adopted the CSDGM in 1994 and
revised it in 1998. According to Executive Order 12906, all Federal agencies
are ordered to use this standard to document geospatial data created as of
January 1995. The standard is often referred to as the FGDC Metadata Standard
and has been implemented beyond the federal level with State and local
governments adopting the metadata standard as well.


GCMD DIF
~~~~~~~~

- http://gcmd.nasa.gov/User/difguide/difman.html

The Directory Interchange Format (DIF) does not compete with other metadata
standards. It is simply the "container" for the metadata elements that are
maintained in the International Directory Network (IDN) database, where
validation for mandatory fields, keywords, personnel, etc. takes place.

The DIF is used to create directory entries which describe a group of data. A
DIF consists of a collection of fields which detail specific information about
the data. Eight fields are required in the DIF; the others expand upon and
clarify the information. Some of the fields are text fields, others require
the use of controlled keywords (sometimes known as "valids").

The DIF allows users of data to understand the contents of a data set and
contains those fields which are necessary for users to decide whether a
particular data set would be useful for their needs.

- A mapping to Dublin Core is available at http://gcmd.nasa.gov/Aboutus/standards/dublin_to_dif.html


ISO 19137
~~~~~~~~~

- http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=32555

ISO 19137:2007 defines a core profile of the spatial schema specified in ISO
19107 that specifies, in accordance with ISO 19106, a minimal set of geometric
elements necessary for the efficient creation of application schemata.

It supports many of the spatial data formats and description languages already
developed and in broad use within several nations or liaison organizations.


NeXML
~~~~~

- http://nexml.org

The NEXUS file format is a commonly used format for phylogenetic data.
Unfortunately, the format has become overloaded over time, which has caused
various problems. Meanwhile, new technologies around the XML standard have
emerged that could greatly simplify the processing of phylogenetic data and
make it more robust. NeXML is an XML-based exchange standard for phylogenetic
data designed to address these problems.



WaterML
~~~~~~~

- http://his.cuahsi.org/wofws.html

The Water Markup Language (WaterML) specification defines an information
exchange schema, which has been used in water data services within the
Hydrologic Information System (HIS) project supported by the U.S. National
Science Foundation, and has been adopted by several federal agencies as a
format for serving hydrologic data. The goal of WaterML was to encode the
semantics of hydrologic observation discovery and retrieval, and to implement
water data services in a way that is both generic and unambiguous across
different data providers, thus minimizing the barriers to adoption by the
hydrologic research community.

GenBank internal format
~~~~~~~~~~~~~~~~~~~~~~~

- http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html



ISO 19115
~~~~~~~~~

- http://en.wikipedia.org/wiki/ISO_19115

ISO 19115 "Geographic Information - Metadata" is a standard of the
International Organization for Standardization (ISO). It is a component of the
series of ISO 191xx standards for geospatial metadata. ISO 19115 defines how
to describe geographical information and associated services, including
contents, spatio-temporal extent, data quality, and access and rights to use.
The standard defines more than 400 metadata elements and 20 core elements.

Profiles of ISO 19115 include:

- North American Profile (NAP)
- bio profile
- Marine Community Profile (MCP)
- WMO Core Profile



Dryad Metadata Profile
~~~~~~~~~~~~~~~~~~~~~~~~~

- https://www.nescent.org/wg_dryad/Metadata_Profile

The Dryad metadata team has developed a metadata application profile based on
the Dublin Core Metadata Initiative Abstract Model (DCAM) following the Dublin
Core guidelines for application profiles. The Dryad metadata profile is being
developed to conform to the Dublin Core Singapore Framework, a framework
aligning with Semantic Web development and deployment.



ADN
~~~

- http://www.dlese.org/Metadata/adn-item/

The purpose of the ADN (ADEPT/DLESE/NASA) metadata framework is to describe
resources typically used in learning environments (e.g. classroom activities,
lesson plans, modules, visualizations, some datasets) for discovery by the
Earth system education community.



GML Profiles
~~~~~~~~~~~~

- http://en.wikipedia.org/wiki/Geography_Markup_Language#Profile

GML profiles are logical restrictions to GML, and may be expressed by a
document, an XML schema or both.



NetCDF-CF-OPeNDAP
~~~~~~~~~~~~~~~~~

- http://opendap.org/

- http://www.oceanobs09.net/work/cwp_proposals/docs/100_Hankin_StandardsOceanDataInteroperability_CWPprop.doc




DDI
~~~

- http://www.ddialliance.org/

The Data Documentation Initiative is an international effort to establish a
standard for technical documentation describing social science data. A
membership-based Alliance is developing the DDI specification, which is
written in XML.



MAGE
~~~~

- http://www.mged.org/Workgroups/MAGE/mage.html

The MicroArray and Gene Expression (MAGE) group aims to provide a standard for
the representation of microarray expression data that would facilitate the
exchange of microarray information between different data systems.



ESML
~~~~

- Earth Science Markup Language

- http://esml.itsc.uah.edu/

The Earth Science Markup Language (ESML) is an interchange standard that
supports the description of both syntactic (structural) and semantic
information about Earth science data. Semantic tags link different domain
ontologies to provide a complete, machine-understandable data description.



CSR
~~~

- http://www.oceanteacher.org/oceanteacher/index.php/Cruise_Summary_Report_%28CSR%29

The Cruise Summary Report (CSR), previously known as ROSCOP (Report of
Observations/Samples Collected by Oceanographic Programmes), is an established
international standard designed to gather information about oceanographic
data. ROSCOP was conceived in the late 1960s by the IOC (Intergovernmental
Oceanographic Commission) to provide a low-level inventory for tracking
oceanographic data collected on research vessels.

The ROSCOP form was extensively revised in 1990 and renamed CSR (Cruise
Summary Report), but the name ROSCOP still persists among many marine
scientists. Most marine disciplines are represented in ROSCOP, including
physical, chemical, and biological oceanography, fisheries, marine
contamination/pollution, and marine meteorology. The ROSCOP database is
maintained by ICES (International Council for the Exploration of the Sea).

MIENS
~~~~~

- Minimum Information about an ENvironmental Sequence (MIENS)

- http://gensc.org/gc_wiki/index.php/MIENS

- http://precedings.nature.com/documents/5252/version/2

A metadata specification for representing the contextual and environmental information 
associated with marker gene data sets collected in the environment.  The MIENS specification 
extends the MIGS/MIMS specification.

Additional specifications in use by relevant agencies
-----------------------------------------------------

ISO 2146
~~~~~~~~

ISO 2146 (Registry Services for Libraries and Related Organisations) is an
international standard developed by ISO TC46 SC4 WG7 to
operate as a framework for building registry services for libraries and
related organizations. It takes the form of an information model that
identifies the objects and data elements needed for the collaborative
construction of registries of all types. It is not bound to any specific
protocol or data schema. The aim is to be as abstract as possible, in order to
facilitate a shared understanding of the common processes involved, across
multiple communities of practice.

The Australian National Data Service (ANDS) uses it to describe data
collections; for many Australian data sets, an ANDS 'collection' corresponds
to the concept of a 'data set' used here. The term 'collection' is loosely
defined so that different disciplines can apply it appropriately.

See: http://www.nla.gov.au/wgroups/ISO2146/
Schema: http://www.nla.gov.au/wgroups/ISO2146/n198.xsd

ANZLIC Metadata Profile
~~~~~~~~~~~~~~~~~~~~~~~

A profile of ISO 19115 for Australia and New Zealand. See:
http://www.osdm.gov.au/ANZLIC_MetadataProfile_v1-1.pdf?ID=303