Discussion Notes for DUG - 2012-07-16 ===================================== References ---------- DataONE search index term documentation: Add any new formats that should be supported with reference and brief description. Current Proposed Metadata Formats ( http://mule1.dataone.org/ArchitectureDocs-current/design/WhatIsData.html ): Dublin Core ~~~~~~~~~~~ - http://dublincore.org/documents/dces/ The Dublin Core Metadata Element Set is a vocabulary of fifteen properties for use in resource description. Darwin Core ~~~~~~~~~~~ - http://rs.tdwg.org/dwc/index.htm The Darwin Core is body of standards. It includes a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries. The Darwin Core is primarily based on taxa, their occurrence in nature as documented by observations, specimens, and samples, and related information. Included are documents describing how these terms are managed, how the set of terms can be extended for new purposes, and how the terms can be used. The Simple Darwin Core [SIMPLEDWC] is a specification for one particular way to use the terms - to share data about taxa and their occurrences in a simply structured way - and is probably what is meant if someone suggests to "format your data according to the Darwin Core". EML ~~~ - http://knb.ecoinformatics.org/software/eml The Ecological Metadata Language (EML) is a metadata specification developed by the ecology discipline and for the ecology discipline. It is based on prior work done by the Ecological Society of America and associated efforts (Michener et al., 1997, Ecological Applications). EML is implemented as a series of XML document types that can by used in a modular and extensible manner to document ecological data. Each EML module is designed to describe one logical part of the total metadata that should be included with any ecological dataset. FGDC CSDGM ~~~~~~~~~~ - http://www.fgdc.gov/metadata/geospatial-metadata-standards The Content Standard for Digital Geospatial Metadata (CSDGM), Vers. 2 (FGDC-STD-001-1998) is the US Federal Metadata standard. The Federal Geographic Data Committee (FGDC) originally adopted the CSDGM in 1994 and revised it in 1998. According to Executive Order 12096 all Federal agencies are ordered to use this standard to document geospatial data created as of January, 1995. The standard is often referred to as the FGDC Metadata Standard and has been implemented beyond the federal level with State and local governments adopting the metadata standard as well. GCMD DIF ~~~~~~~~ - http://gcmd.nasa.gov/User/difguide/difman.html The DIF does not compete with other metadata standards. It is simply the "container" for the metadata elements that are maintained in the IDN database, where validation for mandatory fields, keywords, personnel, etc. takes place. The DIF is used to create directory entries which describe a group of data. A DIF consists of a collection of fields which detail specific information about the data. Eight fields are required in the DIF; the others expand upon and clarify the information. Some of the fields are text fields, others require the use of controlled keywords (sometimes known as "valids"). The DIF allows users of data to understand the contents of a data set and contains those fields which are necessary for users to decide whether a particular data set would be useful for their needs. - Mapping to DC available at http://gcmd.nasa.gov/Aboutus/standards/dublin_to_dif.html ISO 19137 ~~~~~~~~~ http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=32555 ISO 19137:2007 defines a core profile of the spatial schema specified in ISO 19107 that specifies, in accordance with ISO 19106, a minimal set of geometric elements necessary for the efficient creation of application schemata. It supports many of the spatial data formats and description languages already developed and in broad use within several nations or liaison organizations. NEXML ~~~~~ http://nexml.org The NEXUS file format is a commonly used format for phylogenetic data. Unfortunately, over time, the format has become overloaded - which has caused various problems. Meanwhile, new technologies around the XML standard have emerged. These technologies have the potential to greatly simplify, and improve robustness, in the processing of phylogenetic data. Water ML ~~~~~~~~ http://his.cuahsi.org/wofws.html The Water Markup Language (WaterML) specification defines an information exchange schema, which has been used in water data services within the Hydrologic Information System (HIS) project supported by the U.S. National Science Foundation, and has been adopted by several federal agencies as a format for serving hydrologic data. The goal of WaterML was to encode the semantics of hydrologic observation discovery and retrieval and implement water data services in a way that is both generic and unambiguous across different data providers, thus creating the least barriers for adoption by the hydrologic research community. Genbank internal format ~~~~~~~~~~~~~~~~~~~~~~~ http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html ISO 19115 ~~~~~~~~~ - http://en.wikipedia.org/wiki/ISO_19115 ISO 19115 "Geographic Information - Metadata" is a standard of the International Organization for Standardization (ISO). It is a component of the series of ISO 191xx standards for Geospatial metadata. ISO 19115 defines how to describe geographical information and associated services, including contents, spatial-temporal purchases, data quality, access and rights to use. The standard defines more than 400 metadata elements and 20 core elements. - NA profile - bio profile - marine community metadata profile - WMO profile Dryad Metadata Profile ~~~~~~~~~~~~~~~~~~~~~~~~~ https://www.nescent.org/wg_dryad/Metadata_Profile The Dryad metadata team has developed a metadata application profile based on the Dublin Core Metadata Initiative Abstract Model (DCAM) following the Dublin Core guidelines for application profiles. The Dryad metadata profile is being developed to conform to the Dublin Core Singapore Framework, a framework aligning with Semantic Web development and deployment. ADN ~~~ - http://www.dlese.org/Metadata/adn-item/ The purpose of the ADN (ADEPT/DLESE/NASA) metadata framework is to describe resources typically used in learning environments (e.g. classroom activities, lesson plans, modules, visualizations, some datasets) for discovery by the Earth system education community. GML Profiles ~~~~~~~~~~~~ - http://en.wikipedia.org/wiki/Geography_Markup_Language#Profile GML profiles are logical restrictions to GML, and may be expressed by a document, an XML schema or both. NetCDF-CF-OPeNDAP ~~~~~~~~~~~~~~~~~ - http://opendap.org/ - http://www.oceanobs09.net/work/cwp_proposals/docs/100_Hankin_StandardsOceanDataInteroperability_CWPprop.doc DDI ~~~ - http://www.ddialliance.org/ The Data Documentation Initiative is an international effort to establish a standard for technical documentation describing social science data. A membership-based Alliance is developing the DDI specification, which is written in XML. MAGE ~~~~ - http://www.mged.org/Workgroups/MAGE/mage.html The MicroArray and Gene Expression (MAGE) provides a standard for the representation of microarray expression data that would facilitate the exchange of microarray information between different data systems. ESML ~~~~ - Earth Science Markup Language - http://esml.itsc.uah.edu/ The Earth Science Markup Language (ESML) is a interchange standard that supports the description of both syntactic (structural) and semantic information about Earth science data. Semantic tags provide linking of different domain ontologies to provide a complete machine understandable data description. CSR ~~~ - http://www.oceanteacher.org/oceanteacher/index.php/Cruise_Summary_Report_%28CSR%29 The Cruise Summary Report (CSR), previously known as ROSCOP (Report of Observations/Samples Collected by Oceanographic Programmes), is an established international standard designed to gather information about oceanographic data. ROSCOP was conceived in the late 1960s by the IOC to provide a low level inventory for tracking oceanographic data collected on Research Vessels. The ROSCOP form was extensively revised in 1990, and was re-named CSR (Cruise Summary Report), but the name ROSCOP still persists with many marine scientists. Most marine disciplines are represented in ROSCOP, including physical, chemical, and biological oceanography, fisheries, marine contamination/pollution, and marine meteorology. The ROSCOP database is maintained by ICES MIENS ~~~~~ - Minimum Information about an ENvironmental Sequence (MIENS) - http://gensc.org/gc_wiki/index.php/MIENS - http://precedings.nature.com/documents/5252/version/2 A metadata specification for representing the contextual and environmental information associated with marker gene data sets collected in the environment. The MIENS specification extends the MIGS/MIMS specification. Additional specifications in use by relevant agencies ----------------------------------------------------- ISO 2146 ~~~~~~~~ ISO 2146 (Registry Services for Libraries and Related Organisations) is an international standard currently under development by ISO TC46 SC4 WG7 to operate as a framework for building registry services for libraries and related organizations. It takes the form of an information model that identifies the objects and data elements needed for the collaborative construction of registries of all types. It is not bound to any specific protocol or data schema. The aim is to be as abstract as possible, in order to facilitate a shared understanding of the common processes involved, across multiple communities of practice. Used by the Australian National Data Service (ANDS) for describing data collections in ANDS, which for many Australian data sets corresponds to the concept of a 'data set' used here. The term 'collection' is loosely defined so that different disciplines can apply it appropriately. See: http://www.nla.gov.au/wgroups/ISO2146/ Schema: http://www.nla.gov.au/wgroups/ISO2146/n198.xsd ANZLIC Metadata Profile ~~~~~~~~~~~~~~~~~~~~~~~ A profile of ISO 19115 for Australia. See: http://www.osdm.gov.au/ANZLIC_MetadataProfile_v1-1.pdf?ID=303