#persist Metrics for DataONE PhaseII (Years 5-10) Content and Service Oriented * Uptime of DataONE service availability to target audience. [MBJ: we need to start logging this in a database, indexed by subsystem] * Number of current version data objects (i.e. entities of type "DATA" in the object formatType that are not obsoleted) * Number of current version data packages (i.e. entities of type "RESOURCE" in the object formatType that are not obsoleted). [[This metric provides an indication of the number of aggregated data products available through the system]] * Total size of stored content. * Number [BEW: and Volume] of data downloads with DataONE tools as determined by the User Agent string in the HTTP request header to GET requests. [BEW: Plus downloaded via the search interface, which probably requires getting the referrer for calls at the MN] * Number of searches run against the DataONE search GUI and API * Total number of authenticated users accessing DataONE services (on MNs or CNs) (creating, consuming, manipulating content) [MBJ: note that many Tier 1 MNs upload all of their content via sync rather than via authenticated users, so I'm not sure how meaningful this will be for assessing # of true users) [DV - It shows the number of authenticated users. If the number is very small then difficult to justify expense.] * Number of distinct original data contributors as determined by the subject of the user adding content to the Member Node (i.e. the "submitter" in system metadata) [MBJ: submitter is unlikely to be the best field; many groups use a single submitter even when there are many data contributors; a better index of contributors would be the list of 'Creator' elements from the metadata, which are the people who should be attributed for the contribution][DV - as long as there is a consistent, reliable mechanism that works across MNs and metadata formats] * Number of distinct data consumers as determined by the subject of the authenticated user accessing DataONE content resolution and retrieval services.[MBJ: combine with next one based on IP; also, I propose we should be tracking individually numbers of metadata views, data downloads, and package downloads][DV - Separate metric from total users. views, data, package distinction can be done through logs (now) for services, but not UI (without significant log processing)] * Number of distinct data consumers as determined by the IPAddress of the device accessing DataONE content resolution and retrieval services. [[Note - this is a very sloppy metric and does not reliably indicate distinct users]] * Search precision and recall as determined by repeat evaluation of a corpus of test queries executed against the search system and evaluated by a group of domain experts. * Number of visitors and questions on ask.dataone.org. [MBJ: maybe, but this will remain low until we prominently feature it in our site; let's avoid using metrics for systems we don't plan to invest in heavily] * Number of visitors on our IRC channel. [MBJ: unlikely to ever be very high] * Number of hits for "DataONE" in web search engines. * Integrity of content as determined by the number of "404 - Not Found" responses to get(PID) [MBJ: we should be tracking down and resolving all of these; maybe a splunk alert sent to someone responsible for tracking them down?][DV - add for resolve service as well] Size of the DataONE Federation * Number of Member Nodes participating in the production environment * Number of software solutions implementing the Member Node software stack and proportion of those Member Node software stack installations that have DataONE services enabled and are participating in DataONE * Number of organizations participating in DataONE as Member Nodes, collaborating projects, and resource contributions. Community Impact * Number of publications that cite content accessible through DataONE [[Can be supported by future capability and agreements with publishers]][MBJ: I would rephrase this to number of citations to content available through DataONE, so its not just publications that can do the citing; citing on blog posts and in published analyses may be just or more important over time] * Geographic distribution of users that request searches and downloads.