Lowering the threshold for MN implementation This is a suggestion for a set of changes to the DataONE architecture that would lower the complexity of MNs; which would then lower the threshold for third parties to become MNs; which would then help in maximizing adoption of DataONE. The changes would also create an architecture that would be more intuitive both for developers implementing clients and for end users. Lowering complexity In the current DataONE architecture, the REST interface that must be implemented by MNs is designed to support two separate types of connections: CNs performing synchronization and clients creating and retrieving science objects and metadata (such as logs). The suggestion is to remove some of the functionality in the REST interface that relates to connections from clients. Specifically, to remove the functionality that allows clients to search for new content directly on MNs, read system metadata, read statistics and read logs from MNs. Seen from the point of view of the end user, this does not remove any functionality from DataONE as all of the functionality is also available on CNs. A more intuitive architecture These changes would lead to an architecture that would have less potential for causing confusion among developers and end users. The reason is that, for some tasks in the current architecture, a client may accomplish roughly the same thing in two different ways (by querying a CN or querying an MN), but the results from each may be subtly different. This requires end users to evaluate the information the client is providing in the context of how the information was obtained. REST interfaces What follows is an overview of the affected REST calls, specifically what the suggested changes are and what the effects of the changes would be. MNRead.listObjects() Limiting listObjects() to be used only by CNs for synchronization. The current architecture focuses on providing search functionality through CNs, but provides a second avenue for clients to discover objects, which is to run queries directly against MNs, using the listObjects() interface. CNs use this interface for synchronization. When used by CNs, the interface should yield a complete list of objects that are new or have had their System Metadata updated since the last sync. The implementation for that functionality should be relatively simple for most MNs. However, when this interface is accessed by clients, it must provide a list that is filtered according to the permissions of the DataONE subject running the client. To perform this filtering, MNs must implement fast systems to determine how each of their science objects are affected by the DataONE permissions of the subject. As an example, an MN may contain millions of objects, of which some fraction should be visible to the subject who makes the call. The MN has to instantly find out how many of its objects match the subject’s permissions so that it can provide a correct value for the “total” value in the returned ObjectList. The total designates the total number of objects on the MN that are visible to the subject. When the client starts retrieving sliced ObjectLists, the MN must repeatedly evaluate the permissions of the subject against the permissions on the science objects to return a correctly sliced list that contains correctly filtered objects. Creating this functionality and having it be fast enough for real time use by a client may be a challenge for many MNs. Limiting listObjects() would remove the need for this functionality on MNs. Because there are few filtering options on the listObjects() interface, it will be hard for clients to narrow down the search to the objects of interest. Clients could instead connect to a CN and perform the search there, with a much richer search interface. If a client wishes to find objects that reside on a given member node, it can simply add that as a filtering criteria on a CN based search. By limiting the interface to CNs, some of the other filtering capabilities in the interface can be removed as well, further lowering the MN implementation threshold. Removing the option of accomplishing essentially the same thing in two different ways would make the architecture more streamlined and potentially less confusing for developers and end users. MNCore.getLogRecords() Limiting getLogRecords() to be used only by CNs for synchronization. The same issue with filtering by subject permissions on MNRead.listObjects() apply to this interface. However, filtering on this interface may be even more challenging because there are potentially many log records associated with each object. As with MNRead.listObjects(), the signature on this call can also be simplified if it is limited for use only by CNs. Log records obtained via the MN interface include only events on the object held by that MN and exclude events related to replicas of the object on other MNs. To obtain a complete set of events related to a given object, the client would have obtain a current list of replicas for the object and would then have to connect to each of the MNs that has the object, retrieve the events, and aggregate them. It would be more logical for the client to make a single call to a CN to retrieve the complete set of events for a given object. To provide this interface for use by clients, MNs must maintain log records for all time. If the interface is limited to use by CNs, MNs could store only the events that have occured since the last sync. And, since MNs would not have to run queries against the logs, the MN could chose to store the events in a more opaque fashion in a kind of buffer instead of indexed in a database. MNCore.getObjectStatistics() Removing getObjectStatistics(). This interface allows a monitoring service to create statistics that show changes in number of objects over time. It should be simple for MNs to support it, but CNs have all the information required for generating this statistic, so the interface could be removed from MNs. Also, while the interface is accessible to end users, it is not really of use to them. The interface provides information that relates to the total number of objects on the server, while end users would be interested in objects that are actually available to them. MNCore.getOperationStatistics() CNs would be able to supply this information as well. It would be derived from the logs. MNRead.getSystemMetadata() This interface is required for synchronization. However, the interface is also provided for general use by clients, and the functionality required for synchronization is much more limited than the functionality required for use by clients. To support the use of this interface for synchronization, MNs only need to store the “skeleton” System Metadata objects that are provided by clients when objects are created. After synchronization, the skeleton System Metadata XML document is no longer needed by the MN. Some of the information in the skeleton System Metadata object is required by the MN, but the MN must store that information in a database so that fast queries can be performed against it. The remainder of the System Metadata is simply opaque data that MNs must store so that it can be provided through this interface. The System Metadata provided through this interface is not authoritative; the information may not be as complete or as current as the version of the System Metadata that is available from CNs via the equivalent CNRead.getSystemMetadata() interface. So this interface causes System Metadata to have to be considered in the context of its provenance. To keep System Metadata on MNs closer to the state of the authoritative System Metadata that CNs have, an interface would have to be added to allow CNs to push updates to MNs. But MNs do not need the data, so there’s no other reason to push it to MNs than to enable them to provide it through this interface. And the equivalent interface, which provides the authoritative copy, is already available on CNs. So, removing this interface would lower the threshold for MNs, create a more streamlined architecture where there’s only one source for System Metadata and remove a potential source of confusion for end users.