Test: urn:node:mnTestSEAD
Prod: urn:node:SEAD
http://mule1.dataone.org/ArchitectureDocs-current/apis/Types.html#Types.Node
openssl x509 -text -in /path/to/your/urn:node:mnTestSEAD
https://cn-stage.test.dataone.org/cn/v1/accounts
https://cn-stage.test.dataone.org/portal
========================================================================
IP-based access restriction for production CN access to getLogRecords:
# Path needs to be adjusted according to the baseURL path of the MN
# Example shown for a baseURL path of "/mn"
Order Deny,Allow
Deny from all
# IP addresses for CNs required since there is no reverse DNS
Allow from 128.111.36.80 64.106.40.6 160.36.13.150
# IF https is available:
SSLRequireSSL
SSLRequire %{SSL_CLIENT_S_DN_CN} eq "urn:node:CN" or
%{SSL_CLIENT_S_DN_CN} eq "urn:node:CNUCSB1"
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SUBJECT_VERIFIED = 'verifiedUser'
SUBJECT_AUTHENTICATED = 'authenticatedUser'
SUBJECT_PUBLIC = 'public'
1. Start with empty list of subjects
2. Add the symbolic subject, "public"
3. Get the DN from the Subject and serialize it to a standardized string. This string is called Subject below.
4. Add Subject
5. If the connection was not made with a certificate, stop here.
6. Add the symbolic subject, "authenticatedUser"
7. If the certificate does not have a SubjectInfo extension, stop here.
8. Find the Person that has a subject that matches the Subject.
9. If the Person has the verified flag set, add the symbolic subject, "verifiedUser"
10. Iterate over the Person's isMemberOf and add those subjects
11. Iterate over the Person's equivalentIdentity and for each of them:
- Add its subject
- Find the corresponding Person
- Iterate over that Person's isMemberOf and add those subjects
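The steps above can be sketched in Python. The registry lookup and the Person field names (`verified`, `isMemberOf`, `equivalentIdentity`) are assumptions about shape, not the real DataONE API; the symbolic subjects match the constants in this pad.

```python
SUBJECT_VERIFIED = 'verifiedUser'
SUBJECT_AUTHENTICATED = 'authenticatedUser'
SUBJECT_PUBLIC = 'public'

def expand_subjects(subject_dn=None, has_subject_info=False, registry=None):
    """subject_dn: standardized DN string from the client certificate,
    or None when no certificate was presented.
    registry: dict mapping subject -> Person record (shape is an assumption)."""
    subjects = [SUBJECT_PUBLIC]             # steps 1-2: start with public
    if subject_dn is None:                  # step 5: no certificate, stop
        return subjects
    subjects.append(subject_dn)             # steps 3-4: add the serialized DN
    subjects.append(SUBJECT_AUTHENTICATED)  # step 6
    if not has_subject_info:                # step 7: no SubjectInfo, stop
        return subjects
    person = (registry or {}).get(subject_dn)  # step 8: matching Person
    if person is None:
        return subjects
    if person.get('verified'):              # step 9
        subjects.append(SUBJECT_VERIFIED)
    subjects.extend(person.get('isMemberOf', []))       # step 10
    for equiv in person.get('equivalentIdentity', []):  # step 11
        subjects.append(equiv)
        eq_person = (registry or {}).get(equiv)
        if eq_person:
            subjects.extend(eq_person.get('isMemberOf', []))
    return subjects
```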
~~~~~~~~~~~~~~
Index task processing:
1. the system metadata fields are loaded
2. sub-processors for the ID are run (loading science metadata for example)
3. the results are merged with the existing values from Solr
So in the final merge step, the only values that should override those extracted in the task are the ones related to the resource map entries.
I'm still not sure what that merge logic is for. I've backed out my changes. The problem with multiple values in a single-valued field does not seem related to the merge specifically.
Skye - want to hop onto Skype to talk this through?
Sure! Starting Skype - I'm on Skype now.
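The override rule described above, as a minimal Python sketch. The field names and dict shapes are assumptions for illustration; the real logic lives in mergeWithIndexedDocument.

```python
# Values already in the Solr index override the freshly extracted values
# only for fields tied to resource map entries.  These field names are
# assumptions, not the real index schema.
RESOURCE_MAP_FIELDS = {"resourceMap", "documents", "isDocumentedBy"}

def merge_with_indexed(extracted, indexed):
    """extracted: field -> value dict built by the index task.
    indexed: field -> value dict currently stored in Solr."""
    merged = dict(extracted)
    for field, value in indexed.items():
        if field in RESOURCE_MAP_FIELDS:
            merged[field] = value
    return merged
```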
public SolrDoc processID(String id, String sysMetaPath, String objectPath) throws IOException,
        SAXException, ParserConfigurationException, XPathExpressionException, EncoderException {
    // Load the System Metadata document
    Document sysMetaDoc = loadDocument(sysMetaPath, input_encoding);
    if (sysMetaDoc == null) {
        log.error("Could not load System Metadata for ID: " + id);
        return null;
    }
    // Extract the field values from the System Metadata
    List sysSolrFields = processFields(sysMetaDoc);
    SolrDoc indexDocument = new SolrDoc(sysSolrFields);
    Map<String, SolrDoc> docs = new HashMap<String, SolrDoc>();
    docs.put(id, indexDocument);
    // Determine if subprocessors are available for this ID
    if (subprocessors != null) {
        // for each subprocessor loaded from the spring config
        for (IDocumentSubprocessor subprocessor : subprocessors) {
            // Does this subprocessor apply?
            if (subprocessor.canProcess(sysMetaDoc)) {
                // If so, extract the additional information from the document.
                try {
                    // docObject = the resource map document or science metadata document.
                    // Note that resource map processing touches all objects
                    // referenced by the resource map.
                    Document docObject = loadDocument(objectPath, input_encoding);
                    if (docObject == null) {
                        log.error("Could not load OBJECT file for ID,Path=" + id + ", "
                                + objectPath);
                    } else {
                        docs = subprocessor.processDocument(id, docs, docObject);
                    }
                } catch (Exception e) {
                    // log the full stack trace, not just the array reference
                    log.error(e.getMessage(), e);
                }
            }
        }
    }
    // ? why is this here
    // docs.put(id, indexDocument);
    // TODO: get list of unmerged documents and do a single HTTP request for
    // all unmerged documents
    for (SolrDoc mergeDoc : docs.values()) {
        if (!mergeDoc.isMerged()) {
            mergeWithIndexedDocument(mergeDoc);
        }
    }
    SolrElementAdd addCommand = getAddCommand(new ArrayList<SolrDoc>(docs.values()));
    if (log.isTraceEnabled()) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        addCommand.serialize(baos, "UTF-8");
        log.trace(baos.toString());
    }
    sendCommand(addCommand);
    if (docs.size() > 0) {
        docs.clear();
    }
    return indexDocument;
}
======================================================================
notes from 2011-10-11
???
ornldaac_mn
ORNL DAAC Mercury Member Node
http://mercury.ornl.gov/ornldaac/mn/
1900-01-01T00:00:00.000+00:00
1900-01-01T00:00:00.000+00:00
Questions - is HTTPS required? If authorized access is ever required then HTTPS will be necessary.
=====
Content in this pad is temporary only - Use it like a scratch pad.
gmn=> explain analyze SELECT "mn_object"."id", "mn_object"."pid", "mn_object"."url", "mn_object"."format_id", "mn_object"."checksum", "mn_object"."checksum_algorithm_id", "mn_object"."mtime", "mn_object"."db_mtime", "mn_object"."size", "mn_object_format"."id", "mn_object_format"."format_id", "mn_object_format"."format_name", "mn_object_format"."sci_meta", "mn_checksum_algorithm"."id", "mn_checksum_algorithm"."checksum_algorithm"
FROM "mn_object"
INNER JOIN "mn_object_format" ON ("mn_object"."format_id" = "mn_object_format"."id")
INNER JOIN "mn_checksum_algorithm" ON ("mn_object"."checksum_algorithm_id" = "mn_checksum_algorithm"."id")
ORDER BY "mn_object"."mtime" ASC
LIMIT 1000
OFFSET 803000
;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=592265.20..592267.70 rows=1000 width=308) (actual time=17220.744..17221.686 rows=1000 loops=1)
-> Sort (cost=590257.70..592985.32 rows=1091048 width=308) (actual time=16551.271..17149.914 rows=804000 loops=1)
Sort Key: mn_object.mtime
Sort Method: external merge Disk: 288680kB
-> Merge Join (cost=40.15..10954.91 rows=1091048 width=308) (actual time=0.074..9594.943 rows=1092711 loops=1)
Merge Cond: (mn_object_format.id = mn_object.format_id)
-> Index Scan using mn_object_format_pkey on mn_object_format (cost=0.00..37.45 rows=1080 width=45) (actual time=0.013..0.055 rows=30 loops=1)
-> Materialize (cost=0.00..4663745.35 rows=1091048 width=263) (actual time=0.056..9098.164 rows=1092711 loops=1)
-> Nested Loop (cost=0.00..4652834.87 rows=1091048 width=263) (actual time=0.052..7748.179 rows=1092711 loops=1)
-> Index Scan using mn_object_format_id on mn_object (cost=0.00..2601558.15 rows=1091048 width=201) (actual time=0.041..5092.224 rows=1092711 loops=1)
-> Index Scan using mn_checksum_algorithm_pkey on mn_checksum_algorithm (cost=0.00..1.87 rows=1 width=62) (actual time=0.001..0.002 rows=1 loops=1092711)
Index Cond: (mn_checksum_algorithm.id = mn_object.checksum_algorithm_id)
Total runtime: 17400.561 ms
(13 rows)
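The plan above spills a 288 MB external merge sort to disk: ORDER BY with LIMIT/OFFSET deep into the table forces the database to produce and sort past all 803,000 skipped rows. A sketch of the keyset-pagination alternative, which remembers the last sort key seen and seeks via an index instead (sqlite3 here, not the GMN schema; with a non-unique mtime the key should be the pair (mtime, id)):

```python
import sqlite3

# Toy table standing in for mn_object; mtime strings sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mn_object (id INTEGER PRIMARY KEY, mtime TEXT)")
conn.executemany("INSERT INTO mn_object (mtime) VALUES (?)",
                 [("2011-%02d-%02d" % (m, d),)
                  for m in range(1, 13) for d in range(1, 29)])
conn.execute("CREATE INDEX mn_object_mtime ON mn_object (mtime)")

# OFFSET pagination: the cost grows with the offset.
page = conn.execute(
    "SELECT id, mtime FROM mn_object ORDER BY mtime LIMIT 10 OFFSET 100"
).fetchall()

# Keyset pagination: seek directly past the last mtime of the previous page.
last_mtime = page[-1][1]
next_page = conn.execute(
    "SELECT id, mtime FROM mn_object WHERE mtime > ? ORDER BY mtime LIMIT 10",
    (last_mtime,)).fetchall()
```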
~~~~~~~~~~~
API Method Metacat LDAP Mercury
Node
identifier
name
description
baseUrl
services (suggested changes here)
name (API Name)
version (release version, includes schema + interface)
available (set by the CN / monitoring infrastructure)
synchronization
schedule
lastHarvested
lastCompleteHarvest
health
replicate
synchronize
type (MN | CN )
environment (test | staging | production )--remove
[new 0.6.2] Metacat impl structure
------------------------
abstract class MNBase {}
MNCoreImpl extends MNBase implements MNCore
MNReadImpl extends MNBase implements MNRead
MNAuthorizationImpl extends MNBase implements MNAuthorization
MNStorageImpl extends MNBase implements MNStorage
MNReplicationImpl extends MNBase implements MNReplication
abstract class CNBase {}
CNCoreImpl extends CNBase implements CNCore
-partial Metacat implementation
CNReadImpl extends CNBase implements CNRead
-all Metacat?
CNAuthorizationImpl extends CNBase implements CNAuthorization
-all Metacat (sysMeta)
**CNIdentityImpl extends CNBase implements CNIdentity
-implemented outside of Metacat
**CNRegisterImpl extends CNBase implements CNRegister
-implemented outside Metacat
CNReplicationImpl extends CNBase implements CNReplication
-proxies to Metacat
* many existing methods in CrudService refactored to MNBase
* sysMeta-related methods in IdentityManager moved to SystemMetadataManager
* similar CN and MN methods will reuse classes/methods for that shared function (e.g., SystemMetadataManager.getInstance().getSystemMetadata()) rather than sharing the root of a class hierarchy for shared methods. We want to keep the distinction between CN and MN stacks very clear at least in the class hierarchy.
* refactor ResourceHandler to minimize duplicate code when handling the REST calls
* Metacat won't implement CNIdentity, CNRegister
TODO: Check on elementFormDefault="unqualified" attributeFormDefault="unqualified"
To check
--------
Identifier (simple type versus complex type) 0.6.1
To rename as new class, deprecate old
-------------------------------------
ObjectFormat --> 0.6.2 (includes accompanying change in systemmetadata) Talk with Chris on Monday to make this change.
AccessRule --> 0.6.1 (includes accompanying change in systemmetadata) Change today (Matt)
SystemMetadata --> 0.6.3 (change in ordering of elements within sysmeta document)
LogEntry --> 0.6.3 (entryId is of type Identifier, but there is an Identifier type already in LogEntry named, identifier. Change entryId type to NonEmptyString)
To Remove (never used)
----------------------
EncryptedNonce --> 0.6.1
Challenge --> 0.6.1
To deprecate then remove
------------------------
AuthToken --> 0.6.4
Break apart the namespaces
sequencing
0. switch Identifier back to the complex type that it was
1. finish 0.6.1 build
2. merge ObjectFormat changes
- metacat merge, too.
?. update d1_common_ (currently 0.6.1), d1_libclient_ versions (currently 0.6.1)
?. update d1_integration version (currently 0.5.0)
its pom currently points to d1_libclient_java v0.6.1
foresite@googlegroups.com
Foresite Toolkit (Python)
science_data_id
A reference to a science data object using a DataONE identifier
science_metadata_id
Simple aggregation of science metadata and data
A reference to a science metadata document using a DataONE identifier.
Aggregation
application/rdf+xml
2011-08-09T13:06:57Z
2011-08-09T13:06:57Z
ResourceMap
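The fields above outline a DataONE ORE resource map: an Aggregation, serialized as application/rdf+xml, relating science metadata to the science data it documents. A sketch of the triples such a map carries, as plain tuples (the ORE and CiTO term URIs are real; the identifiers and the resolve base are hypothetical placeholders):

```python
ORE = "http://www.openarchives.org/ore/terms/"
CITO = "http://purl.org/spar/cito/"
D1 = "https://cn.dataone.org/cn/v1/resolve/"   # hypothetical resolve base

resource_map_id = "resource_map_id"            # placeholders, as in the pad
aggregation_id = D1 + resource_map_id + "#aggregation"
science_metadata_id = "science_metadata_id"
science_data_id = "science_data_id"

triples = [
    # the resource map describes the aggregation
    (D1 + resource_map_id, ORE + "describes", aggregation_id),
    # simple aggregation of science metadata and data
    (aggregation_id, ORE + "aggregates", D1 + science_metadata_id),
    (aggregation_id, ORE + "aggregates", D1 + science_data_id),
    # the metadata documents the data
    (D1 + science_metadata_id, CITO + "documents", D1 + science_data_id),
]
```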
Replication Notes
-----------------
Moved to http://epad.dataone.org/20110811-replication-notes
Standup 2011-08-16
-------------------
CJones: working on design for CN replication manager; need to be aware of locking of system metadata wrt the synchronization and other CN services
Rob: Working on d1_integration; issues with how tests copied into main; question as to whether to "turn the web build on its head"; can't package up tests and deploy them in the war for web exposure;
- tests located in /src/main don't run from the maven command line. Sent email to Porter and Dave.
after testing, looking at X509 self-signed certs; was able to easily create certs with Robert's code; needs to be expanded to make special purpose certs;
-- re: authentication; need mechanism to update root public certs outside of shipping new libclient
Rob (cont'd): need client code to use BigInteger instead of long
refactored mn.describe() and types.v1.DescribeResponse to use BigInteger for content_length to be consistent with sysmeta.getSize()
Robert: working on "the RMI problem"; the RMI seems to work; issue is how to create an HTTP servlet request that gets forwarded to Metacat; requestDispatcher forwards it, but the receiving context never receives the request; has added in everything he can see; hidden properties?; suggestion to code directly against metacat for these create/update/etc calls, sync service to call metacat directly; should be able to use an adapter class so that we can switch from HTTP to RMI later
Roger: working against bug 1697; asymptotically approaching completion; working on issues with access control mixed in with performance; next on the list is to go through all of the REST calls one more time to check for issues, code refactoring, and cleanup; need to get integration testing with the Java client against the Python server and vice versa
Matt: Planning EIM workshop
http://javaskeleton.blogspot.com/2010/07/avoiding-peer-not-authenticated-with.html
Long-lived/no-password certificates:
1. cannot be CI-Logon (ttbook), because their long-lived certificates require passwords
2. proposed to be self-signed, or ones trusted by a dataONE CA. (the latter could revoke individual ones if necessary).
in testing:
* running integration tests that use multiple users.
* running integration tests from Hudson
* locate testing certificates under src/test/resources?
* use X509CertificateManager in d1_certificate_manager to create them
* from MNWebTester
* need the server (provided in the form) to accept whatever is provided by the client
* it is a requirement that the server accept our self-signed (dataONE trusted) certificates
* because some of the tests will use these long-lived certificates
* YET: the user running the tests (via browser) may be able to do some things, if he has a CI-Logon certificate installed.
* Different test subjects, for example:
* testUser1
* testUser2
* testUser3
* testGroup1
* testGroup2
* etc...
in production:
* used for MN/CN service accounts
* servers need to accept long-lived certificates from these 'service subjects'
* CNs and MNs need to add the dataONE CA as a trusted root.
Client-Side requirements:
* ability to make calls to the server as different subjects (within the same process space)
* the test-subject certificates (as PEM files) need to be available
* what are the security concerns? (can they be included in the d1_libclient_java package?)
* probably need to live outside that because of python libs
Server-side requirements:
* needs to extend trust to the dataONE CA (which validates the test/service subjects)
* needs to accept certificate-less requests for certain method calls
General
* create root certificate for dataONE cert
* install PEM (public key) on test servers
* store private key in secure location
SSL connections and certificates needed
(certificates used for SSL channel shown in parens)
A. ITK1 -> TMN1
(u1@test.d1 -> h1@test.d1)
u1@test.d1 is a DataONE signed certificate for a user
h1@test.d1 is a DataONE signed certificate for a host
B. TMN1 -> TCN1
(h1@test.d1 -> *.d1.org)
C. TCN1 -> TCN2
(h2@test.d1 -> *.d1.org)
D. TMN1 -> TMN2
(h1@test.d1 -> h2@test.d1)
E1. TCN1 -> CILogon
(*.d1.org -> ?)
E2. CILogon -> TCN1
(? -> *.d1.org)
Synopsis:
1. Test ITK users identify using D1 signed test certs
2. Test MNs identify using D1 signed test certs
3. Test CNs acting as servers identify using godaddy wildcard certs
4. Test CNs acting as clients identify using D1 signed test certs
Identifying nodes
-----------------
Node.Name: contains human readable name
Node.ID: constant, opaque identifier for node
Node.Subject: semi-constant Subject DN for node; may change when host services move to a new base URL (because of SSL conventions, the certificate CN has to correspond to the service address)
-------------------------------------------------
The following script is for testing only. I am monitoring port 80 and formatting results to be sent to my localhost's port 8125.
In a separate shell I run: nc -ul localhost 8125
Then I run a bash script named tcpStatToStatd.sh
--------------------------------------------------
#!/bin/bash
if [ ! -d /home/rwaltz/bin/TcpToStatD ]; then
mkdir /home/rwaltz/bin/TcpToStatD;
fi
PREFIX=`/bin/hostname -s`
PREFIX="tcpstat.$PREFIX.5701"
nohup tcpstat -i eth0 -f "tcp port 80" -F -o "$PREFIX.Bytes:%N|g\n$PREFIX.TcpPackets:%T|g\n$PREFIX.bps:%b|g\n" 10 > /home/rwaltz/bin/TcpToStatD/tcpstat.out 2> /home/rwaltz/bin/TcpToStatD/tcpstat.err < /dev/null &
# need to sleep or tail might fail with file not found
sleep 5
# place a subshell in the background with ( )
# 'sed' is used to remove precision from bps stat
# 'trap' prevents child processes dying with SIGHUP when shell is exited
# 'disown' the script so it will not wait for the background job to finish before it exits
( exec 1>/home/rwaltz/bin/TcpToStatD/nctail.out # stdout
exec 2>/home/rwaltz/bin/TcpToStatD/nctail.err # stderr
trap "" HUP
tail -f /home/rwaltz/bin/TcpToStatD/tcpstat.out | sed --unbuffered 's/\([0-9]\+\)\.[0-9]\+/\1/' | nc -u localhost 8125 ) &
disown
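The tcpstat -o format string above emits statsd gauge lines of the form name:value|g (e.g. "tcpstat.host.5701.Bytes:1234|g"). A small sketch of that wire format; the helper functions are illustrative, not part of statsd itself:

```python
def format_gauge(prefix, name, value):
    # statsd gauge line: <metric name>:<value>|g
    return "%s.%s:%s|g" % (prefix, name, value)

def parse_gauge(line):
    # split "name:value|g" back into its three parts
    name, rest = line.split(":", 1)
    value, metric_type = rest.split("|", 1)
    return name, value, metric_type
```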
#!/bin/bash
DEST="/tmp/TcpToStatD"
FDEST="${DEST}/tcpstat.out"
EDEST="${DEST}/tcpstat.err"
if [ ! -d ${DEST} ]; then
mkdir ${DEST};
fi
PREFIX=`/bin/hostname -s`
PREFIX="tcpstat.$PREFIX.5701"
touch ${FDEST}
nohup tcpstat -i eth0 -f "tcp port 80" -F -o "$PREFIX.Bytes:%N|g\n$PREFIX.TcpPackets:%T|g\n$PREFIX.bps:%b|g\n" 10 > ${FDEST} 2> ${EDEST} < /dev/null &
# need to sleep or tail might fail with file not found
# touch fixes this
#sleep 5
# place a subshell in the background with ( )
# 'sed' is used to remove precision from bps stat
# 'trap' prevents child processes dying with SIGHUP when shell is exited
# 'disown' the script so it will not wait for the background job to finish before it exits
( exec 1>/home/rwaltz/bin/TcpToStatD/nctail.out # stdout
exec 2>/home/rwaltz/bin/TcpToStatD/nctail.err # stderr
trap "" HUP
tail -f ${FDEST} | sed --unbuffered 's/\([0-9]\+\)\.[0-9]\+/\1/' | nc -u localhost 8125 ) &
disown
----------------------------------------------------------------------------------------------------
Last problem to note is that when the listener dies, we have 4 processes to clean up before restarting
----------------------------------------------------------------------------------------------------
#!/bin/bash
DEST="/home/rwaltz/bin/TcpToStatD"
FDEST="${DEST}/tcpstat.out"
EDEST="${DEST}/tcpstat.err"
if [ ! -d ${DEST} ]; then
mkdir ${DEST};
fi
PREFIX=`/bin/hostname -s`
PREFIX="tcpstat.$PREFIX.5701"
touch ${FDEST}
nohup tcpstat -i eth0 -f "tcp port 80" -F -o "$PREFIX.Bytes:%N|g\n$PREFIX.TcpPackets:%T|g\n$PREFIX.bps:%b|g\n" 10 > ${FDEST} 2> ${EDEST} < /dev/null &
# place a subshell in the background with ( )
# 'sed' is used to remove precision from bps stat
# 'trap' prevents child processes dying with SIGHUP when shell is exited
# 'disown' the script so it will not wait for the background job to finish before it exits
FNCDEST="${DEST}/nctail.out"
ENCDEST="${DEST}/nctail.err"
( exec 1>${FNCDEST} # stdout
exec 2>${ENCDEST} # stderr
trap "" HUP
tail -f ${FDEST} | sed --unbuffered 's/\([0-9]\+\)\.[0-9]\+/\1/' | nc -u localhost 8125 ) &
disown
Alright, looks like the nc approach is too flaky. This does the job in Python:
==== sendstats.py ====
'''Given statsd formatted input on stdin, break on \n and send to statsd server
'''
import sys
import socket
import logging


class Client(object):

    def __init__(self, host="statsd.dataone.org", port=8125):
        self.addr = (host, port)
        self._udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(self, message):
        try:
            self._udp.sendto(message, self.addr)
        except Exception as e:
            logging.error("Bummer: %s" % str(e))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    client = Client()
    while 1:
        line = sys.stdin.readline().strip()
        if not line:
            break
        if line.startswith("#"):
            break
        client.send(line)
        logging.info(line)
====
It reads from stdin, line by line. Run it like tail -f /some.file | python sendstats.py
====
===================================================================================================
failed: [160.36.13.145] => {"failed": true, "item": ""}
msg: 'apt-get install 'dataone-cn-os-core' ' failed: dataone-cn-os-core failed to preconfigure, with exit status 30
E: Sub-process /usr/bin/dpkg returned an error code (1)
will affect apache/ldap and postgres setup
cp /etc/ssl/private/ssl-cert-snakeoil.key /etc/dataone/client/private/puppet-dev.utk.edu.key
cp /etc/ssl/certs/ssl-cert-snakeoil.pem /etc/dataone/client/certs/puppet-dev.utk.edu.pem
cp /etc/ssl/certs/ssl-cert-snakeoil.pem /etc/ssl/certs/mockDataoneCA.crt
cat /etc/ssl/certs/ssl-cert-snakeoil.pem > /etc/dataone/client/private/urn:node:CNPUPPETDEV.pem
cat /etc/ssl/private/ssl-cert-snakeoil.key >> /etc/dataone/client/private/urn:node:CNPUPPETDEV.pem
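The two cat commands build one PEM file holding the certificate followed by the private key. A quick sanity check of that layout (a sketch; the exact BEGIN markers vary with the key type):

```python
def looks_like_combined_pem(text):
    # cert must be present, key must be present, and cert must come first,
    # matching the order of the two cat commands above
    cert_at = text.find("-----BEGIN CERTIFICATE-----")
    key_at = text.find("PRIVATE KEY-----")
    return cert_at != -1 and key_at != -1 and cert_at < key_at
```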