Test: urn:node:mnTestSEAD
Prod: urn:node:SEAD
http://mule1.dataone.org/ArchitectureDocs-current/apis/Types.html#Types.Node
openssl x509 -text -in /path/to/your/urn:node:mnTestSEAD
https://cn-stage.test.dataone.org/cn/v1/accounts
https://cn-stage.test.dataone.org/portal
========================================================================
IP-based access restriction for production CN access to getLogRecords:
# Path needs to be adjusted according to the baseURL path of the MN
# Example shown for a baseURL path of "/mn"
Order Deny,Allow
Deny from all
# IP addresses for CNs required since there is no reverse DNS
Allow from 128.111.36.80 64.106.40.6 160.36.13.150
# IF https is available:
SSLRequireSSL
SSLRequire %{SSL_CLIENT_S_DN_CN} eq "urn:node:CN" or
%{SSL_CLIENT_S_DN_CN} eq "urn:node:CNUCSB1"
...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SUBJECT_VERIFIED = 'verifiedUser'
SUBJECT_AUTHENTICATED = 'authenticatedUser'
SUBJECT_PUBLIC = 'public'
1. Start with empty list of subjects
2. Add the symbolic subject, "public"
3. Get the DN from the Subject and serialize it to a standardized string. This string is called Subject below.
4. Add Subject
5. If the connection was not made with a certificate, stop here.
6. Add the symbolic subject, "authenticatedUser"
7. If the certificate does not have a SubjectInfo extension, stop here.
8. Find the Person that has a subject that matches the Subject.
9. If the Person has the verified flag set, add the symbolic subject, "verifiedUser"
10. Iterate over the Person's isMemberOf and add those subjects
11. Iterate over the Person's equivalentIdentity and for each of them:
- Add its subject
- Find the corresponding Person
- Iterate over that Person's isMemberOf and add those subjects
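The steps above can be sketched in Python. The registry lookup and the Person field names (`verified`, `isMemberOf`, `equivalentIdentity`) are assumptions about shape, not the real DataONE API; the symbolic subjects match the constants in this pad.

```python
SUBJECT_VERIFIED = 'verifiedUser'
SUBJECT_AUTHENTICATED = 'authenticatedUser'
SUBJECT_PUBLIC = 'public'

def expand_subjects(subject_dn=None, has_subject_info=False, registry=None):
    """subject_dn: standardized DN string from the client certificate,
    or None when no certificate was presented.
    registry: dict mapping subject -> Person record (shape is an assumption)."""
    subjects = [SUBJECT_PUBLIC]             # steps 1-2: start with public
    if subject_dn is None:                  # step 5: no certificate, stop
        return subjects
    subjects.append(subject_dn)             # steps 3-4: add the serialized DN
    subjects.append(SUBJECT_AUTHENTICATED)  # step 6
    if not has_subject_info:                # step 7: no SubjectInfo, stop
        return subjects
    person = (registry or {}).get(subject_dn)  # step 8: matching Person
    if person is None:
        return subjects
    if person.get('verified'):              # step 9
        subjects.append(SUBJECT_VERIFIED)
    subjects.extend(person.get('isMemberOf', []))       # step 10
    for equiv in person.get('equivalentIdentity', []):  # step 11
        subjects.append(equiv)
        eq_person = (registry or {}).get(equiv)
        if eq_person:
            subjects.extend(eq_person.get('isMemberOf', []))
    return subjects
```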
~~~~~~~~~~~~~~
Index task processing:
1. the system metadata fields are loaded
2. sub-processors for the ID are run (loading science metadata for example)
3. the results are merged with the existing values from Solr
So in the final merge step, the only values that should override those extracted in the task are the ones related to the resource map entries.
I'm still not sure what that merge logic is for. I've backed out my changes. The problem with multiple values in a single-valued field does not seem related to the merge specifically.
Skye - want to hop onto Skype to talk this through?
Sure! Starting Skype - I'm on Skype now.
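The override rule described above, as a minimal Python sketch. The field names and dict shapes are assumptions for illustration; the real logic lives in mergeWithIndexedDocument.

```python
# Values already in the Solr index override the freshly extracted values
# only for fields tied to resource map entries.  These field names are
# assumptions, not the real index schema.
RESOURCE_MAP_FIELDS = {"resourceMap", "documents", "isDocumentedBy"}

def merge_with_indexed(extracted, indexed):
    """extracted: field -> value dict built by the index task.
    indexed: field -> value dict currently stored in Solr."""
    merged = dict(extracted)
    for field, value in indexed.items():
        if field in RESOURCE_MAP_FIELDS:
            merged[field] = value
    return merged
```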
public SolrDoc processID(String id, String sysMetaPath, String objectPath) throws IOException,
        SAXException, ParserConfigurationException, XPathExpressionException, EncoderException {
    // Load the System Metadata document
    Document sysMetaDoc = loadDocument(sysMetaPath, input_encoding);
    if (sysMetaDoc == null) {
        log.error("Could not load System Metadata for ID: " + id);
        return null;
    }
    // Extract the field values from the System Metadata
    List sysSolrFields = processFields(sysMetaDoc);
    SolrDoc indexDocument = new SolrDoc(sysSolrFields);
    Map<String, SolrDoc> docs = new HashMap<String, SolrDoc>();
    docs.put(id, indexDocument);
    // Determine if subprocessors are available for this ID
    if (subprocessors != null) {
        // for each subprocessor loaded from the spring config
        for (IDocumentSubprocessor subprocessor : subprocessors) {
            // Does this subprocessor apply?
            if (subprocessor.canProcess(sysMetaDoc)) {
                // If so, extract the additional information from the document.
                try {
                    // docObject = the resource map document or science metadata document.
                    // Note that resource map processing touches all objects
                    // referenced by the resource map.
                    Document docObject = loadDocument(objectPath, input_encoding);
                    if (docObject == null) {
                        log.error("Could not load OBJECT file for ID,Path=" + id + ", "
                                + objectPath);
                    } else {
                        docs = subprocessor.processDocument(id, docs, docObject);
                    }
                } catch (Exception e) {
                    // log the full stack trace, not just the array reference
                    log.error(e.getMessage(), e);
                }
            }
        }
    }
    // ? why is this here
    // docs.put(id, indexDocument);
    // TODO: get list of unmerged documents and do a single HTTP request for
    // all unmerged documents
    for (SolrDoc mergeDoc : docs.values()) {
        if (!mergeDoc.isMerged()) {
            mergeWithIndexedDocument(mergeDoc);
        }
    }
    SolrElementAdd addCommand = getAddCommand(new ArrayList<SolrDoc>(docs.values()));
    if (log.isTraceEnabled()) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        addCommand.serialize(baos, "UTF-8");
        log.trace(baos.toString());
    }
    sendCommand(addCommand);
    if (docs.size() > 0) {
        docs.clear();
    }
    return indexDocument;
}
======================================================================
notes from 2011-10-11
???
ornldaac_mn
ORNL DAAC Mercury Member Node
http://mercury.ornl.gov/ornldaac/mn/
1900-01-01T00:00:00.000+00:00
1900-01-01T00:00:00.000+00:00
Questions - is HTTPS required? If authorized access is ever required then HTTPS will be necessary.
=====
Content in this pad is temporary only - Use it like a scratch pad.
gmn=> explain analyze SELECT "mn_object"."id", "mn_object"."pid", "mn_object"."url", "mn_object"."format_id", "mn_object"."checksum", "mn_object"."checksum_algorithm_id", "mn_object"."mtime", "mn_object"."db_mtime", "mn_object"."size", "mn_object_format"."id", "mn_object_format"."format_id", "mn_object_format"."format_name", "mn_object_format"."sci_meta", "mn_checksum_algorithm"."id", "mn_checksum_algorithm"."checksum_algorithm"
FROM "mn_object"
INNER JOIN "mn_object_format" ON ("mn_object"."format_id" = "mn_object_format"."id")
INNER JOIN "mn_checksum_algorithm" ON ("mn_object"."checksum_algorithm_id" = "mn_checksum_algorithm"."id")
ORDER BY "mn_object"."mtime" ASC
LIMIT 1000
OFFSET 803000
;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=592265.20..592267.70 rows=1000 width=308) (actual time=17220.744..17221.686 rows=1000 loops=1)
-> Sort (cost=590257.70..592985.32 rows=1091048 width=308) (actual time=16551.271..17149.914 rows=804000 loops=1)
Sort Key: mn_object.mtime
Sort Method: external merge Disk: 288680kB
-> Merge Join (cost=40.15..10954.91 rows=1091048 width=308) (actual time=0.074..9594.943 rows=1092711 loops=1)
Merge Cond: (mn_object_format.id = mn_object.format_id)
-> Index Scan using mn_object_format_pkey on mn_object_format (cost=0.00..37.45 rows=1080 width=45) (actual time=0.013..0.055 rows=30 loops=1)
-> Materialize (cost=0.00..4663745.35 rows=1091048 width=263) (actual time=0.056..9098.164 rows=1092711 loops=1)
-> Nested Loop (cost=0.00..4652834.87 rows=1091048 width=263) (actual time=0.052..7748.179 rows=1092711 loops=1)
-> Index Scan using mn_object_format_id on mn_object (cost=0.00..2601558.15 rows=1091048 width=201) (actual time=0.041..5092.224 rows=1092711 loops=1)
-> Index Scan using mn_checksum_algorithm_pkey on mn_checksum_algorithm (cost=0.00..1.87 rows=1 width=62) (actual time=0.001..0.002 rows=1 loops=1092711)
Index Cond: (mn_checksum_algorithm.id = mn_object.checksum_algorithm_id)
Total runtime: 17400.561 ms
(13 rows)
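The plan above spills a 288 MB external merge sort to disk: ORDER BY with LIMIT/OFFSET deep into the table forces the database to produce and sort past all 803,000 skipped rows. A sketch of the keyset-pagination alternative, which remembers the last sort key seen and seeks via an index instead (sqlite3 here, not the GMN schema; with a non-unique mtime the key should be the pair (mtime, id)):

```python
import sqlite3

# Toy table standing in for mn_object; mtime strings sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mn_object (id INTEGER PRIMARY KEY, mtime TEXT)")
conn.executemany("INSERT INTO mn_object (mtime) VALUES (?)",
                 [("2011-%02d-%02d" % (m, d),)
                  for m in range(1, 13) for d in range(1, 29)])
conn.execute("CREATE INDEX mn_object_mtime ON mn_object (mtime)")

# OFFSET pagination: the cost grows with the offset.
page = conn.execute(
    "SELECT id, mtime FROM mn_object ORDER BY mtime LIMIT 10 OFFSET 100"
).fetchall()

# Keyset pagination: seek directly past the last mtime of the previous page.
last_mtime = page[-1][1]
next_page = conn.execute(
    "SELECT id, mtime FROM mn_object WHERE mtime > ? ORDER BY mtime LIMIT 10",
    (last_mtime,)).fetchall()
```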
~~~~~~~~~~~
API Method Metacat LDAP Mercury
Node
identifier
name
description
baseUrl
services (suggested changes here)
name (API Name)
version (release version, includes schema + interface)
available (set by the CN / monitoring infrastructure)
synchronization
schedule
lastHarvested
lastCompleteHarvest
health
replicate
synchronize
type (MN | CN )
environment (test | staging | production )--remove
[new 0.6.2] Metacat impl structure
------------------------
abstract class MNBase {}
MNCoreImpl extends MNBase implements MNCore
MNReadImpl extends MNBase implements MNRead
MNAuthorizationImpl extends MNBase implements MNAuthorization
MNStorageImpl extends MNBase implements MNStorage
MNReplicationImpl extends MNBase implements MNReplication
abstract class CNBase {}
CNCoreImpl extends CNBase implements CNCore
-partial Metacat implementation
CNReadImpl extends CNBase implements CNRead
-all Metacat?
CNAuthorizationImpl extends CNBase implements CNAuthorization
-all Metacat (sysMeta)
**CNIdentityImpl extends CNBase implements CNIdentity
-implemented outside of Metacat
**CNRegisterImpl extends CNBase implements CNRegister
-implemented outside Metacat
CNReplicationImpl extends CNBase implements CNReplication
-proxies to Metacat
* many existing methods in CrudService refactored to MNBase
* sysMeta-related methods in IdentityManager moved to SystemMetadataManager
* similar CN and MN methods will reuse classes/methods for that shared function (e.g., SystemMetadataManager.getInstance().getSystemMetadata()) rather than sharing the root of a class hierarchy for shared methods. We want to keep the distinction between CN and MN stacks very clear at least in the class hierarchy.
* refactor ResourceHandler to minimize duplicate code when handling the REST calls
* Metacat won't implement CNIdentity, CNRegister
TODO: Check on elementFormDefault="unqualified" attributeFormDefault="unqualified"
To check
--------
Identifier (simple type versus complex type) 0.6.1
To rename as new class, deprecate old
-------------------------------------
ObjectFormat --> 0.6.2 (includes accompanying change in systemmetadata) Talk with Chris on Monday to make this change.
AccessRule --> 0.6.1 (includes accompanying change in systemmetadata) Change today (Matt)
SystemMetadata --> 0.6.3 (change in ordering of elements within sysmeta document)
LogEntry --> 0.6.3 (entryId is of type Identifier, but there is an Identifier type already in LogEntry named, identifier. Change entryId type to NonEmptyString)
To Remove (never used)
----------------------
EncryptedNonce --> 0.6.1
Challenge --> 0.6.1
To deprecate then remove
------------------------
AuthToken --> 0.6.4
Break apart the namespaces
sequencing
0. switch Identifier back to the complex type that it was
1. finish 0.6.1 build
2. merge ObjectFormat changes
- metacat merge, too.
?. update d1_common_ (currently 0.6.1), d1_libclient_ versions (currently 0.6.1)
?. update d1_integration version (currently 0.5.0)
its pom currently points to d1_libclient_java v0.6.1
foresite@googlegroups.com
Foresite Toolkit (Python)
science_data_id
A reference to a science data object using a DataONE identifier
science_metadata_id
Simple aggregation of science metadata and data
A reference to a science metadata document using a DataONE identifier.
Aggregation
application/rdf+xml
2011-08-09T13:06:57Z
2011-08-09T13:06:57Z
ResourceMap
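The fields above outline a DataONE ORE resource map: an Aggregation, serialized as application/rdf+xml, relating science metadata to the science data it documents. A sketch of the triples such a map carries, as plain tuples (the ORE and CiTO term URIs are real; the identifiers and the resolve base are hypothetical placeholders):

```python
ORE = "http://www.openarchives.org/ore/terms/"
CITO = "http://purl.org/spar/cito/"
D1 = "https://cn.dataone.org/cn/v1/resolve/"   # hypothetical resolve base

resource_map_id = "resource_map_id"            # placeholders, as in the pad
aggregation_id = D1 + resource_map_id + "#aggregation"
science_metadata_id = "science_metadata_id"
science_data_id = "science_data_id"

triples = [
    # the resource map describes the aggregation
    (D1 + resource_map_id, ORE + "describes", aggregation_id),
    # simple aggregation of science metadata and data
    (aggregation_id, ORE + "aggregates", D1 + science_metadata_id),
    (aggregation_id, ORE + "aggregates", D1 + science_data_id),
    # the metadata documents the data
    (D1 + science_metadata_id, CITO + "documents", D1 + science_data_id),
]
```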
Replication Notes
-----------------
Moved to http://epad.dataone.org/20110811-replication-notes
Standup 2011-08-16
-------------------
CJones: working on design for CN replication manager; need to be aware of locking of system metadata wrt the synchronization and other CN services
Rob: Working on d1_integration; issues with how tests copied into main; question as to whether to "turn the web build on its head"; can't package up tests and deploy them in the war for web exposure;
- tests located in /src/main don't run from the maven command line. Sent email to Porter and Dave.
after testing, looking at X509 self-signed certs; was able to easily create certs with Robert's code; needs to be expanded to make special purpose certs;
-- re: authentication; need mechanism to update root public certs outside of shipping new libclient
Rob (cont'd): need client code to use BigInteger instead of long
refactored mn.describe() and types.v1.DescribeResponse to use BigInteger for content_length to be consistent with sysmeta.getSize()
Robert: working on "the RMI problem"; the RMI seems to work; issue is how to create an HTTP servlet request that gets forwarded to Metacat; requestDispatcher forwards it, but the receiving context never receives the request; has added in everything he can see; hidden properties?; suggestion to code directly against metacat for these create/update/etc calls, sync service to call metacat directly; should be able to use an adapter class so that we can switch from HTTP to RMI later
Roger: working against bug 1697; asymptotically approaching completion; working on issues with access control mixed in with performance; next on the list is to go through all of the REST calls one more time to check for issues, code refactoring, and cleanup; need to get integration testing with the Java client against the Python server and vice versa
Matt: Planning EIM workshop
http://javaskeleton.blogspot.com/2010/07/avoiding-peer-not-authenticated-with.html
Long-lived/no-password certificates:
1. cannot be CI-Logon (ttbook), because their long-lived certificates require passwords
2. proposed to be self-signed, or ones trusted by a dataONE CA. (the latter could revoke individual ones if necessary).
in testing:
* running integration tests that use multiple users.
* running integration tests from Hudson
* locate testing certificates under src/test/resources?
* use X509CertificateManager in d1_certificate_manager to create them
* from MNWebTester
* need the server (provided in the form) to accept whatever is provided by the client
* it is a requirement that the server accept our self-signed (dataONE trusted) certificates
* because some of the tests will use these long-lived certificates
* YET: the user running the tests (via browser) may be able to do some things, if he has a CI-Logon certificate installed.
* Different test subjects, for example:
* testUser1
* testUser2
* testUser3
* testGroup1
* testGroup2
* etc...
in production:
* used for MN/CN service accounts
* servers need to accept long-lived certificates from these 'service subjects'
* CNs and MNs need to add the dataONE CA as a trusted root.
Client-Side requirements:
* ability to make calls to the server as different subjects (within the same process space)
* the test-subject certificates (as PEM files) need to be available
* what are the security concerns? (can they be included in the d1_libclient_java package?)
* probably need to live outside that because of python libs
Server-side requirements:
* needs to extend trust to the dataONE CA (which validates the test/service subjects)
* needs to accept certificate-less requests for certain method calls
General
* create root certificate for dataONE cert
* install PEM (public key) on test servers
* store private key in secure location
SSL connections and certificates needed
(certificates used for SSL channel shown in parens)
A. ITK1 -> TMN1
(u1@test.d1 -> h1@test.d1)
u1@test.d1 is a DataONE signed certificate for a user
h1@test.d1 is a DataONE signed certificate for a host
B. TMN1 -> TCN1
(h1@test.d1 -> *.d1.org)
C. TCN1 -> TCN2
(h2@test.d1 -> *.d1.org)
D. TMN1 -> TMN2
(h1@test.d1 -> h2@test.d1)
E1. TCN1 -> CILogon
(*.d1.org -> ?)
E2. CILogon -> TCN1
(? -> *.d1.org)
Synopsis:
1. Test ITK users identify using D1 signed test certs
2. Test MNs identify using D1 signed test certs
3. Test CNs acting as servers identify using godaddy wildcard certs
4. Test CNs acting as clients identify using D1 signed test certs
Identifying nodes
-----------------
Node.Name: contains human readable name
Node.ID: constant, opaque identifier for node
Node.Subject: semi-constant Subject DN for node; may change when host services move to a new base URL (because of SSL conventions, the certificate CN has to correspond to the service address)
-------------------------------------------------
The following script is for testing only. I am monitoring port 80 and formatting results to be sent to my localhost's port 8125.
In a separate shell I run: nc -ul localhost 8125
Then I run a bash script named tcpStatToStatd.sh
--------------------------------------------------
#!/bin/bash
if [ ! -d /home/rwaltz/bin/TcpToStatD ]; then
mkdir /home/rwaltz/bin/TcpToStatD;
fi
PREFIX=`/bin/hostname -s`
PREFIX="tcpstat.$PREFIX.5701"
nohup tcpstat -i eth0 -f "tcp port 80" -F -o "$PREFIX.Bytes:%N|g\n$PREFIX.TcpPackets:%T|g\n$PREFIX.bps:%b|g\n" 10 > /home/rwaltz/bin/TcpToStatD/tcpstat.out 2> /home/rwaltz/bin/TcpToStatD/tcpstat.err < /dev/null &
# need to sleep or tail might fail with file not found
sleep 5
# place a subshell in the background with ( )
# 'sed' is used to remove precision from bps stat
# 'trap' prevents child processes dying with SIGHUP when shell is exited
# 'disown' the script so it will not wait for the background job to finish before it exits
( exec 1>/home/rwaltz/bin/TcpToStatD/nctail.out # stdout
exec 2>/home/rwaltz/bin/TcpToStatD/nctail.err # stderr
trap "" HUP
tail -f /home/rwaltz/bin/TcpToStatD/tcpstat.out | sed --unbuffered 's/\([0-9]\+\)\.[0-9]\+/\1/' | nc -u localhost 8125 ) &
disown
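The tcpstat -o format string above emits statsd gauge lines of the form name:value|g (e.g. "tcpstat.host.5701.Bytes:1234|g"). A small sketch of that wire format; the helper functions are illustrative, not part of statsd itself:

```python
def format_gauge(prefix, name, value):
    # statsd gauge line: <metric name>:<value>|g
    return "%s.%s:%s|g" % (prefix, name, value)

def parse_gauge(line):
    # split "name:value|g" back into its three parts
    name, rest = line.split(":", 1)
    value, metric_type = rest.split("|", 1)
    return name, value, metric_type
```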
#!/bin/bash
DEST="/tmp/TcpToStatD"
FDEST="${DEST}/tcpstat.out"
EDEST="${DEST}/tcpstat.err"
if [ ! -d ${DEST} ]; then
mkdir ${DEST};
fi
PREFIX=`/bin/hostname -s`
PREFIX="tcpstat.$PREFIX.5701"
touch ${FDEST}
nohup tcpstat -i eth0 -f "tcp port 80" -F -o "$PREFIX.Bytes:%N|g\n$PREFIX.TcpPackets:%T|g\n$PREFIX.bps:%b|g\n" 10 > ${FDEST} 2> ${EDEST} < /dev/null &
# need to sleep or tail might fail with file not found
# touch fixes this
#sleep 5
# place a subshell in the background with ( )
# 'sed' is used to remove precision from bps stat
# 'trap' prevents child processes dying with SIGHUP when shell is exited
# 'disown' the script so it will not wait for the background job to finish before it exits
( exec 1>/home/rwaltz/bin/TcpToStatD/nctail.out # stdout
exec 2>/home/rwaltz/bin/TcpToStatD/nctail.err # stderr
trap "" HUP
tail -f ${FDEST} | sed --unbuffered 's/\([0-9]\+\)\.[0-9]\+/\1/' | nc -u localhost 8125 ) &
disown
----------------------------------------------------------------------------------------------------
Last problem to note is that when the listener dies, we have 4 processes to clean up before restarting
----------------------------------------------------------------------------------------------------
#!/bin/bash
DEST="/home/rwaltz/bin/TcpToStatD"
FDEST="${DEST}/tcpstat.out"
EDEST="${DEST}/tcpstat.err"
if [ ! -d ${DEST} ]; then
mkdir ${DEST};
fi
PREFIX=`/bin/hostname -s`
PREFIX="tcpstat.$PREFIX.5701"
touch ${FDEST}
nohup tcpstat -i eth0 -f "tcp port 80" -F -o "$PREFIX.Bytes:%N|g\n$PREFIX.TcpPackets:%T|g\n$PREFIX.bps:%b|g\n" 10 > ${FDEST} 2> ${EDEST} < /dev/null &
# place a subshell in the background with ( )
# 'sed' is used to remove precision from bps stat
# 'trap' prevents child processes dying with SIGHUP when shell is exited
# 'disown' the script so it will not wait for the background job to finish before it exits
FNCDEST="${DEST}/nctail.out"
ENCDEST="${DEST}/nctail.err"
( exec 1>${FNCDEST} # stdout
exec 2>${ENCDEST} # stderr
trap "" HUP
tail -f ${FDEST} | sed --unbuffered 's/\([0-9]\+\)\.[0-9]\+/\1/' | nc -u localhost 8125 ) &
disown
Alright, looks like the nc approach is too flaky. This does the job in Python:
==== sendstats.py ====
'''Given statsd formatted input on stdin, break on \n and send to statsd server
'''
import sys
import socket
import logging


class Client(object):

    def __init__(self, host="statsd.dataone.org", port=8125):
        self.addr = (host, port)
        self._udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(self, message):
        try:
            self._udp.sendto(message, self.addr)
        except Exception as e:
            logging.error("Bummer: %s" % str(e))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    client = Client()
    while 1:
        line = sys.stdin.readline().strip()
        if not line:
            break
        if line.startswith("#"):
            break
        client.send(line)
        logging.info(line)
====
It reads from stdin, line by line. Run it like tail -f /some.file | python sendstats.py
====
===================================================================================================
failed: [160.36.13.145] => {"failed": true, "item": ""}
msg: 'apt-get install 'dataone-cn-os-core' ' failed: dataone-cn-os-core failed to preconfigure, with exit status 30
E: Sub-process /usr/bin/dpkg returned an error code (1)
will affect apache/ldap and postgres setup
cp /etc/ssl/private/ssl-cert-snakeoil.key /etc/dataone/client/private/puppet-dev.utk.edu.key
cp /etc/ssl/certs/ssl-cert-snakeoil.pem /etc/dataone/client/certs/puppet-dev.utk.edu.pem
cp /etc/ssl/certs/ssl-cert-snakeoil.pem /etc/ssl/certs/mockDataoneCA.crt
cat /etc/ssl/certs/ssl-cert-snakeoil.pem > /etc/dataone/client/private/urn:node:CNPUPPETDEV.pem
cat /etc/ssl/private/ssl-cert-snakeoil.key >> /etc/dataone/client/private/urn:node:CNPUPPETDEV.pem
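The two cat commands build one PEM file holding the certificate followed by the private key. A quick sanity check of that layout (a sketch; the exact BEGIN markers vary with the key type):

```python
def looks_like_combined_pem(text):
    # cert must be present, key must be present, and cert must come first,
    # matching the order of the two cat commands above
    cert_at = text.find("-----BEGIN CERTIFICATE-----")
    key_at = text.find("PRIVATE KEY-----")
    return cert_at != -1 and key_at != -1 and cert_at < key_at
```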