Notes for 2013.18-Block.3.1
===========================
Skye
----
- 20130510
- Releasing ONEMercury
- Will delete 1.1.2_RC2 tag with CSS change
- Had to deal with a Solr bug - that's fixed now
- ESRI support is in now
- Mercury supports UTF-8 now
- Auditing reporting meeting
- Would like to expand Event enumeration, typing as string would break backwards compatibility
- May use a separate index if the change breaks backwards compatibility
- Model it with the agg log schema
- Production rollout
- 20130508
- Working on auditing
- Working with Solr/Jetty for upgrading to Solr 4.x - possibly use Jetty for indexing?
- 20130506
- Lucene conference was good
- learned a good deal about Solr, etc.
- Solr 4.x has memory improvements, processing improvements: would like to upgrade
- Back to auditing
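The backwards-compatibility concern from the 20130510 auditing discussion can be illustrated with a small sketch. It assumes a closed enumeration on the client side; the member names below are an illustrative subset, and `parse_event_lenient` and the value `audit_checksum_mismatch` are hypothetical names, not part of any DataONE library.

```python
from enum import Enum


class Event(Enum):
    # Illustrative subset of event names, assumed for this sketch.
    CREATE = "create"
    READ = "read"
    UPDATE = "update"
    DELETE = "delete"


def parse_event_strict(value):
    """Behave like a client built against the closed enumeration."""
    return Event(value)  # raises ValueError for any value outside the enum


def parse_event_lenient(value):
    """Hypothetical string-typed variant: unknown values pass through as str."""
    try:
        return Event(value)
    except ValueError:
        return value


if __name__ == "__main__":
    print(parse_event_lenient("read"))
    print(parse_event_lenient("audit_checksum_mismatch"))
```

A strict client fails on any event name added after it was built, which is why a separate index for the new event types may be the safer route.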
Roger
-----
- 20130510
- Working on ONEDrive to describe hierarchical files
- started d1_workspace_client, may become part of libclient
- simple, recursive structure (models folders)
- 20130508
- design specs for ONEDrive and ONEMercury/other web UIs
- security considerations are interesting
- looking at OAuth
- 20130506
- U/A meeting: came up with ONEDrive solutions
- ONEDrive will work in coordination with a web search UI like ONEMercury
- storage of queries or objects in ONEMercury, assigned to a folder
- design work is needed on a REST API for storing collections (workspace concept)
- Will add to architecture documentation
David Doyle
-----------
- Reading, getting a handle on tools and software
- Finals for the semester
- Will probably work on documentation
Chris B.
--------
- 20130510
- David Doyle is going to be replacing Chris
- Will work on debconf stuff over the weekend
- Robert: are deb packages ready? Some errors in the buildout
- 20130506
- Becoming familiar with foresite
- Working on debconf flags
- 20130503
- Familiarizing himself with ORE for indexing work
- 20130501
- Ansible templating issue fixed - using newest 1.2 version now
- site fact module can be used now
- waiting for packages from Robert
- Robert wants flags set (not just in debconf), so those will need to be integrated into the Ansible module
- bioportal VM has been compromised
- rooted due to unchanged base root password (brute force ssh attack)
- installed bind, httpd
- isolated issue, no ldap. no d1 passwords.
- VM was basically unmaintained, which caused the issue really
- Will save VM disk
- Will work on debconf modules for flags above now
- Will work on Ansible documentation, system architecture
- New student coming on in June, will train this person
- 20130429
- Ansible work - found a bug that is a blocker
- setting a map of maps for debconf entries
- templating system isn't expanding it
- Trying Ansible 1.2 - should be fixed
- Not working with DataONE buildout packages yet
Rob
---
- 20130510
- Working on very large resource maps, will write up a summary in the architecture docs
- Will be recommending a nesting solution first
- shouldn't affect indexing
- When serializing resource maps, Jibx does a lot of validations per iteration, which is a performance hit (6 minutes)
- Deserializing is quick (6 seconds)
- 20130508
- Working on aggregations and data packaging requirements
- Updated packaging documentation, Dave will review
- changed MUST to SHOULD wrt aggregation URIs following hash URI form, due to existing resource maps that do not follow hash URI form
- added requirement that references (URIs) to other data packages must resolve to a resource map (since we can't guarantee that aggregation URIs resolve)
- added section on resource map validation
- added section on how to reference other data packages
- added section on very large packages with performance statistics
- Continued work on ORE parsing
- large maps ( > 10K triples) take 5 minutes to serialize, 5 sec to deserialize
- dealing with NPEs during deserialization (triple with null subject?)
- Nesting relationships: can possibly only use unidirectional predicates for 'aggregates'. Can infer the inverse.
- Question: do we need the identifier statement?
- the impact: if we don't use it and also remove requirement for isDocumentedBy and isAggregatedBy, can make the resource maps 1/3 the size, and take 50% of the time to serialize.
- 20130506
- 20130503
- Working on nesting resource maps
- ORE spec: can aggregate aggregations, but the resource maps are distinct
- {resourceMapURI}#aggregation - having links to the aggregation needs to be consistent
- also, we may be able to infer transitivity across aggregations
- If aggregation ids need to be unique, we'd need to check this in DataONE (d1_sync?)
- TODO: decide whether we'll aggregate resource maps or aggregate aggregations
- worked on various sized resource maps
- up to 10K triples, 30K gave OOM exceptions using the foresite library in libclient_java
- 20130501
- figured out the reasoning setup - needed to feed in the ORE schema.
- building it into ResourceMapFactory in libclient_java
- will cache the ORE model in libclient_java
- need to set up tests that make sure that reusing the cached model doesn't cause problems.
- TODO: will need to support nested packaging in libclients' DataPackage classes
- 20130429
- Working with the OWL Reasoner in foresite
- Understanding the different reasoners for RDFS, OWL
- Question: mutability and namespaced identifiers
- What are the requirements for citation wrt DataONE?
Robert
------
- 20130510
- Was able to get a buildout fully run
- Worked with Skye on log event management
- If Event types become strings, MN event types
- Changing the Event enum to string would break libclient v1 compatibility
- Talked about an index for exception reporting
- 20130508
- Debugging the buildout packages - making sure it builds and installs correctly
- Discussion with Skye re: log aggregation Solr index to store auditing events
- 20130506
- Working on debconf packaging
- Finished config script work on Friday
- Now finishing up postinst (template names have changed, merging differences and removing redundancies/obsolete code)
- Spent last week in the U/A meeting
- 20130503
- Survived the week
- Good meeting with the U/A group
- MN issues - good feedback
- Will continue on packaging code - should be able to finish up today
- 20130501
- U/A meeting
- Need a few more hours on packaging work
- doing work in github until it can build properly
- 20130429
- CN Packaging
- debconf reading, debian documentation is conflicted with the use of config
- Will reduce the amount of prompting
- Prepping for U/A meeting
Chris J.
--------
- 20130510
- Discussion with Rob on ORE maps
- Metacat work and meeting with Ben, Matt, Jing
- Review of ONEDrive summary that Roger is working on
- We need to discuss the APIs needed for UIs, along with the Log Stats API
- Review of buildout and release with Skye for ONEMercury changes
- Tagging: immutable tags can be 'difficult' when a minor change is discovered
- But, tags should be immutable, so:
Proposal: Let's tag as: COMPONENT_NAME_vX.X.X_RCX, and once we are released, tag as COMPONENT_NAME_vX.X.X
- Further editing work on the sensor best practices document
- Will be out Tue, Wed, Thu of next week
- 20130508
- Working on System Metadata Management architecture docs
- Worked with Skye on auditing
- Upgraded CNs with Ben
- Priorities organization with Dave
- 20130501
- Metacat-specific work
- troubleshooting the CN-CN replication issues #3740
- working on system metadata management proposal
- Will work with Ben on CN upgrade of Metacat installations
- Replica target nodes: need to deprecate nodes in the same
- 20130429
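The tagging proposal from 20130510 (COMPONENT_NAME_vX.X.X_RCX for release candidates, COMPONENT_NAME_vX.X.X once released) could be checked mechanically. This is a minimal sketch: the regular expression is one possible reading of the proposal, and `parse_tag` and `release_tag_for` are hypothetical helper names.

```python
import re

# One possible reading of the proposed scheme: component name, "_v",
# a three-part version, and an optional "_RC<n>" suffix.
TAG_PATTERN = re.compile(
    r"(?P<component>[A-Za-z0-9_]+?)_v(?P<version>\d+\.\d+\.\d+)(?:_RC(?P<rc>\d+))?"
)


def parse_tag(tag):
    """Return (component, version, rc_number or None), or None if malformed."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    rc = match.group("rc")
    return match.group("component"), match.group("version"), int(rc) if rc else None


def release_tag_for(rc_tag):
    """Derive the final release tag name from a release-candidate tag."""
    parsed = parse_tag(rc_tag)
    if parsed is None:
        raise ValueError(f"not a valid tag: {rc_tag!r}")
    component, version, _ = parsed
    return f"{component}_v{version}"


if __name__ == "__main__":
    print(parse_tag("d1_portal_v1.1.2_RC2"))
    print(release_tag_for("d1_portal_v1.1.2_RC2"))
```

Because a fix produces a new RC number rather than a moved tag, existing tags never need to be deleted or rewritten.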
Matt
----
- 20130503
- security team CC
- Will be at the UNM DataONE planning meeting for the next round
- what do we need to do in the next 5 years from a technological perspective
- send comments to Dave and Matt
- May end up writing leveraged grants to continue the D1 CI
- 20130501
- Spent some time on the CN-CN replication issues
- Patched Metacat with LEFT JOIN SQL query
- will go into 2.0.7 on the CNs
Ben
---
- 20130508
- Potential PPBIO Member Node. Interest from "dvd" on IRC yesterday (new info manager there in Brazil). https://redmine.dataone.org/issues/3748
- Where do we send people to contact D1? John Cobb
- Involved in the security audit planning in June
- Working with CILogon and portal code; delegation of certificates has changed
- Need to target this for a CN release
- 20130506
- Upgraded stage CNs to Metacat 2.0.7
- We had a few objects that didn't Metacat-replicate correctly, due to conflicting autogen ids
- Ben had written a script to clean up id mappings on each CN, which we'll use
- Will be upgrading the production CNs today
- Will be working on CILogon work this week
- 20130503
- Will be upgrading the CNs in stage and production
- We will only be turning off Metacat-tied services
- Will go through the round robin procedures
- Database dumps are the big time sink - a few hours of upgrade work
- 20130501
- Working on the Metacat 2.0.7 release branch, testing now (building via Hudson)
- Unstable and stable channels are using 2.0.7 now
- Need to update Authentication documentation
Dave V
------
- 20130510
- Working on tagging architecture docs
- Developer resources doc
- Product/Component dependencies
- Working through scheduling for products, especially CN stack version plans
- Pondering high level design for ONEDrive and other ITK product interactions with portal
- 20130508
- At PPSR meeting (lots of interesting high volume citizen observation projects)
- Working through dependencies, updating component docs
- Goal is to have:
- list of all products
- dependency between products
- activities for each product
- scheduling / ordering of activities for each product
- overall scheduling
- 20130506
- Reviewed resource map docs
- usefulness of aggregations
- points to a concept of a collection
- resource map is the instantiation of the concept
- content negotiation determines resource map serialization
- D1 is most interested in the conceptual collection - serialization aside
- Data collections should really be referenced by the aggregation id
- For now: reference the resource map, but recommend future use of referencing the aggregation
- Behaviour will need to be backwards compatible so that existing RMs that do not follow the recommendation are still properly interpreted by DataONE service.
- 20130503
- Good workplan for ONEDrive from the U/A meeting
- Will be cleaning up ask.dataone.org
- Initial call with Jim Basney, CILogon security audit
- Ben will be POC for technical issues
- Formalizing ONEDrive workplan
- Tweaking ask.dataone.org a bit
- Initial call with CTSC for security audit
- Focus on high level architecture and implementation
- Unlikely to reach code review level of detail
- 20130501
- At Usability group meeting
- Evaluating ONEDrive UI design and next steps
- may present ONEDrive at DUG meeting in July?
- Bob Sandusky will be at a DSpace meeting next week
Discussion on ORE parsing in libclient
--------------------------------------
- ResourceMapFactory now exposes new methods:
- deserializeResourceMap(InputStream is, boolean useReasoners)
- returns an inferred (or not) resource map
Nesting DataPackages to keep 'em short.
see http://www.openarchives.org/ore/1.0/datamodel#ore:isDescribedBy
RemP describes AggregationP
AggregationP aggregates MD-All
Aggregation aggregates ChP1, ChP2, ChP3, ...
RemChP describes AggregationChP1
D1 isDocumentedBy MD-All
D2 isBlahBlahBlah
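The nesting idea above can be sketched as splitting a large member list across child aggregations so that no single resource map grows too big. This is a simplified illustration, not the libclient implementation: it ignores the metadata object (MD-All) and the documentation statements, and the child naming scheme and the `max_per_child` limit are assumptions.

```python
def nest_members(parent_id, member_ids, max_per_child):
    """Split member_ids into child aggregations of at most max_per_child each.

    Returns a dict mapping each aggregation name to the list of things it
    aggregates. The naming scheme (parent_id + "-ChP<n>") is hypothetical.
    """
    if max_per_child < 1:
        raise ValueError("max_per_child must be at least 1")
    children = {}
    for start in range(0, len(member_ids), max_per_child):
        child_id = f"{parent_id}-ChP{start // max_per_child + 1}"
        children[child_id] = list(member_ids[start:start + max_per_child])
    # The parent aggregates only the child aggregations, so it stays small.
    package = {parent_id: list(children)}
    package.update(children)
    return package


if __name__ == "__main__":
    for name, members in nest_members("P", ["D1", "D2", "D3", "D4", "D5"], 2).items():
        print(name, "aggregates", members)
```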
<rdf:Property rdf:about="http://www.openarchives.org/ore/terms/aggregates">
<rdfs:label>Aggregates</rdfs:label>
<rdfs:comment>Aggregations, by definition, aggregate resources. The ore:aggregates relationship expresses that the object resource is a member of the set of Aggregated Resources of the subject (the Aggregation). This relationship between the Aggregation and its Aggregated Resources is thus more specific than a simple part/whole relationship, as expressed by dcterms:hasPart for example.</rdfs:comment>
<rdfs:subPropertyOf rdf:resource="http://purl.org/dc/terms/hasPart" />
<rdfs:domain rdf:resource="http://www.openarchives.org/ore/terms/Aggregation" />
<rdfs:range rdf:resource="http://www.openarchives.org/ore/terms/AggregatedResource" />
<owl:inverseOf rdf:resource="http://www.openarchives.org/ore/terms/isAggregatedBy" />
<rdfs:isDefinedBy rdf:resource="http://www.openarchives.org/ore/terms/"/>
</rdf:Property>
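The owl:inverseOf statement in the property definition above is what would let a resource map store only ore:aggregates and have ore:isAggregatedBy inferred. The sketch below shows that inference on plain Python tuples; it is a stand-in for illustration, not the reasoner used by foresite, and `add_inverse` is a hypothetical helper.

```python
AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"
IS_AGGREGATED_BY = "http://www.openarchives.org/ore/terms/isAggregatedBy"


def add_inverse(triples, predicate=AGGREGATES, inverse=IS_AGGREGATED_BY):
    """Return the triples plus one inverse statement per `predicate` triple."""
    inferred = {(o, inverse, s) for (s, p, o) in triples if p == predicate}
    return set(triples) | inferred


if __name__ == "__main__":
    # Only the unidirectional statements are stored (subjects are made up).
    stored = {("agg", AGGREGATES, "D1"), ("agg", AGGREGATES, "D2")}
    for triple in sorted(add_inverse(stored)):
        print(triple)
```

Storing one direction and inferring the other is the size reduction discussed under 20130508 in Rob's notes.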
D1 Indexing:
https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_common/
https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_generator/ (Skye's work for doing the indexing, work queue indexer)
https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_processor/ (Task executor)
Stage CN Upgrade Notes
----------------------
- Remove from RR: orc, unm (DONE, cj)
- Stop processing: orc, unm (DONE, brl)
- Stop Tomcat: orc, unm (DONE, brl)
- pg_dump: unm, ucsb (DONE, cj)
- /var/metacat backup: unm, ucsb, orc (DONE, cj)
- fetch latest debian packages unm, orc (ubuntu-stable)
- install dataone-cn-metacat: unm, orc (DONE, brl)
- Configure upgrade DB from 2.0.5 to 2.0.7 using Metacat admin screen: unm (DONE), orc (DONE)
- [NOTE: remember to restart Tomcat after configuration is complete.]
- Add RR: orc, unm (DONE, cj)
- Remove RR: ucsb (DONE, cj)
- Perform Metacat upgrade steps as above: ucsb (DONE, brl)
- Add RR: ucsb (DONE, cj)
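The ordering in the checklist above follows a rolling pattern: take one batch of CNs out of the round robin (RR), upgrade it, put it back, then repeat for the next batch. A minimal sketch of that ordering follows. It is simplified (the actual run took some backups across batches), and the function name and step labels are assumptions that only paraphrase the checklist.

```python
def rolling_upgrade_plan(batches, per_batch_steps):
    """Return (action, nodes) pairs for upgrading nodes batch by batch.

    Each batch leaves the round robin (RR), is upgraded, and rejoins before
    the next batch leaves, so at least one batch keeps serving throughout.
    """
    plan = []
    for batch in batches:
        plan.append(("Remove from RR", batch))
        for step in per_batch_steps:
            plan.append((step, batch))
        plan.append(("Add to RR", batch))
    return plan


if __name__ == "__main__":
    steps = [
        "Stop processing",
        "Stop Tomcat",
        "Back up database and /var/metacat",
        "Install dataone-cn-metacat",
        "Configure DB upgrade in Metacat admin",
        "Restart Tomcat",
    ]
    for action, nodes in rolling_upgrade_plan([["orc", "unm"], ["ucsb"]], steps):
        print(f"{action}: {', '.join(nodes)}")
```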
Discussion on Authentication for ONEDrive/Web
---------------------------------------------
Discussion on CI Priorities
---------------------------
Dependency graphs: http://mule1.dataone.org/ArchitectureDocs-current/implementation/components.html
Components List: https://docs.google.com/spreadsheet/ccc?key=0Ai3ryhJR2IgZdEwwTDhnai01UXN1RlRoUWtkOFNyZVE#gid=0
Affected Components per Project: https://docs.google.com/spreadsheet/ccc?key=0AjusEcfJ75HCdGJKd05NemctcjVoU18yNFZyeHVJdWc#gid=0
- TODO: Change hudson to build d1_portal before d1_mercury_common to ensure the dependency is met (Skye)
- Some libraries don't have the DataONE maven repo listed as a location (Sonatype only)
- TODO: Create dependency graphs for each of the cn-buildout components to simplify the graphs (Dave)
- TODO: deprecate 2 repl member nodes (Chris)
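The Hudson build-order TODO above is a topological-ordering problem over the component dependency graph. A minimal sketch using the Python standard library follows; only the d1_portal / d1_mercury_common edge comes from these notes, and `build_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_order(depends_on):
    """Return component names ordered so every dependency is built first.

    depends_on maps each component to the set of components it depends on.
    A dependency cycle raises graphlib.CycleError.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    deps = {
        "d1_mercury_common": {"d1_portal"},  # dependency noted in the TODO
        "d1_portal": set(),
    }
    print(build_order(deps))
```

The same per-component dependency map could feed the simplified graphs proposed for each cn-buildout component.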