CCI 1.4.0 Release

Planning meeting 2014-07-31
 * attendees: Ben, Dave V., Jing, Marco., Peter, Robert W., Skye
 * redmine story: https://redmine.dataone.org/issues/5533

Items to be included in this release:
 * proposed items for 1.4 release from CCIT meeting in SB (notes: http://epad.dataone.org/20140624CCIT-NCEAS):
   * Separate LogAggregation from d1 processing to a separate VM and remove HZ from it without touching the rest of the CN
     *  (week of July 14 and 21 and 28)
     * Consider use SolrCloud for replication
       * need to have a mockup of this and testit with large amount of documents to check throughput rate
       * might be minor code changes
       * already have zookeeper component that was initially used for cn auditing
       * what is the strategy for indexing?
         * update a solr core and swap?
     * log agg needs to write to solrcloud endpoint, not local
     * authentication may be an issue
       * certificate manager is a singleton and may be an issue when running in Jetty
     * have to separate Hazelcast internal usage from WAN functionality
   * Minor OneMercury help text, UI updates (from Heather Heinz testing) (2 days of dev and testing effort) (1.4) - complete
   * think about using MetacatUI search in parallel with OneMercury
     * Authentication not tsted yet, so would provide a public only search capability for immedate future
     * ask Lauren to document how much work to use MetacatUI to view solrIndex geolocation info on the CN
       * not simply a matter of pointing to CN solr index
       * metacat relies on a limited view of metadata that is available from solr index
         * possible construct resolve URLs 
   * Release replication auditing (work complete just needs scheduled into release) (1.4)
   * Move to new maven repository in poms
     * Rob, Mark reviewing Archiva setup on Jenkins
       * does Jenkins, Archiva need to run on separate VM?
       * still in the works
       * Dave, Marco, Rob will work on this next week
         * refer to Redmine tickets: #5996, #5997
   * Move OneMercury to another VM or servlet container  (deferral candidate)
     * Needs some discussion (2-3 days of effort?)
     * authentication is the problem
       * user certificate passed to solr, converted to subjects from http request (via URL filter)
     * Requires VM's to be stood up for each environment - dev, sandbox, stage, production.
     * want to be able to make updates to UI without affecting CN
     * 
Individual issues from redmine cci 1.4 milestone
 * https://redmine.dataone.org/projects/d1/issues?query_id=64 (Milestone 1.4 query in redmine sidebar)
 * for each of these issues - should they be included in the CCI 1.4.0 release?
 * These issues will be reviewed and any comments entered into redmine for each issue
   *  Story https://redmine.dataone.org/issues/2246: refactor libclient_java connection management
   *  Story https://redmine.dataone.org/issues/824: CN replica auditing
   *  Story https://redmine.dataone.org/issues/5510: Corrections and suggestions for ONEMercury UI 
   * Bug https://redmine.dataone.org/issues/3998: Log Aggregation is not handling equivalent identities
   * Task https://redmine.dataone.org/issues/5331: Remove Hazelcast dependency from Log Aggregation processing
   * Story https://redmine.dataone.org/issues/5575: Migrate maven to a new repository location
   * Bug https://redmine.dataone.org/issues/5578: Metacat compilation should target Java 1.6
   * Task https://redmine.dataone.org/issues/5993: Do not attempt to replicate invalid documents 
   * Task https://redmine.dataone.org/issues/5994: Create facility for tracking replication attempts 
   * Story https://redmine.dataone.org/issues/3910: Modify Synchronization to apply more validation Logic
   * Bug https://redmine.dataone.org/issues/6012: d1_libclient_java printing 'checkServerTrusted - RSA' 
   * Bug https://redmine.dataone.org/issues/5743: d1_processing replication fails at startup on Production
   * Bug https://redmine.dataone.org/issues/5742: production oa4mp_client.xml in metacat contains wrong key
   * Story https://redmine.dataone.org/issues/2192: Implement NodeReplicationPolicy in Node Registry Service
   * 
 * Design considerations
   * Log Aggregation
     * for CCI 1.4, log agg will be modified to run on active/passive CN configuration (single active CN, multiple (2?) passive CNs)
       * harvesting of MN event records will be indexed initially on only the active CN
         * i.e. the processing of adding sysmeta fields, IP address to location fields, etc will not be distributed among all CNS, but will be performed on the active CN
       * synchronization of event indexes between CNs
         * content will always be initially inserted, updated on the active CN event index
         * content will be synchronized to the passive CNs after the active CN is updated
       * decouple log agg from the CN processing
 * Implementation 
   * Log Aggregation
     * for CCI 1.4, log agg will continue to use Hazelcast to receive notification of modifications to sysmeta via the SystemMetadataLister.java class
       * this is needed to update event records for a pid that has had mutable sysmeta content changed
         * most important info for log agg is a pid's access policy - changes in access policy can effect a user's access to event index documents
       * harvested MN event log records will no longer be submitted to the HZ processing cluster to distribute processing/indexing
     * decouple log agg from CN processing
       * remove log agg from d1-processing and place in it's own service: d1-event-index
     * synchronization
       * SolrCloud will be used to syncronize event indexes between all CNs
         * SolrCloud requires Solr 4
         * a Zookeeper ensemble will be used to sync the 3 Solr instances
           * each CN will run a standalone Zookeeper
             * hostids that comprise the ensemble are specified in the ./conf/zoo.cfg file:
           * Zookeeper must be started before Solr
       * in log agg source, have to disable manual synchronization
         * LogAggregationRecoveryJob will be removed 
           * in CCI 1.3 and before, this was used to sync a CN that was just starting
             * starting CN synced to CN with most current event index entries
         * LogAggregationSyncJob will not be implemented (was checking into d1 souce trunk)
     * solr 4 install/configuration
       * solr will be started as a Jetty context
         * where does the solr.war file go?
         * what config files are needed for this context to load?
     * use jetty as servlet container for solr
       * since we are using Solr 4 and the object index is using Solr3, we must move log agg to another servlet container
       * apache web server routing
         * use ajp connector to route logsolr requests from apache to jetty/solr 4
           * http://wiki.eclipse.org/Jetty/Howto/Configure_AJP13
         * use mod_proxy instead of ajp13?
       * URL rewritting
         * in cci 1.3 apache rewrites URLs to route requests to solr event index
           * currently implemented via mod_rewrite 
             * config file: /etc/apache2/conf.d/rewrite_cnlog
       * in tomcat7, solr context is automatically loaded via config file:
         * /var/lib/tomcat7/conf/Catalina/localhost/d1-cn-log.xml
           * solr context loaded via
             * /var/lib/tomcat7/conf/Catalina/localhost/solr.xml
     * setup URL filtering with jetty
       * authenticated access to the event index is enabled via URL filtering
         * in tomcat, this is enabled from 
           * /usr/share/solr/WEB-INF/web.xml 
     * SSL authentication
       * does an addition instance of CertificateManager need to run w/Jetty?
     * new CN components that will be needed for log agg, event index
       * dataone-event-index
         * installs log agg processing software
         * starts the Spring application to run log agg MN harvesting/indexing
         * installs, log agg configuration in jetty, zookeeper, solr4
           * configures /var/lib/zookeeper/conf/zoo.cfg
           * configures /var/solr/
     * existing CN components (not currently used in cci 1.3, but have been in source tree) that will be used
       * dataone-solr
         * installs solr 4 to /var/solr
         * this component will be updated to Solr 4.9.0
         * solr4 needed for solrcloud feature used by log agg cci 1.4
         * solr3 will continue to be used for object index
       * dataone-jetty
         * jetty has to run on CNs that are also running tomcat
           * jetty will use default port 8983
           * should this component be updated to the current version of jetty (9.?)?
             * currently installs v 8.1.11
           * this component will be upgraded to jetty 9.2.2
       * dataone-solr-jetty
         * this component will not be used
       * dataone-zookeeper
         * currently installs v 3.4.5
           * this will be upgraded to v 3.4.6
             * The release fixes a critical bug that could prevent a server from joining an established ensemble
     * cn/d1_solr_extensions
       * this component produces a jar file that contains classes needed for auth checks for logsolr queries
       * these checks are activated by URL filters defined in /usr/share/solr/WEB-INF/web.xml
Log aggregation configuration
 * /etc/dataone/node.properties
   * set property 'cn.nodeId.active=<nodeId>' so that <nodeId> is the node identifier of the 'active' CN (in the single master CN configuration), e.g. cn.nodeId.active=urn:node:cnDevUCSB1