Development Block 6.2
=====================

Last Block Etherpad: http://epad.dataone.org/2013-44-Block-6-1

G+: https://plus.google.com/hangouts/_/event/cb76pav4nbripd606928j9pjiao?authuser=0&hl=en

Sprint Planning
---------------------

Chris B.
--------

David D.
--------

Rob N.
------

Skye
----

Jing
----

Roger
-----

Dave V
------

Robert W
--------

Ben
---

Total number of objects that have replicas
Total number of replicas that have been made by the d1 replication service
Total number of replicas requested by replication policies

Update the set of PIDs from KNB to allow a maximum of two replicas in the replication policy
Query CNs for counts of replicas vs replication policies
Replicas on the CN that do not have a replication policy were made out of band (counted separately)
Query Archival MNs for 
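
The counts listed above could be tallied along these lines once per-object data has been pulled from a CN. This is only a minimal sketch: the `ObjectInfo` record, its field names, and the sample values are hypothetical and do not reflect the CN schema. It also assumes that replicas on objects with a replication policy were made by the d1 replication service, and that replicas on objects without a policy were made out of band.

```python
from dataclasses import dataclass


@dataclass
class ObjectInfo:
    """Hypothetical per-object summary exported from a CN."""
    pid: str
    replicas: int          # replicas currently registered for the object
    policy_replicas: int   # replicas requested by its policy (0 = no policy)


def replica_counts(objects):
    """Tally the replica totals discussed in the notes."""
    with_replicas = [o for o in objects if o.replicas > 0]
    return {
        "objects_with_replicas": len(with_replicas),
        # Assumption: replicas on objects that have a policy came from d1 replication.
        "replicas_under_policy": sum(
            o.replicas for o in with_replicas if o.policy_replicas > 0
        ),
        "replicas_requested": sum(o.policy_replicas for o in objects),
        # Assumption: replicas without any policy were created out of band.
        "out_of_band_replicas": sum(
            o.replicas for o in with_replicas if o.policy_replicas == 0
        ),
    }


sample = [
    ObjectInfo("pid.1", replicas=2, policy_replicas=2),
    ObjectInfo("pid.2", replicas=1, policy_replicas=0),  # out of band
    ObjectInfo("pid.3", replicas=0, policy_replicas=2),  # requested, not yet made
]
counts = replica_counts(sample)
print(counts)
```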

Chris
-----

Matt
----

CN Upgrade discussion

Rollout Procedure for Metacat 2.3.0
-----------------------------------

0. Turn off d1-processing

1. Ensure the CN Metacat instances have successfully replicated all content among themselves.

    - Use the Metacat admin UI first
    - Restart tc second
    - Use the CN repair tool
        - runs on the CN; requires a diff'd list of pids as input, plus the URL of the CN it is replicating to

2. Upgrade UCSB and UNM

    - upgrade via apt-get upgrade
    - close off hazelcast ports to orc via ufw
    - re-setup metacat via administrative interface
    - restart tc
    - confirm that both metacat instances have the same sysmeta
        - if they don't, restart tc in a different order
    - before d1-processing is started on unm/ucsb
        - FIX replication.properties metacat.password (password is from dev env)
        - FIX process daemon application context with respect to replication beans
            (configure application context for replication as in 1.1.0 tag)

3. Change RR to point at UCSB (remove orc)

4. Upgrade orc

    - upgrade via apt-get upgrade
    - turn on the ports for orc on ucsb and unm
    - re-setup metacat via administrative interface
    - restart tc
    - confirm that all metacat instances have the same sysmeta
        - if ucsb/unm do not have sysmeta that is located on orc, then use the cn-repair tool to fix hazelcast

5. Turn on d1-processing on UCSB


Once upgraded, we have some reharvest tasks for MNs:

- GOA

20131120
We have to figure out the discrepancy between system metadata *content* across CNs

Strategy for comparing CNs content of system metadata

1) use pg_dump to create a snapshot on each CN
2) use pg_restore with configs to recreate all 3 CN tables in a single new db instance
3) For AccessPolicy diff classification:
Diff criteria - determine differences
1.) systemmetadata table 
    Any differences in following column values:
RESOLUTION for systemmetadata table
2.) smreplicationstatus - want to merge all distinct/unique smreplicationstatus records
3.) smreplicationpolicy
pg_dump command to run in the background:
 nohup su postgres -c "/usr/bin/pg_dump -Fc metacat" > metacatDB.dump 2> metacatDB.err < /dev/null &
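
The rule noted above for smreplicationstatus (merge all distinct records) amounts to a set union over the three snapshots. The following is a minimal sketch under stated assumptions: the tuple layout `(guid, member_node, status)`, the node identifiers, and the status values are placeholders, not the actual table columns.

```python
def merge_distinct(*tables):
    """Union rows from several CN snapshots, keeping each distinct row once."""
    merged = set()
    for rows in tables:
        merged.update(rows)
    return sorted(merged)


# Hypothetical rows: (guid, member_node, status)
orc = [("pid.1", "urn:node:A", "COMPLETED")]
ucsb = [
    ("pid.1", "urn:node:A", "COMPLETED"),
    ("pid.1", "urn:node:B", "QUEUED"),
]
unm = [("pid.2", "urn:node:A", "REQUESTED")]

merged = merge_distinct(orc, ucsb, unm)
for row in merged:
    print(row)
```

Note that a plain union keeps two rows when the same guid and member node carry different statuses on different CNs; such pairs would still need a separate resolution rule.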

Reporting
Need to know:
- total number of affected records by PID  
- proportion of total that have been resolved with no loss of information (no conflicts)
- mostly for informational purposes: which fields were affected, e.g. the number of affected records for each system metadata property that was found to be inconsistent.
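
The three reporting figures above could be derived from the combined snapshots roughly as follows. This is a sketch, not the analysis that was actually run: the nested-dict layout, the function name `compare_snapshots`, and the sample records are hypothetical. It treats a PID as resolvable without loss when the CNs that hold the record agree on every field, so only missing copies need to be filled in.

```python
from collections import Counter


def compare_snapshots(snapshots):
    """Summarize differences in {cn_name: {pid: {field: value}}}."""
    all_pids = set().union(*(rows.keys() for rows in snapshots.values()))
    affected, resolvable = set(), set()
    by_field = Counter()
    for pid in all_pids:
        present = [rows[pid] for rows in snapshots.values() if pid in rows]
        missing_somewhere = len(present) < len(snapshots)
        fields = set().union(*(record.keys() for record in present))
        # A field conflicts when the CNs holding the record disagree on it.
        conflicting = [
            f for f in fields if len({repr(r.get(f)) for r in present}) > 1
        ]
        if missing_somewhere or conflicting:
            affected.add(pid)
            by_field.update(conflicting)
            if not conflicting:
                resolvable.add(pid)
    return {
        "affected": len(affected),
        "resolvable_fraction": len(resolvable) / len(affected) if affected else 1.0,
        "by_field": dict(by_field),
    }


same = {"archived": False, "obsoleted_by": None}
snapshots = {
    "orc": {"p1": same, "p2": same, "p3": {"archived": True, "obsoleted_by": None}},
    "ucsb": {"p1": same, "p3": {"archived": False, "obsoleted_by": None}},
    "unm": {"p1": same, "p3": {"archived": False, "obsoleted_by": "p4"}},
}
report = compare_snapshots(snapshots)
print(report)
```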

Initial Evaluation of CN AccessPolicy Discrepancies
===================================================

CN-UCSB is missing 1386  Access Control Rules present on CN-ORC
CN-ORC  is missing   52  Access Control Rules present on CN-UCSB
                     
CN-UNM  is missing 1386  Access Control Rules present on CN-ORC
CN-ORC  is missing   30  Access Control Rules present on CN-UNM
                     
CN-UCSB is missing   25  Access Control Rules present on CN-UNM
CN-UNM  is missing    3  Access Control Rules present on CN-UCSB
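
Pairwise counts like the ones above fall out of set differences between the rule sets of each CN. A minimal sketch, assuming each access control rule can be reduced to a hashable tuple; the `(pid, principal, permission)` layout and the sample rules are made up for illustration.

```python
from itertools import permutations


def missing_counts(acls):
    """For each ordered pair (a, b), count rules present on b but absent on a."""
    return {(a, b): len(acls[b] - acls[a]) for a, b in permutations(acls, 2)}


# Hypothetical rules: (pid, principal, permission)
r1 = ("pid.1", "public", "read")
r2 = ("pid.2", "public", "read")
r3 = ("pid.2", "CN=someone", "write")
r4 = ("pid.3", "public", "read")

acls = {
    "CN-ORC": {r1, r2, r3},
    "CN-UCSB": {r1, r4},
    "CN-UNM": {r1},
}

result = missing_counts(acls)
for (a, b), n in result.items():
    print(f"{a} is missing {n:4d}  Access Control Rules present on {b}")
```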

There are 16 identifiers that may need manual ACL reconciliation across the CNs:
--------------------------------------------------------------------------------

autogen.2013061412245323230.1
autogen.2013061413310702892.1
autogen.2012122012391321421.1
autogen.2012122209301298522.1
autogen.2013110415061851620.1
peggym.109896.1
peggym.109898.1
peggym.109899.1
peggym.109900.1
peggym.109901.1
peggym.109902.1
peggym.109904.1
peggym.109905.1
peggym.109906.1
peggym.109907.1
peggym.109908.1

The inconsistency percentage is approximately 0.2% of the total ACLs in the system
----------------------------------------------------------------------------------

Total UNM ACLs:  685636
Total UCSB ACLs: 685658
Total ORC ACLs:  686992
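
The notes do not record how the 0.2% figure was computed. One plausible reading, assumed here, is the largest pairwise gap (1386 rules missing relative to CN-ORC) divided by the total ACL count on a CN; the sketch below checks that this gives roughly 0.2% whichever total is used.

```python
# Totals as listed above.
totals = {"UNM": 685636, "UCSB": 685658, "ORC": 686992}

# Assumption: the numerator is the largest pairwise gap reported earlier,
# i.e. the rules missing on CN-UCSB (and on CN-UNM) relative to CN-ORC.
largest_gap = 1386

percentages = {cn: 100 * largest_gap / total for cn, total in totals.items()}
for cn, pct in percentages.items():
    print(f"{cn}: {pct:.3f}% of ACLs")
```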

The affected authoritative Member Nodes for the inconsistent ACLs include
-------------------------------------------------------------------------
CDL
KNB
SANPARKS
GOA
LTER


System metadata difference analysis:
-- pids
        orc has more pids - 823 or so
        ucsb and unm are equal but contain 14 pids not present on orc

-- mod date
        - 67280 changes between orc and ucsb
        - 32068 changes between ucsb and unm
        - 69159 changes between orc and unm
        
-- rights holder
        ucsb and unm are equivalent
        all differences between orc and ucsb/unm are missing records, not value changes (823)

-- replication allowed
        ucsb and unm are equivalent
        65302 diffs between orc and ucsb/unm.

-- archived
        between orc and ucsb
                - 823 rows on orc not on ucsb
                - 19 rows that have changes
        between ucsb and unm (ucsb is a superset except for one record)
                - 110 rows have changes
                - all changes are on ucsb except one that is only on unm: doi:10.5063/AA/nceas.907.2

-- obsoletes
        846 changes between orc and ucsb
        0 changes between ucsb and unm

-- obsoleted_by
        1437 changes between orc and ucsb
        309 changes between ucsb and unm.
        1445 changes between orc and unm.

-- replication status table
        27476 changes between orc and ucsb
        31658 changes between ucsb and unm
        59132 changes between orc and unm




Actions taken on Production CNs 2013/11/23

ON ORC - 160.36.13.150
ufw delete allow to any port 5701 from 129.237.201.155
ufw delete allow to any port 5702 from 129.237.201.155
ufw delete allow to any port 5703 from 129.237.201.155

ufw delete allow to any port 5702 from 128.111.54.80
ufw delete allow to any port 5703 from 128.111.54.80
ufw delete allow to any port 5701 from 128.111.54.80

ufw delete allow to any port 5701 from 64.106.40.6
ufw delete allow to any port 5702 from 64.106.40.6
ufw delete allow to any port 5703 from 64.106.40.6
su postgres -c "psql -d metacat  -c \"UPDATE xml_replication SET replicate = 0, datareplicate = 0 WHERE serverid > 1\""

ON UNM - 64.106.40.6
/etc/init.d/tomcat6 stop
ufw delete allow to any port 5703 from 128.111.54.80
ufw delete allow to any port 5702 from 128.111.54.80
ufw delete allow to any port 5701 from 128.111.54.80

ufw delete allow to any port 5701 from 160.36.13.150
ufw delete allow to any port 5702 from 160.36.13.150
ufw delete allow to any port 5703 from 160.36.13.150

ufw delete allow to any port 5701 from 129.237.201.155
ufw delete allow to any port 5702 from 129.237.201.155
ufw delete allow to any port 5703 from 129.237.201.155
su postgres -c "psql -d metacat  -c \"UPDATE xml_replication SET replicate = 0, datareplicate = 0 WHERE serverid > 1\""

ON UCSB - 128.111.54.80
/etc/init.d/tomcat6 stop
ufw delete allow to any port 5701 from 160.36.13.150
ufw delete allow to any port 5702 from 160.36.13.150
ufw delete allow to any port 5703 from 160.36.13.150

ufw delete allow to any port 5701 from 129.237.201.155
ufw delete allow to any port 5702 from 129.237.201.155
ufw delete allow to any port 5703 from 129.237.201.155

ufw delete allow to any port 5701 from 64.106.40.6
ufw delete allow to any port 5702 from 64.106.40.6
ufw delete allow to any port 5703 from 64.106.40.6
su postgres -c "psql -d metacat  -c \"UPDATE xml_replication SET replicate = 0, datareplicate = 0 WHERE serverid > 1\""
/etc/init.d/tomcat6 start
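
The ufw commands above repeat the same pattern for every peer address and port, so a list like this can be generated rather than typed by hand. The sketch below only builds and prints the command strings (a dry run) and executes nothing. The helper name `ufw_delete_commands` is hypothetical, and treating 5701-5703 as the Hazelcast ports is an assumption based on the rollout procedure earlier in these notes.

```python
# Assumption: these are the Hazelcast ports referred to in the rollout procedure.
HAZELCAST_PORTS = (5701, 5702, 5703)


def ufw_delete_commands(peers, ports=HAZELCAST_PORTS):
    """Build (but do not run) one 'ufw delete allow' command per peer and port."""
    return [
        f"ufw delete allow to any port {port} from {peer}"
        for peer in peers
        for port in ports
    ]


# Peer addresses as they appear in the ORC section above.
orc_peers = ["129.237.201.155", "128.111.54.80", "64.106.40.6"]
commands = ufw_delete_commands(orc_peers)
for command in commands:
    print(command)
```

Printing the commands first makes it possible to review them against the intended firewall state before pasting them into a root shell on the target CN.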