Member Node Wranglers 
    Fridays at 10:00 am AK
                    11:00 am PDT
                    12:00 pm MDT
                    1:00 pm CDT
                    2:00 pm EDT

https://www1.gotomeeting.com/join/430697153
                                                                                                                    
13 December 2013  

Attending:  Laura, Bruce, John, Chris, Rebecca

Regrets:  Dave, Amber


Agenda: 

        1. High profile issues (or current items of interest)
        
        CN synch issue - we hope to see a resolution next week (week of 16 Dec). Robert and Skye are busy writing code and hope to be ready for testing by the middle of next week; it's not a lot of code, but it is very detailed. If we are ready to go then, we'll be monitoring testing over the break; if not, testing will start after the beginning of the year.
        Security breach on one of the virtual machines at UCSB; we are moving to key-based authentication going forward (lesson learned: don't use the same username/password combo everywhere).  This issue should be cleared up by Monday, after which work returns to the CN issue.
    
        A somewhat difficult question: when do we expect a synch issue like this to recur, i.e. how common is this type of event? (Ex post facto statistics say about once every 5 years ...)
        Can we use this episode to update the project risk register? Is the likelihood still low? Can we confirm that the impact was as assessed (i.e. we did not lose data or user service; the impact was some internal effort and some delay for some development), right?  YES, we need to update the risk register.
            
        Is the CN synch issue related to firewall rules or Hazelcast?  The 3 CNs replicate science metadata among themselves; a Metacat config change was not applied on all CNs, and the issue was compounded by a network split in Hazelcast.  Result: science metadata was not replicated across all CNs, so they are out of synch.
        ORC is the only CN in production right now; the other two will need to be synched with ORC (union) and then "put back into production".
        Access policies and replication policies seem to be the most important things impacted (they are inconsistent across CNs)
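        The "synch with ORC (union)" step above can be sketched roughly as follows. This is only an illustration of the set logic, assuming each CN's holdings can be listed as a set of identifiers; the function name, the CN names other than ORC, and the identifiers are hypothetical, not the actual DataONE tooling.

```python
def plan_resync(holdings):
    """Given {cn_name: set_of_identifiers}, return {cn_name: sorted list of
    identifiers that CN is missing relative to the union of all CNs}."""
    # The union is everything any CN has seen (hypothetical in-memory sets).
    union = set().union(*holdings.values())
    return {cn: sorted(union - pids) for cn, pids in holdings.items()}


if __name__ == "__main__":
    # Hypothetical holdings; "cn-x" and "cn-y" stand in for the other two CNs.
    example = {
        "ORC": {"a", "b", "c"},
        "cn-x": {"a"},
        "cn-y": {"a", "c", "d"},
    }
    for cn, missing in plan_resync(example).items():
        print(cn, "needs", missing)
```

        Taking the union rather than copying ORC outright means anything that only reached one of the other CNs is also carried over.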
        
        Risk register:  risk - likelihood - impact (high/low) for each risk factor
            Hazelcast has bitten us twice now; since it is a 3rd party product, we want to look at another solution that works better, without the potential problems
            Hazelcast's strength: availability; the alternatives' strength: consistency
            Add a Mitigation strategy: [May need better rewording by someone directly familiar, such as Chris]
            1) DataONE is writing auditing script(s) to help confirm the merging and integrity of system metadata on the CNs
            2) DataONE is writing a monitoring harness to identify more quickly if/when this recurs.
            3) Revisit the examination of possible Hazelcast equivalents
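        As a rough idea of what the auditing in item 1 could check, here is a minimal sketch that fingerprints each system metadata record per CN and reports identifiers whose records differ or are missing. It assumes records are available as plain dictionaries; the field names and CN names other than ORC are hypothetical, and this is not the script DataONE is writing.

```python
import hashlib
import json


def fingerprint(record):
    """Hash a record in a key-order-independent way."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit(cn_records):
    """cn_records maps CN name -> {identifier: record dict}.
    Returns {identifier: {cn: fingerprint or None}} for every identifier
    whose record is missing on some CN or differs between CNs."""
    all_pids = set()
    for records in cn_records.values():
        all_pids.update(records)

    mismatches = {}
    for pid in sorted(all_pids):
        prints = {
            cn: fingerprint(records[pid]) if pid in records else None
            for cn, records in cn_records.items()
        }
        # More than one distinct value means the CNs disagree on this record.
        if len(set(prints.values())) > 1:
            mismatches[pid] = prints
    return mismatches
```

        Run periodically, the same comparison could also feed the monitoring harness in item 2, since an empty result means the CNs agree.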
        
        Another issue:  when we take a CN down to update it, we need to mark the other (authoritative) CN as read-only temporarily while the update is happening.
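        One way to picture the read-only window described above is a small guard that always restores the authoritative CN to read-write, even if the update fails. The class, attribute names, and the second CN's name are hypothetical; how a real CN is switched to read-only is not specified in these notes.

```python
from contextlib import contextmanager


class CoordinatingNode:
    """Hypothetical stand-in for a CN with two status flags."""

    def __init__(self, name):
        self.name = name
        self.read_only = False
        self.online = True


@contextmanager
def update_window(target, authoritative):
    """Take `target` offline and hold `authoritative` read-only meanwhile."""
    authoritative.read_only = True
    target.online = False
    try:
        yield target
    finally:
        # Restore both flags whether or not the update succeeded.
        target.online = True
        authoritative.read_only = False


if __name__ == "__main__":
    orc = CoordinatingNode("ORC")
    other = CoordinatingNode("cn-x")
    with update_window(other, orc):
        print(orc.name, "read-only:", orc.read_only)
    print(orc.name, "read-only:", orc.read_only)
```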
        
        New MN announcements? -  holding off until after the CN issues are resolved
            What makes an MN officially in production?  For example, GOA has data out there but we show them as still in testing in Redmine.  Internal communication regarding the steps to put an MN in production needs to be improved (next thing after the CN fix).  We need a step where we evaluate the metadata itself.
            
        IDCC workshop 2/27/14? - an email went out to us on 12/12/13 to identify questions to be answered in preparation for the workshop;
            we will use the information from Matt's previous workshop;
            we need to sort out who will be going to the workshop (Bruce, Amber, Dave (unconfirmed), Chris, maybe another developer?).  Send a note to coredev asking if any of them want to come.
            
        Isis (developer at UNM) would like to get together to talk about dashboard next week.  
        
        2. Status of MNs  (leftover from 12/6 until updated)
        3. Old action items
        
        4. Not-high profile issues
         
        5. Around the room