= Moving forward with BRC000 =

2010-05-07

Present: Dave, Greg, Aimee, Dave, CJ, Jim

Background: Two drives have failed in the BRC000 cluster frontend. As a result, the system's root filesystem is unrecoverable and the OS must be reinstalled.

The purpose of this discussion is to confirm which applications need to run on the cluster and, based on that, to choose an appropriate course of action for restoring the system.


== Applications being run on the cluster ==

 * LM - MySQL, Python, CherryPy, Apache, openModeller, lots of spatial libs

 * Sun Grid Engine


== Options for moving forward ==

 1. Do nothing - decommission the BRC000 cluster and use existing resources instead

 2. Repair the BRC000 cluster, restoring essentially the same configuration as before

 3. Adapt existing or soon-to-be-purchased hardware to provide enough capacity to meet the needs of the various projects

 4. Purchase new hardware (approximately $40-60k for equal or better capabilities, e.g. a couple of PowerEdge R900s with 4x 6-core processors, 128 GB RAM, 1 TB disk space)


== Resolution ==

 * Rebuild with the latest version of Rocks.

 * Also add some new nodes
   - cost estimate

 * Buy some new drives
   - 5x 147 GB drives for the FE, Fujitsu
   - drives for compute nodes?
  
 * Install the OS
   - download and burn CDs
   - install the FE
   - attach the drive array
   - install the compute nodes (may require adjusting BIOS settings and/or drive configurations on each node)
   - set up user accounts (use NHM credentials?)

 * Install LM software dependencies
   - new MySQL
   - python >= 2.6
   - spatial libs
   - openModeller
   - etc.
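Once the dependencies above are installed on the rebuilt frontend, a quick sanity check can confirm that the interpreter meets the python >= 2.6 requirement and that the Python-side modules import cleanly. This is a minimal sketch under stated assumptions. The module names `MySQLdb` (MySQL-python bindings), `cherrypy`, and `osgeo` (GDAL/OGR bindings, standing in for "spatial libs") are assumed, not taken from the LM code. The helper names (`meets_python_requirement`, `missing_modules`, `report`) are hypothetical.

```python
import sys

# Minimum interpreter version from the dependency list (python >= 2.6).
MIN_PYTHON = (2, 6)

# ASSUMPTION: hypothetical import names for the LM stack; adjust to the real list.
REQUIRED_MODULES = ["MySQLdb", "cherrypy", "osgeo"]


def meets_python_requirement(minimum=MIN_PYTHON, version=None):
    """Return True if `version` (default: the running interpreter) is >= `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only as many fields as `minimum` specifies, e.g. (major, minor).
    return tuple(version[:len(minimum)]) >= tuple(minimum)


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be imported, in the given order."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


def report():
    """Build a human-readable summary of the checks as a list of lines."""
    status = "OK" if meets_python_requirement() else "too old"
    lines = ["Python %d.%d: %s" % (sys.version_info[0], sys.version_info[1], status)]
    for name in missing_modules():
        lines.append("missing module: %s" % name)
    return lines


if __name__ == "__main__":
    for line in report():
        print(line)
```

Run it with the same interpreter that will serve LM. It checks only the Python version and whether each module imports. It does not verify the MySQL server version, Apache, openModeller, or the versions of the spatial libraries themselves.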