= Moving forward with BRC000 =

2010-05-07

Present: Dave, Greg, Aimee, Dave, CJ, Jim

Background: Two drives have failed on the BRC000 cluster frontend. As a result, the root filesystem is unrecoverable and the OS needs to be re-installed. The purpose of this discussion is to verify which applications need to run on the cluster and, based on that information, select an appropriate course of action for restoring the system.

== Applications being run on the cluster ==

 * LM - mysql, python, cherrypy, apache, openmodeller, lots of spatial libs
 * Sun Grid Engine

== Options for moving forward ==

 1. Do nothing - decommission the BRC000 cluster and utilize existing resources
 2. Patch the BRC000 cluster, bringing it back to basically the same configuration as before
 3. Adapt existing / soon-to-be-purchased hardware to add sufficient capabilities to match the needs of the various projects
 4. Purchase new hardware (approximately $40-60k for equal or better capabilities, e.g. a couple of PowerEdge R900s with 4 x 6-core processors, 128 GB RAM, 1 TB drive space)

== Resolution ==

 * Rebuild with latest version of Rocks.
 * Also add some new nodes - cost estimate
 * Buy some new drives
   - 5 x 147 GB for FE, Fujitsu
   - drives for compute nodes?
 * Install OS
   - download and burn CDs
   - install FE
   - attach drive array
   - install compute nodes. May require setting BIOS / different drive configurations on each node.
   - set up user accounts (use NHM credentials?)
 * Install LM software dependencies
   - new MySQL
   - python >= 2.6
   - spatial libs
   - openmodeller
   - etc
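
Once the frontend is rebuilt, the LM dependency list above could be spot-checked with a small script along these lines. This is only a sketch: the module names in `REQUIRED_MODULES` are assumptions (the notes do not name specific Python packages for MySQL or the spatial libs), and the helper names `python_ok` and `missing_modules` are made up for illustration.

```python
import sys

# Minimum interpreter version taken from the notes ("python >= 2.6").
MIN_PYTHON = (2, 6)

# ASSUMPTION: these module names are illustrative guesses, not from the notes.
# Replace them with whatever LM actually imports.
REQUIRED_MODULES = ["cherrypy", "MySQLdb", "osgeo"]


def python_ok(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


def missing_modules(names):
    """Return the subset of names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if not python_ok():
        print("Python is older than %d.%d" % MIN_PYTHON)
    for name in missing_modules(REQUIRED_MODULES):
        print("Missing Python module: %s" % name)
```

Running the script on the frontend and on one compute node would show whether both see the same set of modules. It only covers Python-importable packages; MySQL, Apache and the openModeller binaries would still need to be checked separately.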