HPC Cluster Outage

Posted on Monday, February 28, 2011 at 4:54 pm

URC Cluster Users:

We have finally scheduled the much-needed cooling upgrade for our server room in the Bioinformatics building.  This upgrade will allow us to turn on the remaining portion of the new NIH-funded cluster, Cobra, return the Hadoop cluster to operation, and finish the implementation of a new GPGPU cluster, Python.

Unfortunately, the installation of this additional cooling will cause some disruption of our operations during the next two weeks while the contractors establish new connections to the building chilled water system.  The current schedule of outages is as follows:

  • Viper will be down beginning at 7am on Monday, March 7th and should return to service by noon on Tuesday, March 8th.
  • Cobra will be down beginning at 7am on Friday, March 4th and should return to service by the end of the day on Wednesday, March 9th.
  • MEES and MEES10 will be unaffected since they are located in the Atkins Building.

Just FYI, the Cobra cluster requires a much longer downtime because it is physically blocking the installation of one of the new InRow coolers.  We must therefore take it down early to make room for the work to proceed, and we must consolidate the equipment from 3 computing racks into 2 before bringing it back up.

All running jobs will be terminated prior to shutdown of each cluster, so please plan your work so that your jobs finish prior to the outage or can be restarted after the equipment is back online.
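
For anyone who wants to check whether a job will fit before a shutdown, here is a minimal Python sketch.  The shutdown times are taken from the schedule above; the function names and the idea of estimating runtime in hours are assumptions made for illustration only, and nothing here talks to the cluster scheduler.

```python
from datetime import datetime, timedelta

# Shutdown times copied from the outage schedule above (local time, 2011).
SHUTDOWNS = {
    "Viper": datetime(2011, 3, 7, 7, 0),
    "Cobra": datetime(2011, 3, 4, 7, 0),
}


def hours_until_shutdown(cluster, now):
    """Hours left before `cluster` goes down, never less than zero."""
    remaining = SHUTDOWNS[cluster] - now
    return max(remaining.total_seconds() / 3600.0, 0.0)


def finishes_before_shutdown(cluster, start, runtime_hours):
    """True if a job started at `start` would end by the shutdown time.

    `runtime_hours` is your own estimate of the job's run time.
    """
    return start + timedelta(hours=runtime_hours) <= SHUTDOWNS[cluster]
```

For example, `finishes_before_shutdown("Cobra", datetime(2011, 3, 3, 7, 0), 30)` asks whether a 30-hour job started at 7am on Thursday, March 3rd would finish before Cobra is taken down.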

Finally, the schedule above is based on estimates from the contractors who will perform the work.  Unforeseen problems may alter this schedule.  In that event, I will send an updated schedule as early as possible.

We apologize for any inconvenience that this may cause.

Charles Price