COPPERHEAD User Notes

The Copperhead Cluster can be accessed via SSH to “hpc.uncc.edu”. This will connect the user to one of the Interactive/Submit nodes in the Copperhead Cluster pool. From those nodes, a user can submit jobs requesting the following resources:

General Compute Nodes

  • 63 nodes x 16 cores = 1008 CPU cores
  • 2 nodes x 32 cores = 64 CPU cores
  • 23 nodes x 36 cores = 828 CPU cores

GPU Compute Nodes

  • 4 nodes x (2 NVIDIA K80 GPUs + 16 cores) = 8 GPUs and 64 CPU cores
  • 2 nodes x (8 NVIDIA GTX-1080ti GPUs + 8 cores) = 16 GPUs and 16 CPU cores

Large Memory Compute Nodes

  • 1 node x 16 CPU cores and 768GB RAM
  • 1 node x 64 CPU cores and 4TB RAM

Copperhead has a total of 2060 CPU cores and 24 GPUs.
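
To connect and submit work, a typical session looks like the following sketch (the username “jdoe” and script name “myjob.sh” are placeholders):

ssh jdoe@hpc.uncc.edu       # connect to one of the Interactive/Submit nodes
qsub myjob.sh               # submit a job script to the scheduler
qstat -u jdoe               # check the status of your queued and running jobs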

Things to keep in mind:

  • Jobs should always be submitted to the “copperhead” queue unless directed otherwise by URC Support (see the example submit script after this list)
  • Users can have a max of 256 CPU cores active at any given time
  • If a user submits several jobs that total more than 256 CPU cores across all jobs, only a maximum of 256 cores will become active while the remaining jobs stay queued. Once the active jobs exit and free up enough cores, the scheduler will release the queued jobs until the 256-core per-user limit is reached again.
  • If a single job requests >256 CPU cores, it will never run
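
For example, a minimal submit script that targets the “copperhead” queue and stays well under the per-user core limit might look like this sketch (the job name and executable are placeholders):

#!/bin/bash
#PBS -q copperhead              # submit to the copperhead queue
#PBS -N example_job             # job name (placeholder)
#PBS -l nodes=1:ppn=16          # 1 node, 16 cores (well under the 256-core limit)
#PBS -l walltime=4:00:00        # 4 hours

cd $PBS_O_WORKDIR               # run from the directory the job was submitted from
./my_program                    # placeholder for your executable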

Copperhead Defaults

To make more efficient use of the resources, user jobs are now submitted with a set of default resource requests which can be overridden on the qsub command line or in the job submit script via qsub directives. If not specified by the user, the following defaults are set:

#PBS -l walltime=8:00:00        # (Max Job Run time is 8 hours)
#PBS -l pmem=2GB                # (Allow up to 2GB of Memory per CPU core requested)
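
For example, to override both defaults for a job that needs 24 hours of runtime and 8GB of memory per core (the values are illustrative):

#PBS -l walltime=24:00:00       # override the 8 hour default
#PBS -l pmem=8GB                # override the 2GB per core default

or, equivalently, on the qsub command line (“myjob.sh” is a placeholder script name):

qsub -l walltime=24:00:00,pmem=8GB myjob.sh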

See the discussions below for more details.

Nodes= vs Procs=

In the older URC clusters (e.g., viper, cobra, python), if a job needed a particular number of processors (procs) and did not care how they were distributed among the compute nodes, the following syntax was allowed:

#PBS -l nodes=16

This would reserve the first 16 available procs regardless of where they were located. On Copperhead, this syntax would actually cause the job to attempt to reserve 16 nodes with 1 proc each, which may not be desirable. To make more efficient use of resources and to clarify the request, this syntax is no longer valid. Instead, use one of the following:

If you really want Y nodes with X procs per node:

#PBS -l nodes=Y:ppn=X

If you just want X procs and don’t care which or how many nodes they are on, use:

#PBS -l procs=X

This allows the scheduler to make better use of available resources and may help your job get scheduled more quickly.
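
For example, a complete submit script that asks for 32 procs anywhere in the cluster might look like the following sketch (the MPI program name is a placeholder, and it assumes mpirun is available in your environment):

#!/bin/bash
#PBS -q copperhead
#PBS -l procs=32                 # 32 cores, wherever the scheduler finds them
#PBS -l walltime=6:00:00

cd $PBS_O_WORKDIR
mpirun -np 32 ./my_mpi_program   # placeholder; -np should match the procs= request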

 

Walltime

This determines the actual amount of time a job will be allowed to run. If not specified by the user, the default value is now 8 hours. A job requesting 8 hours or less is automatically labeled a “short” job and therefore has more potential nodes on which it can run. Jobs requiring longer than 8 hours are considered “long” jobs and are restricted in the nodes they can run on. In most cases, the longer the requested walltime, the lower the priority a job will have in competition with other, shorter jobs. Example:

#PBS -l walltime=8:00:00      # 8 hours, 0 minutes, 0 seconds
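
For comparison, a job that needs three days would be labeled a “long” job:

#PBS -l walltime=72:00:00     # 72 hours; more than 8 hours, so treated as a "long" job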

 

Memory (mem)

mem is the amount of memory to be allocated to the job as a whole. If not set by the user, it defaults to roughly 2GB per core requested. Because “mem” applies to the entire job, it is a separate resource request which can be specified either as part of the nodes specification (separated by a comma). Example:

#PBS -l nodes=2:ppn=2,mem=8GB

or as a separate directive:

#PBS -l nodes=2:ppn=2
#PBS -l mem=8GB

These two examples are equivalent and request a total of 8GB of memory spread across 4 cores on 2 nodes.

Jobs whose memory requirements will not allow them to fit on a regular compute node will automatically be scheduled on a large memory compute node. Regular compute nodes have 128GB of memory and 16 cores.

Jobs that exceed the requested memory will be terminated by the scheduler.
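
For example, per-core memory can be requested with pmem instead of a job-wide mem, and a request too large for a regular node will land on a large memory node (the values below are illustrative alternatives, not meant to be combined):

#PBS -l nodes=1:ppn=4,pmem=4GB     # 4 cores with 4GB each (16GB total); fits a regular node

or:

#PBS -l nodes=1:ppn=16,mem=500GB   # 500GB total; exceeds a regular node's 128GB, so the job
                                   # will be scheduled on a large memory compute node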

 

GPUs

GPUs are requested like procs. Currently there are 4 Copperhead nodes that each contain 2 addressable NVIDIA K80 GPUs, and 2 Copperhead nodes that each contain 8 addressable NVIDIA GeForce GTX1080ti GPUs. Note that gpus= is part of the node’s properties, so it is separated with a colon (like :ppn=), as opposed to a separate resource request (like ,mem=), which is separated with a comma. Example:

#PBS -l nodes=1:ppn=1:gpus=1       #  (1 node with 1 cpu and 1 gpu)

K80 vs GTX1080ti GPUs

If you would like to specify a particular type of GPU, you have two to choose from: the NVIDIA K80 or the GTX1080ti. If your job requires a GTX1080ti, you can ask the scheduler for one like so:

#PBS -l nodes=1:ppn=1:gpus=1:gtx1080ti      #  (1 node with 1 cpu and 1 gtx1080ti gpu)

Likewise, if you would rather process on an NVIDIA K80, please specify like so:

#PBS -l nodes=1:ppn=1:gpus=1:k80      #  (1 node with 1 cpu and 1 k80 gpu)
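
Putting it together, a minimal GPU submit script might look like the following sketch (the module name “cuda” and the executable are placeholders; the module load line assumes an environment modules setup and should be adjusted or removed to match your environment):

#!/bin/bash
#PBS -q copperhead
#PBS -l nodes=1:ppn=1:gpus=1:gtx1080ti     # 1 node, 1 cpu, 1 gtx1080ti gpu
#PBS -l walltime=8:00:00
#PBS -l pmem=8GB

cd $PBS_O_WORKDIR
module load cuda                           # placeholder module name; adjust to your environment
./my_gpu_program                           # placeholder for your GPU executable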