Things to keep in mind:
- Jobs should always be submitted to the “copperhead” queue unless directed otherwise by URC Support
- Users can have a max of 256 CPU cores active at any given time
- If a user submits several jobs that together request more than 256 CPU cores, at most 256 cores will become active and the remaining jobs stay queued. Once active jobs exit and free up enough cores, the scheduler releases queued jobs until the 256-core user limit is reached again.
- If a single job requests >256 CPU cores, it will never run.
- Users may run interactively on hpc.uncc.edu to perform tasks such as transferring data* using SCP or SFTP, code development, and executing short test runs up to about 10 CPU minutes. Tests that exceed 10 CPU minutes should be run as scheduled jobs.
- When using MobaXterm to connect, do not use the "Start local terminal" option. Instead, create and save a new session for HPC and connect via the left menu. The "Start local terminal" option prevents the Duo prompt from displaying and results in continuous prompting for the password.
* For transferring larger amounts of data, please take a look at URC's Data Transfer Node offering.
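The 256-core limit above is simple arithmetic on the request, so it can be checked before submitting. The following is only a sketch: check_cores is a hypothetical helper, not a URC-provided tool, and it assumes a request of the form nodes=N:ppn=P.

```shell
#!/bin/sh
# Per-user limit on active CPU cores, taken from the notes above.
MAX_USER_CORES=256

# check_cores NODES PPN (hypothetical helper): print the total cores a
# single job would request and return non-zero if it exceeds the limit.
check_cores() {
    cc_total=$(( $1 * $2 ))
    if [ "$cc_total" -gt "$MAX_USER_CORES" ]; then
        echo "$cc_total cores: exceeds ${MAX_USER_CORES}, job would never run"
        return 1
    fi
    echo "$cc_total cores: within the ${MAX_USER_CORES}-core limit"
}

check_cores 16 16                                   # 16 nodes x 16 cores = 256
check_cores 17 16 || echo "reduce the request before submitting"
```

The check only covers a single job. Several smaller jobs can still add up to more than 256 cores, in which case the extra jobs simply wait in the queue.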
To make more efficient use of the resources, user jobs are now submitted with a set of default resource requests which can be overridden on the qsub command line or in the job submit script via qsub directives. If not specified by the user, the following defaults are set:
#PBS -l walltime=8:00:00 # (Max Job Run time is 8 hours)
#PBS -l pmem=2GB # (Allow up to 2GB of Memory per CPU core requested)
See the discussions below for more details.
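As an illustration of overriding the defaults, the sketch below writes a small submit script that sets its own walltime and pmem, then prints the equivalent command-line form. The file name example_job.sh, the job name, and the requested values are placeholders chosen for this example. The qsub command is only echoed, so the snippet can be tried without submitting anything.

```shell
#!/bin/sh
# Write a hypothetical submit script; names and values are placeholders.
cat > example_job.sh <<'EOF'
#!/bin/sh
#PBS -N example_job
#PBS -q copperhead
#PBS -l procs=4
#PBS -l walltime=2:00:00
#PBS -l pmem=4GB
cd "${PBS_O_WORKDIR:-.}"
echo "Running on $(hostname)"
EOF

# Show the directives that replace the 8 hour / 2GB-per-core defaults.
grep '^#PBS' example_job.sh

# The same overrides given on the command line (printed, not executed):
echo "qsub -l walltime=2:00:00,pmem=4GB example_job.sh"
```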
Nodes= vs Procs=
On the older URC clusters (e.g. viper, cobra, python), if a job needed a particular number of processors (procs) and did not care how they were distributed among the compute nodes, the following syntax was allowed:
#PBS -l nodes=16
This would reserve the first 16 procs available regardless of where they were located. On Copperhead, this would instead cause the job to attempt to reserve 16 nodes with 1 proc each, which may not be desirable. To make more efficient use of resources and to keep requests unambiguous, this syntax is no longer valid. Instead, use the following:
If you really want X procs on Y nodes:
#PBS -l nodes=Y:ppn=X
If you just want X procs and don’t care which or how many nodes they are on, use:
#PBS -l procs=X
This will allow the scheduler to better utilize available resources and may help your job get scheduled quicker.
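To see what a request actually produced, a job can inspect the node file that Torque provides through the PBS_NODEFILE variable, which lists one hostname per allocated proc. The script below is a sketch: count_allocation is a hypothetical helper, and the nodes=2:ppn=8 request is just an example value.

```shell
#!/bin/sh
#PBS -q copperhead
#PBS -l nodes=2:ppn=8
# The request above asks for exactly 8 procs on each of 2 nodes.
# To let the scheduler place 16 procs anywhere, use "-l procs=16" instead.

# count_allocation FILE (hypothetical helper): summarize a Torque-style
# node file, which contains one hostname line per allocated proc.
count_allocation() {
    ca_procs=$(wc -l < "$1" | tr -d ' ')
    ca_nodes=$(sort -u "$1" | wc -l | tr -d ' ')
    echo "$ca_procs procs on $ca_nodes nodes"
}

if [ -n "${PBS_NODEFILE:-}" ] && [ -f "$PBS_NODEFILE" ]; then
    count_allocation "$PBS_NODEFILE"
else
    echo "PBS_NODEFILE not set; not running inside a job"
fi
```

With nodes=Y:ppn=X the summary should always show X*Y procs on Y nodes, while with procs=X the number of nodes can differ from run to run.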
walltime – Determines the actual amount of time a job will be allowed to run. If not specified by the user, the default value is now 8 hours. A walltime of 8 hours or less automatically labels a job as a “short” job, which gives it more potential nodes on which it can run. Jobs requiring longer than 8 hours are considered “long” jobs and are restricted in which nodes they can run on. In most cases, the longer the requested walltime, the lower the priority a job will have in competition with other, shorter jobs. Example:
#PBS -l walltime=8:00:00 # 8 hours, 0 minutes, 0 seconds
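The short/long distinction comes down to comparing the requested walltime against 8 hours (28800 seconds). The sketch below shows that comparison. Both function names are hypothetical, and it assumes the walltime is written as HH:MM:SS.

```shell
#!/bin/sh
# walltime_seconds HH:MM:SS: convert a walltime string to seconds.
# awk is used so that a leading zero such as "08" is read as decimal 8.
walltime_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# job_class HH:MM:SS: print "short" for 8 hours or less, otherwise "long".
job_class() {
    if [ "$(walltime_seconds "$1")" -le 28800 ]; then
        echo short
    else
        echo long
    fi
}

job_class 8:00:00     # prints: short
job_class 08:00:01    # prints: long
job_class 48:00:00    # prints: long
```

This only mirrors the classification rule described above. The scheduler makes the actual decision.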
mem – The amount of memory to be allocated to a job. If not set by the user, it defaults to roughly 2GB per core requested. “mem” applies to the entire job and is therefore a separate resource request, which can be specified either as part of the nodes specification (separated by a comma). Example:
#PBS -l nodes=2:ppn=2,mem=8GB
or as a separate directive:
#PBS -l nodes=2:ppn=2
#PBS -l mem=8GB
These two examples are equivalent and request a total of 8GB of memory spread across 4 cores on 2 nodes.
Jobs whose memory requirements do not fit on a regular compute node will automatically be scheduled on a large memory compute node. Regular compute nodes have 128GB of memory and 16 cores.
Jobs that exceed the requested memory will be terminated by the scheduler.
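The memory figures above can be worked through with a little arithmetic. The sketch below is illustrative only: both helpers are hypothetical, the 2GB-per-core default is approximate, and node_kind is a rough per-node estimate rather than the scheduler's actual placement logic.

```shell
#!/bin/sh
REGULAR_NODE_MEM_GB=128   # from the text: regular nodes have 128GB and 16 cores

# default_mem_gb NODES PPN: approximate default total memory in GB,
# assuming the default of 2GB per requested core.
default_mem_gb() {
    echo $(( $1 * $2 * 2 ))
}

# node_kind PMEM_GB PPN: rough guess at the node type needed when each of
# PPN cores on one node is given PMEM_GB of memory.
node_kind() {
    if [ $(( $1 * $2 )) -le "$REGULAR_NODE_MEM_GB" ]; then
        echo regular
    else
        echo large-memory
    fi
}

default_mem_gb 2 2    # prints: 8   (nodes=2:ppn=2 at 2GB per core)
node_kind 2 16        # prints: regular        (32GB on one node)
node_kind 16 16       # prints: large-memory   (256GB on one node)
```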
GPUs are requested like procs. Currently there are 4 Copperhead nodes that each contain 2 addressable NVIDIA K80 GPUs, and 2 Copperhead nodes that each contain 8 addressable NVIDIA GeForce GTX1080ti GPUs. Note that gpus= is part of the node’s properties, so it is separated with a colon (like :ppn=). A separate resource request (like ,mem=) is separated with a comma instead. Example:
#PBS -l nodes=1:ppn=1:gpus=1 # (1 node with 1 cpu and 1 gpu)
K80 vs GTX1080ti GPUs
If you would like to specify a particular type of GPU, you have 2 to choose from: the NVIDIA K80 or GTX1080ti. If your job requires a GTX1080ti, you can ask the scheduler for one like so:
#PBS -l nodes=1:ppn=1:gpus=1:gtx1080ti # (1 node with 1 cpu and 1 gtx1080ti gpu)
Likewise, if you would rather process on an NVIDIA K80, please specify like so:
#PBS -l nodes=1:ppn=1:gpus=1:k80 # (1 node with 1 cpu and 1 k80 gpu)
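To make the colon versus comma rule concrete, the sketch below assembles GPU resource strings from their parts. gpu_request is a hypothetical helper written for this example. It only builds and prints text, so it can be run anywhere without submitting a job.

```shell
#!/bin/sh
# gpu_request NODES PPN GPUS [TYPE] (hypothetical helper): build the value
# for "#PBS -l". TYPE is optional and must be k80 or gtx1080ti, the two
# GPU types named above. Node properties are joined with colons.
gpu_request() {
    gr_spec="nodes=$1:ppn=$2:gpus=$3"
    case "${4:-}" in
        "")             ;;
        k80|gtx1080ti)  gr_spec="$gr_spec:$4" ;;
        *)              echo "unknown GPU type: $4" >&2; return 1 ;;
    esac
    echo "$gr_spec"
}

# Node properties use colons; a separate resource such as mem uses a comma.
echo "#PBS -l $(gpu_request 1 1 1 gtx1080ti)"
echo "#PBS -l $(gpu_request 1 2 2 k80),mem=8GB"
```

The two lines printed are "#PBS -l nodes=1:ppn=1:gpus=1:gtx1080ti" and "#PBS -l nodes=1:ppn=2:gpus=2:k80,mem=8GB".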