Clusters FAQs

Remote access is available to some of the Research Computing clusters. The different methods for remote access are outlined below:

COPPERHEAD
You can use ssh to log in to the interactive/submit host, hpc.uncc.edu.

SIDEWINDER
Once logged into Copperhead, you can “ssh sidewinder” to access the Sidewinder cluster.
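For example, from a local terminal you would connect with a command like the following (the username is a placeholder; use your own):

ssh username@hpc.uncc.edu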

Yes, we always welcome feedback. Please send your suggestions, comments, etc. via email to any member of the URC Staff.

Yes, you can. The URC cluster currently supports parallel processing using the MPI standard. These jobs are also submitted via the Condor job scheduler.

Parallel processing is supported via two separate Condor universes. The older MPI universe supports the mpich implementation of the Message Passing Interface (MPI) standard while the newer Parallel universe provides a flexible mechanism that can be adapted to almost any parallel execution methodology.

Parallel Execution in the MPI Universe

If you wish to use the MPI universe to run your own mpi code, you need to include the mpich libraries (version 1.2.4 only) when compiling/linking your program. One way to do this is to use the mpich compiler commands mpicc, mpiCC, mpif77, or mpif90 provided in /apps/mpich/bin. These commands are simply scripts that run the gnu C or C++ compiler or the Portland Group Fortran compiler with the recommended libraries and command line options. Alternatively, you can simply reference the relevant include files and libraries (found in /apps/mpich/include and /apps/mpich/lib respectively) in your project makefile or on the command line.
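For example, assuming a C source file named mpi_hello.c (the file name here is just illustrative), compiling and linking with the provided wrapper might look like:

/apps/mpich/bin/mpicc -o mpi_hello mpi_hello.c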

Many open source or commercial applications also include support for parallel processing. In order for the applications to work properly in the MPI universe, they must support mpich version 1.2.4. Applications that do not support mpich 1.2.4 should be run via the parallel universe.

Once you have a properly compiled mpich executable, the process to submit a job to Condor is as described above:

condor_submit -n mpi <submit_file>

where the submit file would be as shown above with the Universe set to MPI and an additional line to specify the desired number of processors, e.g.:

Machine_count = 4
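For reference, a complete MPI-universe submit file built along these lines might look like the following (the executable and output/log file names are illustrative):

universe      = MPI
executable    = mpi_hello
machine_count = 4
output        = mpi_hello.out
error         = mpi_hello.err
log           = mpi_hello.log
queue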

Choosing an optimal value for the number of processors often requires some experimentation since each application is likely to scale differently depending on a variety of factors like the ratio of computation to I/O. Very few applications scale well beyond 16 or 32 processors.

The additional qualifier on the condor_submit command (-n mpi) is necessary to direct the job to the correct condor scheduling host. Not all universes are managed by the same condor server.

Parallel Execution in the Parallel Universe

Parallel execution in the “parallel” universe is extremely flexible in terms of the types of parallel operation that are possible. The user specifies a wrapper script in addition to an executable. The wrapper is executed on each node that is assigned to the user’s job and is passed the executable as an argument. This allows the user to construct a custom parallel environment within their assigned nodes, and then start the real executable in a fashion that is appropriate to their custom environment.
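As an illustration of the mechanism only (this is not one of the installed wrapper scripts), a bare-bones wrapper could simply launch the executable it is handed on each assigned node:

#!/bin/sh
### Sketch of a parallel-universe wrapper: Condor runs this script on every
### node assigned to the job and passes the real executable (plus its
### arguments) on the command line.
EXECUTABLE=$1
shift
exec "$EXECUTABLE" "$@"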

For users of the old mpi queue, the changes required to switch to the parallel queue are minimal. In the condor_submit file, replace these lines:

universe = mpi
executable = my.exe
arguments = -arg1 -arg2

with these lines:

universe = parallel
executable = /apps/condor/scripts/mpi-1.2.4
arguments = my.exe -arg1 -arg2

The universe changes from mpi to parallel, the user’s executable becomes the first argument, and the executable line now points to a standard wrapper script that is installed as part of the condor software package.

The resulting submit file is then submitted using:

condor_submit -n parallel <submit file name>
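For reference, a complete parallel-universe submit file built from the lines above might look like this (the machine_count value and output/log file names are illustrative):

universe      = parallel
executable    = /apps/condor/scripts/mpi-1.2.4
arguments     = my.exe -arg1 -arg2
machine_count = 4
output        = my.out
error         = my.err
log           = my.log
queue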

In this example the user’s job will run using version 1.2.4 of mpich, just as it does in the mpi queue. For users who wish to use a later version of mpich, a separate wrapper script, mpi-1.2.5, is also available under /apps/condor/scripts. Additional wrapper scripts will be added whenever new versions of mpich are installed.

Additional wrapper scripts have also been developed to support ABAQUS and Matlab DCE, and more will be developed as needed to support other applications (e.g. LAM MPI).

If you would like to utilize the NVIDIA GPUs on the cluster for your compute job, below are some tips to help your job do so.

  • Make sure you ask the scheduler for a GPU in your job request (submit script). Append the GPU request to the #PBS directive in which you ask for CPUs, for example:
#PBS -l nodes=1:ppn=1:gpus=1,mem=16GB

 

  • Unless your code has built-in GPU support (for example, Matlab), you may want to load one of the available CUDA Toolkit modules; currently we offer three: cuda/7.5, cuda/8.0, and cuda/9.0. Load one by adding a “module load …” line to your submit script, and issue a “module list” command to display which modules are currently loaded. The CUDA binaries (like nvcc) and libraries will then be available to your compute job:
module load cuda/8.0

module list
Currently Loaded Modulefiles:
  1) pymods/2.7.5    2) perlmods/5.16.3   3) cuda/8.0

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

 

  • If your code depends on The NVIDIA CUDA Deep Neural Network (cuDNN) GPU-accelerated library, you must load an available cuDNN module to set up your $LD_LIBRARY_PATH. There are several cudnn modules to choose from, depending on what cudnn version *and* what CUDA Toolkit version you require. Please use the command “module avail cudnn” to see what’s available.
module load cudnn/6.0-cuda8

module list
Currently Loaded Modulefiles:
  1) pymods/2.7.5    2) perlmods/5.16.3   3) cuda/8.0   4) cudnn/6.0-cuda8

 

  • If you would like to target a specific model of GPU, you can add a "feature" tag to your request. For example, the first directive below requests one node with one traditional computing core and one GTX-1080Ti GPU; there is also a "k80" tag for requesting one of the existing Tesla K80 GPUs, as shown in the second directive. (A complete example submit script combining the pieces in this list follows the last item.)
### If you prefer an NVIDIA GTX-1080Ti, specify the "gtx1080ti" feature tag:
#PBS -l nodes=1:ppn=1:gpus=1:gtx1080ti

### If you prefer an NVIDIA Tesla K80, specify the "k80" feature tag:
#PBS -l nodes=1:ppn=1:gpus=1:k80
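Putting the pieces in this list together, a complete GPU submit script might look like the following (the job name, walltime, memory, and program path are placeholders to adapt to your own job):

#!/bin/bash
### Example GPU submit script (illustrative values)
#PBS -N gpu_example
#PBS -q copperhead
#PBS -l nodes=1:ppn=1:gpus=1,mem=16GB
#PBS -l walltime=8:00:00

module load cuda/8.0

cd $PBS_O_WORKDIR
./my_cuda_program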

The Copperhead Cluster can be accessed via SSH to “hpc.uncc.edu.” This will connect the user to one of the Interactive/Submit nodes located in the Copperhead Cluster pool. From those nodes, a user can submit jobs requesting the following resources:

General Compute Nodes (63 x 16 cores ea + 2 x 32 cores ea = 1072 CPU cores)

GPU Compute Nodes (4 x 2 NVIDIA K80 GPUs and 16 cores ea = 8 GPUs and 64 CPU cores)

Large Memory Compute Nodes (16 CPU cores and 768GB RAM)

Jobs should always be submitted to the “copperhead” queue unless directed otherwise by URC Support.

Copperhead Defaults

To make more efficient use of the resources, user jobs are now submitted with a set of default resource requests which can be overridden on the qsub command line or in the job submit script via qsub directives. If not specified by the user, the following defaults are set:

#PBS -l walltime=8:00:00        (Max Job Run time is 8 hours)

#PBS -l mem=2GB                      (Allow up to 2GB of Memory per job)
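For example, to override both defaults for a single job, you can pass the resource requests on the qsub command line (the submit script name is a placeholder):

qsub -l walltime=24:00:00,mem=16GB myjob.sub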

See the discussions below for more details.

 

Nodes= vs Procs=

In the older URC clusters (e.g., viper, cobra, python), if a job needed a particular number of processors (procs) and did not care how they were distributed among the compute nodes, the following syntax was allowed:

#PBS -l nodes=16

This would reserve the first 16 procs available regardless of where they were located. On Copperhead, this would actually cause the job to attempt to reserve 16 nodes with 1 proc each, which may not be desirable. To make more efficient use of resources and to make the request unambiguous, this syntax is no longer valid. Instead, use the following:

If you really want X procs on each of Y nodes (a total of Y*X procs):

#PBS -l nodes=Y:ppn=X

If you just want X procs and don’t care which or how many nodes they are on, use:

#PBS -l procs=X

This will allow the scheduler to better utilize available resources and may help your job get scheduled quicker.

 

Walltime

Walltime determines the actual amount of time a job will be allowed to run. If not specified by the user, the default value is now 8 hours. A job requesting this value or less is automatically labeled a “short” job and therefore has more potential nodes on which it can run. Jobs requiring longer than 8 hours are considered “long” jobs and are restricted in which nodes they can run on. In most cases, the longer the requested walltime, the lower the priority a job will have in competition with other, shorter jobs.

Example:  #PBS -l walltime=8:00:00      # 8 hours, 0 minutes, 0 seconds

 

Memory (mem)

mem is the amount of memory to be allocated to a job. If not set by the user, it defaults to roughly 2GB per core requested. “mem” applies to the entire job and is therefore a separate resource request, which can be specified either as part of the nodes specification (separated by a comma)

Example:     #PBS -l nodes=2:ppn=2,mem=8GB

or as a separate directive

Example:     #PBS -l nodes=2:ppn=2

                      #PBS -l mem=8GB

These two examples are equivalent and request a total of 8GB of memory spread across 4 cores on 2 nodes.

Jobs whose memory requirements will not allow them to fit on a regular compute node will automatically be scheduled on a large memory compute node. Regular compute nodes have 128GB of memory and 16 cores.

Jobs that exceed the requested memory will be terminated by the scheduler.

 

GPUs

GPUs are requested like procs. Currently there are 4 Copperhead nodes, each of which contains 2 addressable GPUs. Note that gpus= is part of the node’s properties, so you use a colon to separate it (like :ppn=), as opposed to a separate resource request (like ,mem=), which is separated by a comma.

Example:     #PBS -l nodes=1:ppn=1:gpus=1                 #  (1 node with 1 cpu and 1 gpu)

 

Note that you cannot use the following on Copperhead:

#PBS -l gpus=N

While Torque will accept this by defaulting to nodes=1, it becomes confusing if you attempt to ask for gpus=3 on Copperhead, since there is a maximum of 2 gpus per node. If you need more than 2 gpus for a job, the following are examples of valid requests:

#PBS -l nodes=3:ppn=1:gpus=1               # 3 Nodes, 1 gpu each (3 GPUs)

#PBS -l nodes=2:ppn=1:gpus=2               # 2 Nodes, 2 gpus each (4 GPUs)