Applications

If you would like to use the NVIDIA GPUs on the cluster for your compute job, the following tips will help you get started.

  • Make sure you ask the scheduler for a GPU in your job request (submit script). Append the GPU request to the #PBS directive in which you request CPUs, for example:
#PBS -l nodes=1:ppn=1:gpus=1,mem=16GB

 

  • Unless your code has built-in GPU support (for example, MATLAB), you will likely need to load one of the available CUDA Toolkit modules; we currently offer three: cuda/7.5, cuda/8.0, and cuda/9.0. Load one by adding a “module load” line to your submit script. You can also issue a “module list” command to display which modules are currently loaded. The CUDA binaries (such as nvcc) and libraries will then be available to your compute job:
module load cuda/8.0

module list
Currently Loaded Modulefiles:
  1) pymods/2.7.5    2) perlmods/5.16.3   3) cuda/8.0

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

 

  • If your code depends on the NVIDIA CUDA Deep Neural Network (cuDNN) GPU-accelerated library, you must load an available cuDNN module to set up your $LD_LIBRARY_PATH. There are several cudnn modules to choose from, depending on which cuDNN version *and* which CUDA Toolkit version you require. Please use the command “module avail cudnn” to see what is available.
module load cudnn/6.0-cuda8

module list
Currently Loaded Modulefiles:
  1) pymods/2.7.5    2) perlmods/5.16.3   3) cuda/8.0   4) cudnn/6.0-cuda8

 

  • If you would like to target a specific model of GPU, you can add a "feature" tag to your request. For example, the "gtx1080ti" tag requests a GTX-1080Ti GPU, and the "k80" tag requests one of the existing Tesla K80 GPUs. Each of the following directives requests one node with one traditional computing core and one GPU of the named model (a complete submit script combining these tips is shown after this list):
### If you prefer an NVIDIA GeForce GTX-1080Ti, specify the "gtx1080ti" feature tag:
#PBS -l nodes=1:ppn=1:gpus=1:gtx1080ti

### If you prefer an NVIDIA Tesla K80, specify the "k80" feature tag:
#PBS -l nodes=1:ppn=1:gpus=1:k80
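
Putting these tips together, here is a minimal sketch of a complete GPU submit script. The program name (my_gpu_program) is a placeholder for your own executable, and the module versions shown are just one valid choice from those listed above.

#! /bin/bash

### Request one node with one CPU core, one GPU, and 16GB of memory
#PBS -l nodes=1:ppn=1:gpus=1,mem=16GB
### (Add a feature tag, e.g. gpus=1:gtx1080ti, if you want a specific GPU model)

### Load the CUDA Toolkit (and cuDNN, if your code requires it)
module load cuda/8.0
module load cudnn/6.0-cuda8

# ==== Main ======
./my_gpu_program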

What are Environment Modules?

The environment modules package is a tool that allows you to quickly and easily modify your shell environment to access different software packages. Research Computing offers a large (and growing) number of software packages to users, and each package may contain several tools, manual pages, and libraries, or it may require special setup to work properly. Some software packages come in several versions or flavors, many of which conflict with each other. The modules system lets you tailor your shell to access exactly the packages you need, setting up the relevant environment variables for you and automatically avoiding many possible conflicts between packages.

Command Summary

module avail               List available modules
module load <module>       Load the named module
module unload <module>     Unload the named module
module whatis <module>     Give a description of the named module
module list                List modules that are loaded in your environment
module purge               Unload all currently loaded modules from your environment
module display <module>    Give the rules (environment changes) for the named module

 

Example Usage

$ module avail

------------------------- /usr/share/Modules/modulefiles ----------------------------
dot module-git module-info modules null use.own

---------------------- /apps/usr/modules/compilers ----------------------------------
bazel/0.3.0          intel/14.0.3          pymods/2.7.12 scala/2.10.4(default)
bazel/0.4.5(default) intel/16.0.0(default) pymods/2.7.5  scala/2.11.7
gcc/4.9.3            perlmods/5.16.3       pypy/5.3.1    yasm/1.3.0
gcc/5.3.0(default)   pgi/14.4              python/2.7.12
ghc/7.10.3           pgi/15.9(default)     python/3.5.1

------------------------------- /apps/usr/modules/lib -------------------------------
clblas/1.10           hdf5/1.8.16-pgi          netcdf/4.4.0(default)
glew/1.13.0           hdf5/1.8.16-pgi-mpi      netcdf/4.4.0-intel
google-code/2015      htslib/1.4.1             netcdf/4.4.0-intel-mpi
hdf/4.2.11(default)   libgpuarray/0.9997       netcdf/4.4.0-mpi
hdf/4.2.11-intel      netcdf/4.3.3.1           netcdf/4.4.0-pgi
hdf/4.2.11-pgi        netcdf/4.3.3.1-intel     netcdf/4.4.0-pgi-mpi
hdf5/1.8.16(default)  netcdf/4.3.3.1-intel-mpi openblas/0.2.18
hdf5/1.8.16-intel     netcdf/4.3.3.1-mpi       openblas/0.2.18-nehalem
hdf5/1.8.16-intel-mpi netcdf/4.3.3.1-pgi       trilinos/11.14.1-mpi
hdf5/1.8.16-mpi       netcdf/4.3.3.1-pgi-mpi   trilinos/12.6.4-mpi(default)

------------------------------- /apps/usr/modules/mpi -------------------------------
openmpi/1.10.0(default) openmpi/1.8.1            openmpi/1.8.1-pgi    openmpi/2.1.1-intel
openmpi/1.10.0-ib       openmpi/1.8.1-ib         openmpi/1.8.1-pgi-ib openmpi/2.1.1-intel-ib
openmpi/1.10.0-intel    openmpi/1.8.1-intel      openmpi/2.1.1        platform-mpi/9.01
openmpi/1.10.0-intel-ib openmpi/1.8.1-intel14    openmpi/2.1.1-gcc53
openmpi/1.10.0-pgi      openmpi/1.8.1-intel14-ib openmpi/2.1.1-gcc53-ib
openmpi/1.10.0-pgi-ib   openmpi/1.8.1-intel-ib   openmpi/2.1.1-ib

------------------------------ /apps/usr/modules/apps -------------------------------
abaqus/2017            gromacs/5.1.2(default)      poy/5.1.2-ib(default)
abaqus/6.10-2          gromacs/5.1.2-avx2          qiime/1.9.1
abaqus/6.13-4(default) gromacs/5.1.2-cuda          quickflash/1.0.0
abyss/1.9.0            gromacs/5.1.2-cuda-avx2     quickflash/1.0.0-ib
abyss/1.9.0-ib         gromacs/5.1.2-mpi           R/3.1.1
allpathslg/52488       gromacs/5.1.2-mpi-avx2      R/3.2.3(default)
ansa/13.1.3            gromacs/5.1.2-mpi-cuda      R/3.3.1
art/03.19.15           gromacs/5.1.2-mpi-cuda-avx2 raxml/7.4.2
asciidoc/8.6.9         gromacs/5.1.2-mpi-ib        raxml/7.4.2-mpi
augustus/3.2.3         gromacs/5.1.2-mpi-ib-avx2   raxml/8.2.4(default)
bamtools/2.4.1         itk/4.9.0                   raxml/8.2.4-mpi
bcftools/1.3.1         jellyfish/2.2.6             repdenovo/0.0
bedtools2/2.26.0       lammps/15May15              rosetta/2015.02
bioconductor/3.2       lammps/16Feb16(default)     rosetta/2016.10(default)
 .
 .
 .

$ module avail matlab
--------------------- /apps/usr/modules/apps ----------------------
matlab/R2014a      matlab/R2015b(default)     matlab/R2016b

$ module display matlab/R2015b
-------------------------------------------------------------------
/apps/usr/modules/apps/matlab/R2015b:

module-whatis MATLAB is a high-level language and interactive 
environment for numerical computation, visualization, and programming.
conflict matlab
setenv MATLAB_HOME /apps/pkg/matlab-R2015b
setenv MATLAB_DIR /apps/pkg/matlab-R2015b
prepend-path MATLABPATH /apps/pkg/matlab-R2015b/toolbox_urc/xlwrite
prepend-path CLASSPATH /apps/pkg/matlab-R2015b/toolbox_urc/xlwrite/jxl.jar:/apps/pkg/matlab-R2015b/toolbox_urc/xlwrite/MXL.jar
prepend-path PATH /apps/pkg/matlab-R2015b/bin
prepend-path LD_LIBRARY_PATH /apps/pkg/matlab-R2015b/bin/glnxa64:/apps/pkg/matlab-R2015b/runtime/glnxa64
prepend-path LM_LICENSE_FILE 1700@adm-lic2.uncc.edu
-------------------------------------------------------------------

$ module load matlab/R2015b

$ module list
Currently Loaded Modulefiles:
  1) pymods/2.7.5      2) perlmods/5.16.3   3) matlab/R2015b

 

How the Modules are Organized and Grouped

The modules are organized into “categories”, which include: /apps/usr/modules/mpi, /apps/usr/modules/compilers, /apps/usr/modules/apps, and /apps/sys/Modules/3.2.6/modulefiles. Under each category, you will see “groups” of applications: openmpi, intel, pgi, to name a few. Within each group, there may be several versions to choose from. The group and version are separated with a “slash” (/).
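
For example, the “gcc” group under the compilers category offers two versions, one of which is marked as the default (output condensed from the full listing above):

$ module avail gcc
---------------------- /apps/usr/modules/compilers ----------------------
gcc/4.9.3   gcc/5.3.0(default)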

Default Modules

You probably noticed some modules listed above are suffixed with a “(default)”. The “default” module is the module that will get loaded if you do not specify a version number. For example, we can load the “intel/16.0.0” module by omitting the version number:

$ module load intel

$ module list
Currently Loaded Modulefiles:
  1) pymods/2.7.5      2) perlmods/5.16.3   3) intel/16.0.0
Note: If you plan to load a version of a module that is not the default, then you must specify the version in the module load command.
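
For example, to load gcc/4.9.3 instead of the default gcc/5.3.0, include the version explicitly:

$ module load gcc/4.9.3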

Conflicts and Prerequisites

Some modules conflict with others, and some modules are prerequisites of others. Environment Modules handles both scenarios.

The following is an example of trying to load a module that is dependent upon another:

$ module display gromacs/4.6.7-cuda
-------------------------------------------------------------------
/apps/usr/modules/apps/gromacs/4.6.7-cuda:

module-whatis GROMACS is a versatile package to perform molecular dynamics,
i.e. simulate the Newtonian equations of motion for systems with hundreds
to millions of particles. It is primarily designed for biochemical molecules
like proteins, lipids and nucleic acids that have a lot of complicated bonded
interactions, but since GROMACS is extremely fast at calculating the nonbonded
interactions, many groups are also using it for research on non-biological
systems, e.g. polymers.
conflict gromacs
prereq cuda
setenv GROMACS /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda
setenv GMXBIN /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/bin
setenv GMXLDLIB /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/lib
setenv GMXDATA /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/share
setenv GMXMAN /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/share/man
setenv GMXLIB /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/share/gromacs/top
prepend-path PATH /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/bin
prepend-path MANPATH /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/share/man
prepend-path LD_LIBRARY_PATH /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/lib
-------------------------------------------------------------------
$ module load gromacs/4.6.7-cuda
gromacs/4.6.7-cuda(12):ERROR:151: Module 'gromacs/4.6.7-cuda' depends on one of the module(s) ''
gromacs/4.6.7-cuda(12):ERROR:102: Tcl command execution failed: prereq cuda

To resolve the above error, simply load the “prereq” module first, then load the original module. You might first want to see which “cuda” modules are available:

$ module avail cuda
---------------------- /apps/usr/modules/gpu ----------------------
cuda/7.5(default)    cuda/8.0

Select the one you would like to load to satisfy the GROMACS prerequisite. You can do this in a single command:

$ module load cuda/8.0 gromacs/4.6.7-cuda
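
Conflicts are handled in a similar fashion. For instance, the matlab modulefile displayed earlier declares “conflict matlab”, so two MATLAB versions cannot be loaded at the same time; to change versions in one step you can use the “module swap” (also “module switch”) subcommand. A brief sketch, assuming matlab/R2015b is currently loaded:

$ module swap matlab/R2015b matlab/R2016b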

More information

You can find more information about Environment Modules on SourceForge.net:
http://modules.sourceforge.net/

Introduction

Torque is an open-source scheduler based on the original PBS scheduler code. The following is a set of directions to assist a user in learning to use Torque to submit jobs to the URC cluster(s).  It is tailored specifically to the URC environment and is by no means comprehensive.

Details not found in here can be found online at:

http://docs.adaptivecomputing.com/torque/6-0-1/help.htm

Note:
Some of the sample scripts displayed in the text are not complete so that the reader can focus specifically on the item being discussed.  Full, working examples of scripts and commands are provided in the Examples section at the end of this document.

Submitting a Job

To submit a job to the Copperhead cluster, you must first SSH into the Research Computing submit host, hpc.uncc.edu. Scheduling a job in Torque requires creating a file that describes the job (in this case a shell script) and then that file is given as an argument to the Torque command “qsub” to execute the job.

First, here is a sample shell script (my_script.sh) describing a simple job to be submitted:

#! /bin/bash

# ==== Main ======
/bin/date

This script simply runs the ‘date’ command.  To submit it to the scheduler for execution, we use the Torque qsub command:

$ qsub -N "MyJob" -q "copperhead" -l procs=1 my_script.sh

This will cause the script (and hence the date command) to be scheduled on the cluster. In this example, the “-N” switch gives the job a name, the “-q” switch is used to route the job to the “copperhead” queue, and the “-l” switch is used to tell Torque (PBS) how many processors your job requires.

Many of the command line options to qsub can also be specified in the shell script itself using Torque (PBS) directives. Using the previous example, our script (my_script.sh) could look like the following:

#!/bin/sh

# ===== PBS OPTIONS =====
### Set the job name
#PBS -N "MyJob"

### Specify queue to run in
#PBS -q "copperhead"

### Specify number of CPUs for job
#PBS -l procs=1

# ==== Main ======
/bin/date

This reduces the number of command line options needed to pass to qsub. Running the command is now simply:

$ qsub my_script.sh

For the entire list of options, see the qsub man page:

$ man qsub

Standard Output and Standard Error
In Torque, any output that would normally print to stdout or stderr is collected into two files. By default, these files are placed in the initial working directory from which you submitted the job and are named:

scriptname.o<jobid> for stdout
scriptname.e<jobid> for stderr

In our previous example (if we did not specify a job name with -N), that would translate to:

my_script.sh.oNNN
my_script.sh.eNNN

where NNN is the job ID number returned by qsub.  If we named the job with -N (as above) and it was assigned job ID 801, the files would be:

MyJob.o801
MyJob.e801

Logs are written to the job’s working directory ($PBS_O_WORKDIR) unless the user specifies otherwise.
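
If you prefer different log locations or a single combined log file, qsub’s -o, -e, and -j options can also be given as PBS directives. A brief sketch (the paths shown are illustrative):

### Write stdout and stderr to specific files
#PBS -o /users/joe/logs/MyJob.out
#PBS -e /users/joe/logs/MyJob.err

### Or merge stderr into the stdout file instead
#PBS -j oe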

Monitoring a Job

Monitoring a Torque job is done primarily using the Torque command “qstat.” For instance, to see a list of available queues:

$ qstat -q

To see the status of a specific queue:

$ qstat "queuename"

To see the full status of a specific job:

$ qstat -f  jobid

where jobid is the unique identifier for the job returned by the qsub command.
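
To list only your own jobs, you can also filter by username (the username shown is illustrative):

$ qstat -u joe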

Deleting a Job

To delete a Torque job after it has been submitted,  use the qdel command:

$ qdel jobid

where jobid is the unique identifier for the job returned by the qsub command.
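
For example, to delete the job that was assigned job ID 801 earlier in this document:

$ qdel 801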

Monitoring Compute Nodes

To see the status of the nodes associated with a specific queue, use the Torque command pbsnodes(1) (also referred to as qnodes):

$ pbsnodes :queue_name

where  queue_name is the name of the queue  prefixed by a colon (:).  For example:

$ pbsnodes :copperhead

would display information about all of the nodes associated with the “copperhead” queue.  The output includes (for each node) the number of cores available (np= ).  If there are jobs running on the node, each one is listed in the (jobs= ) field.  This shows how many of the available cores are actually in use.

Parallel (MPI) Jobs

Parallel jobs are submitted to Torque in the manner described above, except that you must first ask Torque to reserve the number of processors (cores) your job requires.  This is accomplished using the -l switch to the qsub command.

For example:

$ qsub  -q copperhead -l procs=16 my_script.sh

would submit my script requesting 16 processors (cores)  from the “copperhead” queue.  The script (my_script.sh) would look something like the following:

#! /bin/bash
module load openmpi
mpirun -hostfile $PBS_NODEFILE my_mpi_program

If you need to specify a specific number of processors (cores) per compute host, you can append a colon (:) to the number of requested nodes and then the number of processors per host.  For example, to request 16 total processors (cores) with only 4 per compute host, the syntax would be:

$ qsub  -q copperhead -l nodes=4:ppn=4 my_script.sh

As described previously, options to qsub can be  specified directly in the script file.  For the example above, my_script.sh would look similar to the following:

#! /bin/bash

### Set the job name
#PBS -N MyJob

### Run in the queue named "copperhead"
#PBS -q copperhead
### Specify the number of cpus for your job.
#PBS -l nodes=4:ppn=4

### Load OpenMPI environment module.
module load openmpi

### execute mpirun
mpirun my_mpi_program

Examples of Torque Submit Scripts

NOTE: Additional sample scripts can be found in /apps/torque/examples.

[1] Simple Job (1 CPU)

#! /bin/bash

#PBS -N MyJob
#PBS -q copperhead
#PBS -l procs=1

# Run program
/bin/date

[2] Parallel Job – 16 Processors (Using OpenMPI)

#! /bin/bash

#PBS -N MyJob
#PBS -q copperhead
#PBS -l procs=16

### load env for Infiniband OpenMPI
module load openmpi/1.10.0-ib

# Run the program "simplempi" with an argument of "30"
mpirun /users/joe/simplempi 30