FAQs

General Questions

Torque provides users the ability to run scripts before and/or after each job executes. With such a script, users can get more information about how their job ran, which node (or nodes) it ran on, how much CPU and RAM it used, etc.

**NOTE** If your submit script redirects its own logs to capture STDOUT and STDERR, the prologue and epilogue will not work with that setup. Check whether your submit script contains the following PBS directives:

#PBS -o /dev/null
#PBS -e /dev/null

and the following (or similar) line in the body:

exec 1>$PBS_O_WORKDIR/$PBS_JOBNAME-$SHORT_JOBID.out 2>$PBS_O_WORKDIR/$PBS_JOBNAME-$SHORT_JOBID.err

If you have the above lines in your submit script, please remove them before continuing with the following steps.

Follow these steps to set up a prologue and epilogue in your PBS submit script:

  1. Copy prologue and epilogue to your home directory
    rsync -avz /apps/torque/logs/ $HOME/torque
  2. Add these additional directives near the top of your submit script, with the other #PBS directives.
    ( ** CHANGE username to your actual username! ** )

     

    #PBS -l prologue=/users/username/torque/prologue.sh
    #PBS -l epilogue=/users/username/torque/epilogue.sh
  3. Save the changes to your submit script, then submit it using qsub. A minimal example script is sketched below.
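For reference, here is a minimal submit script sketch with the prologue and epilogue directives in place (the job name, queue, resource request, and program are illustrative placeholders; replace username with your own):

#! /bin/bash

### Job name, queue, and resources (placeholders)
#PBS -N MyJob
#PBS -q copperhead
#PBS -l procs=1

### Prologue/epilogue copied to $HOME/torque in step 1
#PBS -l prologue=/users/username/torque/prologue.sh
#PBS -l epilogue=/users/username/torque/epilogue.sh

# ==== Main ======
/bin/date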

The top of your job’s output file should have some “header” text with information similar to this example:

========================================================================
Start Time: Thu Jul 27 10:34:40 EDT 2017
User/Group: jehalter / jehalter
Job ID: 318469.cph-m1.uncc.edu
Job Name: intake2
Job Details: epilogue=/users/jehalter/torque/epilogue.sh,neednodes=2:
 ppn=6,nodes=2:ppn=6,pmem=2500mb,prologue=/users/jehalter
 /torque/prologue.sh,walltime=00:05:00
Queue: copperhead
Nodes: cph-c25.uncc.edu[6] cph-c26.uncc.edu[6]
========================================================================

Likewise, the bottom of your job’s output file should have a “footer” for information similar to this example:

========================================================================
End Time: Thu Jul 27 10:35:23 EDT 2017
Resources: cput=00:03:06,energy_used=0,mem=1110628kb,vmem=9890428kb
 ,walltime=00:00:43
Exit Value: 0
========================================================================

This information can be valuable when planning future job submissions. For example, if you request 64GB of RAM for your job but the epilogue information shows it used less than 32GB, you can reduce the request in future submissions, which may allow your jobs to be scheduled more quickly.

For more information, please visit the “Prologue and Epilogue Scripts” section on Adaptive Computing’s documentation website.

Access to the cluster requires a URC account. Accounts are available on request for UNC Charlotte faculty and for graduate students working on faculty sponsored research projects. Contact any member of the URC Staff if you are interested in requesting a URC account.

In general, the user account name is the same as the official campus login ID (also used for Novell, 49er Express, and Mosaic); however, the account password is unique to the URC cluster and is not synced with any other campus authentication system.

Most of the cluster resources are on a private, internal network, so access to the cluster is via one (or a few) nodes that also have connections to the campus’s public network. Each of our clusters has a submit host, and some also have a separate interactive host. Please refer to the host list on the “Job Scheduling with Torque” page to determine which host you will need to log into.

The supported methods for connecting to this node are:

  • Secure Shell (SSH) – For secure remote login
  • Secure Copy (SCP) or Secure FTP (SFTP) – For secure data transfer.

The interactive hosts that you can SSH into are:

  • Copperhead: hpc.uncc.edu
  • Taipan (Hadoop): urc-hadoop.uncc.edu

Most common Unix or Linux operating systems (including Mac OS X) come equipped with client versions of these programs for use from the command line. For more information refer to the UNIX man pages for ssh, scp, sftp.
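For example, from a terminal on your workstation you could log in, or copy a file to your cluster home directory, like this (substitute your own username; the file name is a placeholder):

$ ssh username@hpc.uncc.edu
$ scp mydata.tar.gz username@hpc.uncc.edu:~/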

There are also various commercial and shareware versions of these commands available for MS Windows. One popular free client is PuTTY SSH. For securely transferring files to/from Windows, WinSCP is a popular client.

When logged in to one of the interactive nodes, you can pull data in using one of several different protocols: HTTP (80), HTTPS (443), RSYNC (873), and SCP/SFTP (22).
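For example, while logged in to an interactive node, standard tools such as wget or rsync can pull data onto the cluster (the URL and paths below are placeholders):

$ wget https://example.org/dataset.tar.gz
$ rsync -avz rsync://example.org/pub/dataset/ ./dataset/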

If you are on your workstation or laptop and would like to transfer data to/from the cluster, you can do so using one of the various command-line or GUI SCP/SFTP clients. We have found that some third-party GUI clients do not work well (or at all) with DUO two-factor authentication, but the clients listed below work fine with DUO.

WinSCP is a popular SCP/SFTP GUI for Windows. It works well with DUO two-factor auth, using the default settings.

FileZilla, another popular SFTP GUI, is available for Windows, Mac OS X, and Linux. It also works well with DUO two-factor auth; however, you must choose some non-default options (outlined below) for the best experience with your file transfers.

  1. Do not use the Quickconnect option; set up a site in the Site Manager (click "New Site" button)
  2. In the Site Manager, on the General tab of your site, make sure you choose "Interactive" as your Logon Type
  3. In the Site Manager, on the Transfer Settings tab of your site, make sure "Limit number of simultaneous connections" is checked, and "Maximum number of connections" is set to 1
  4. Click connect. You should be prompted for your NinerNet Password (SSH server authentication); enter your NinerNet Password then click OK
  5. You should then be presented with the DUO two-factor method menu. Select your method (i.e., enter the number of your selection in the "Password" field), and click OK
  6. Acknowledge your two-factor method, and you should be logged in.

Cyberduck is a popular SFTP GUI for Windows and Mac OS X. For a smooth experience using DUO two-factor auth, make sure to set File Transfer settings to "Use browser connection" to avoid having to authenticate each time you want to transfer a file. If you are having trouble trying to get your 3rd party SCP/SFTP GUI client to authenticate to our clusters, please contact us.  

NEVER modify the permissions on your /home or /scratch directory.  If you need assistance, please contact us.

If you have received an email from the batch scheduler stating that you’ve had a “moab job resource violation,” then your job was cancelled because it used more resources than were requested in your submit script. The most common violation is using more RAM than your job requested, which will be shown in the body of the email, like so:

job 1193978 exceeded MEM usage soft limit

If you did NOT specify an amount of RAM in your job’s submit script and you received the “MEM usage soft limit” email, then you elected to accept the default, and the default amount of RAM was not enough for your job to complete. Currently, for Copperhead compute nodes, the default amount of RAM given to a job is 2GB/core.

If you would like to know how much RAM your compute job is using on the cluster, consider adding the prologue/epilogue into your submit script, which will give you additional job information in your job output log.
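For a running job you can also query the scheduler directly: the resources_used fields reported by qstat -f show current CPU time and memory usage. A quick sketch (the job ID is a placeholder):

$ qstat -f 1193978 | grep resources_used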

Clusters

There is remote access to some of the Research Computing clusters. Below we outline the different methods for remote access:

COPPERHEAD
You can use ssh to log in to the interactive/submit host, hpc.uncc.edu.

SIDEWINDER
Once logged into Copperhead, you can “ssh sidewinder” to access the Sidewinder cluster.

Yes, we always welcome feedback. Please send your suggestions, comments, etc. via email to any member of the URC Staff.

Yes, you can. The URC cluster currently supports parallel processing using the MPI standard. These jobs are also submitted via the Condor job scheduler.

Parallel processing is supported via two separate Condor universes. The older MPI universe supports the mpich implementation of the Message Passing Interface (MPI) standard while the newer Parallel universe provides a flexible mechanism that can be adapted to almost any parallel execution methodology.

Parallel Execution in the MPI Universe

If you wish to use the MPI universe to run your own MPI code, you need to include the mpich libraries (version 1.2.4 only) when compiling/linking your program. One way to do this is to use the mpich compiler commands mpicc, mpiCC, mpif77, or mpif90 provided in /apps/mpich/bin. These commands are simply scripts that run the GNU C or C++ compiler or the Portland Group Fortran compiler with the recommended libraries and command-line options. Alternatively, you can simply reference the relevant include files and libraries (found in /apps/mpich/include and /apps/mpich/lib respectively) in your project makefile or on the command line.

Many open source or commercial applications also include support for parallel processing. In order for the applications to work properly in the MPI universe, they must support mpich version 1.2.4. Applications that do not support mpich 1.2.4 should be run via the parallel universe.

Once you have a properly compiled mpich executable, the process to submit a job to Condor is as described above:

condor_submit -n mpi <submit_file>

where the submit file would be as shown above with the Universe set to MPI and an additional line to specify the desired number of processors, e.g.:

Machine_count = 4

Choosing an optimal value for the number of processors often requires some experimentation since each application is likely to scale differently depending on a variety of factors like the ratio of computation to I/O. Very few applications scale well beyond 16 or 32 processors.

The additional qualifier on the condor_submit command (-n mpi) is necessary to direct the job to the correct condor scheduling host. Not all universes are managed by the same condor server.
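Putting the pieces together, a complete MPI-universe submit file might look like the following sketch (the executable, arguments, and output file names are placeholders):

universe      = MPI
executable    = my_mpi_program
arguments     = -arg1 -arg2
machine_count = 4
output        = my_mpi_program.out
error         = my_mpi_program.err
log           = my_mpi_program.log
queue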

Parallel Execution in the Parallel Universe

Parallel execution in the “parallel” universe is extremely flexible in terms of the types of parallel operation that are possible. The user specifies a wrapper script in addition to an executable. The wrapper is executed on each node that is assigned to the user’s job and is passed the executable as an argument. This allows the user to construct a custom parallel environment within their assigned nodes, and then start the real executable in a fashion that is appropriate to their custom environment.

For users of the old mpi queue, the changes required to switch to the parallel queue are minimal. In the condor_submit file, replace these lines:

universe = mpi
executable = my.exe
arguments = -arg1 -arg2

with these lines:

universe = parallel
executable = /apps/condor/scripts/mpi-1.2.4
arguments = my.exe -arg1 -arg2

The universe changes from mpi to parallel, the user’s executable becomes the first argument, and the executable is a standard wrapper script that is installed as part of the condor software package.
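For example, a complete parallel-universe submit file might look like the following sketch (the machine count and output file names are placeholders):

universe      = parallel
executable    = /apps/condor/scripts/mpi-1.2.4
arguments     = my.exe -arg1 -arg2
machine_count = 4
output        = my_job.out
error         = my_job.err
log           = my_job.log
queue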

The resulting submit file is then submitted using:

condor_submit -n parallel <submit file name>

In this example the user’s job will run using version 1.2.4 of mpich, just as it does in the mpi queue. For users who wish to use a later version of mpich, a separate wrapper script, mpi-1.2.5, is also available under /apps/condor/scripts. Additional wrapper scripts will be added whenever new versions of mpich are installed.

Additional wrapper scripts have been developed to support ABAQUS and Matlab DCE, and more will be developed as needed to support other applications (e.g., LAM MPI).

If you would like to utilize the NVIDIA GPUs on the cluster for your compute job, below are some tips to help you do so, followed by a complete example script.

  • Make sure you ask the scheduler for a GPU in your job request (submit script). Append the GPU request to the #PBS directive in which you ask for CPUs, for example:
#PBS -l nodes=1:ppn=1:gpus=1,mem=16GB

 

  • Unless your code has built-in GPU support (for example, Matlab), you may want to load one of the available CUDA Toolkit modules; currently we offer three: cuda/7.5, cuda/8.0, and cuda/9.0. Load one by adding a “module load …” line to your submit script. You can also issue a “module list” command to display which modules are currently loaded. The CUDA binaries (like nvcc) and libraries will then be available to your compute job:
module load cuda/8.0

module list
Currently Loaded Modulefiles:
  1) pymods/2.7.5    2) perlmods/5.16.3   3) cuda/8.0

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

 

  • If your code depends on the NVIDIA CUDA Deep Neural Network (cuDNN) GPU-accelerated library, you must load an available cuDNN module to set up your $LD_LIBRARY_PATH. There are several cudnn modules to choose from, depending on which cuDNN version *and* which CUDA Toolkit version you require. Use the command “module avail cudnn” to see what’s available.
module load cudnn/6.0-cuda8

module list
Currently Loaded Modulefiles:
  1) pymods/2.7.5    2) perlmods/5.16.3   3) cuda/8.0   4) cudnn/6.0-cuda8

 

  • If you would like to target a specific model of GPU, you can add a "feature" tag to your request. There is a "gtx1080ti" tag for the GeForce GTX 1080 Ti GPUs and a "k80" tag for the existing Tesla K80 GPUs. Each of the following directives requests one node with one traditional computing core and one GPU of the named model:
### If you prefer an NVIDIA GeForce GTX 1080 Ti, specify the "gtx1080ti" feature tag:
#PBS -l nodes=1:ppn=1:gpus=1:gtx1080ti

### If you prefer an NVIDIA Tesla K80, specify the "k80" feature tag:
#PBS -l nodes=1:ppn=1:gpus=1:k80
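Putting these tips together, a complete GPU submit script might look like the following sketch (the job name, walltime, module versions, and program are placeholders; adjust them to your own code):

#! /bin/bash

#PBS -N MyGPUJob
#PBS -q copperhead
#PBS -l nodes=1:ppn=1:gpus=1,mem=16GB
#PBS -l walltime=4:00:00

### Load the CUDA Toolkit (and cuDNN, if your code needs it)
module load cuda/8.0
module load cudnn/6.0-cuda8

### Run your GPU-enabled program (placeholder)
cd $PBS_O_WORKDIR
./my_gpu_program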

The Copperhead Cluster can be accessed via SSH to “hpc.uncc.edu.”  This will connect the user to one of the Interactive/Submit nodes located in the Copperhead Cluster pool.   From those nodes, a user can submit jobs requesting the following resources:

General Compute Nodes (63 x 16 cores ea + 2 x 32 cores ea = 1072 CPU cores)

GPU Compute Nodes (4 x 2 NVIDIA K80 GPUs and 16 cores ea = 8 GPUs and 64 CPU cores)

Large Memory Compute Nodes (16 CPU cores and 768GB RAM)

Jobs should always be submitted to the “copperhead” queue unless directed otherwise by URC Support.

Copperhead Defaults

To make more efficient use of the resources, user jobs are now submitted with a set of default resource requests, which can be overridden on the qsub command line or in the job submit script via qsub directives. If not specified by the user, the following defaults are set:

#PBS -l walltime=8:00:00        (Max Job Run time is 8 hours)

#PBS -l mem=2GB                      (Allow up to 2GB of Memory per job)
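These defaults can be overridden per job; for example, the following sketch requests a longer run time and more memory on the command line (the values are illustrative):

$ qsub -l walltime=24:00:00 -l mem=16GB my_script.sh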

See the discussions below for more details.

 

Nodes= vs Procs=

In the older URC clusters (e.g., viper, cobra, python), if a job needed a particular number of processors (procs) and did not care how they were distributed among the compute nodes, the following syntax was allowed:

#PBS -l nodes=16

This would reserve the first 16 available procs regardless of where they were located. On Copperhead, this would actually cause the job to attempt to reserve 16 nodes with 1 proc each, which may not be desirable. To make more efficient use of resources and to clarify the request, this syntax is no longer valid. Instead, use one of the following:

If you really want Y nodes with X procs on each node:

#PBS -l nodes=Y:ppn=X

If you just want X procs and don’t care which or how many nodes they are on, use:

#PBS -l procs=X

This will allow the scheduler to better utilize available resources and may help your job get scheduled quicker.

 

Walltime

This determines the actual amount of time a job will be allowed to run. If not specified by the user, the default value is now 8 hours. A job requesting this value (or less) is automatically labeled a “short” job and therefore has more potential nodes on which it can run. Jobs requiring longer than 8 hours are considered “long” jobs and are restricted in which nodes they can run on. In most cases, the longer the requested walltime, the lower the priority a job will have in competition with other, shorter jobs.

Example:  #PBS -l walltime=8:00:00      # 8 hours, 0 minutes, 0 seconds

 

Memory (mem)

mem is the amount of memory to be allocated to a job. If not set by the user, it defaults to roughly 2GB per core requested. “mem” applies to the entire job and is therefore a separate resource request, which can be specified either as part of the nodes specification (separated by a comma)

Example:     #PBS -l nodes=2:ppn=2,mem=8GB

or as a separate directive

Example:     #PBS -l nodes=2:ppn=2

                      #PBS -l mem=8GB

These two examples are equivalent and request a total of 8GB of memory spread across 4 cores on 2 nodes.

Jobs whose memory requirements will not allow them to fit on a regular compute node will automatically be scheduled on a large-memory compute node. Regular compute nodes have 128GB of memory and 16 cores.

Jobs that exceed the requested memory will be terminated by the scheduler.

 

GPUs

GPUs are requested like procs. Currently there are 4 Copperhead GPU nodes, each of which contains 2 addressable GPUs. Note that gpus= is part of the node’s properties, so you use a colon to separate it (like :ppn=), as opposed to a separate resource request (like ,mem=), which is separated with a comma.

Example:     #PBS -l nodes=1:ppn=1:gpus=1                 #  (1 node with 1 cpu and 1 gpu)

 

Note that you cannot use the following on Copperhead:

#PBS -l gpus=N

While Torque will accept this by defaulting to nodes=1, this becomes confusing if you attempt to ask for gpus=3 on Copperhead, since there is a maximum of 2 GPUs per node. If you need more than 2 GPUs for a job, the following are examples of valid requests:

#PBS -l nodes=3:ppn=1:gpus=1               # 3 Nodes, 1 gpu each (3 GPUs)

#PBS -l nodes=2:ppn=1:gpus=2               # 2 Nodes, 2 gpus each (4 GPUs)

Applications


What are Environment Modules?

The environment modules package is a tool that allows you to quickly and easily modify your shell environment to access different software packages. Research Computing offers a large (and growing) number of software packages to users, and each package may contain several tools, manual pages and libraries, or it may require special setup to work properly. Some software packages come in several versions or flavors, many of which conflict with each other. Modules allows you to tailor your shell to access exactly the packages you need by setting up the relevant environment variables for you, and automatically avoiding many possible conflicts between packages.

Command Summary

module avail              List available modules
module load <module>      Load the named module
module unload <module>    Unload the named module
module whatis <module>    Give a description of the named module
module list               List modules that are loaded in your environment
module purge              Unload all currently loaded modules from your environment
module display <module>   Show the rules (actions) for the named module

 

Example Usage

$ module avail

------------------------- /usr/share/Modules/modulefiles ----------------------------
dot module-git module-info modules null use.own

---------------------- /apps/usr/modules/compilers ----------------------------------
bazel/0.3.0          intel/14.0.3          pymods/2.7.12 scala/2.10.4(default)
bazel/0.4.5(default) intel/16.0.0(default) pymods/2.7.5  scala/2.11.7
gcc/4.9.3            perlmods/5.16.3       pypy/5.3.1    yasm/1.3.0
gcc/5.3.0(default)   pgi/14.4              python/2.7.12
ghc/7.10.3           pgi/15.9(default)     python/3.5.1

------------------------------- /apps/usr/modules/lib -------------------------------
clblas/1.10           hdf5/1.8.16-pgi          netcdf/4.4.0(default)
glew/1.13.0           hdf5/1.8.16-pgi-mpi      netcdf/4.4.0-intel
google-code/2015      htslib/1.4.1             netcdf/4.4.0-intel-mpi
hdf/4.2.11(default)   libgpuarray/0.9997       netcdf/4.4.0-mpi
hdf/4.2.11-intel      netcdf/4.3.3.1           netcdf/4.4.0-pgi
hdf/4.2.11-pgi        netcdf/4.3.3.1-intel     netcdf/4.4.0-pgi-mpi
hdf5/1.8.16(default)  netcdf/4.3.3.1-intel-mpi openblas/0.2.18
hdf5/1.8.16-intel     netcdf/4.3.3.1-mpi       openblas/0.2.18-nehalem
hdf5/1.8.16-intel-mpi netcdf/4.3.3.1-pgi       trilinos/11.14.1-mpi
hdf5/1.8.16-mpi       netcdf/4.3.3.1-pgi-mpi   trilinos/12.6.4-mpi(default)

------------------------------- /apps/usr/modules/mpi -------------------------------
openmpi/1.10.0(default) openmpi/1.8.1            openmpi/1.8.1-pgi    openmpi/2.1.1-intel
openmpi/1.10.0-ib       openmpi/1.8.1-ib         openmpi/1.8.1-pgi-ib openmpi/2.1.1-intel-ib
openmpi/1.10.0-intel    openmpi/1.8.1-intel      openmpi/2.1.1        platform-mpi/9.01
openmpi/1.10.0-intel-ib openmpi/1.8.1-intel14    openmpi/2.1.1-gcc53
openmpi/1.10.0-pgi      openmpi/1.8.1-intel14-ib openmpi/2.1.1-gcc53-ib
openmpi/1.10.0-pgi-ib   openmpi/1.8.1-intel-ib   openmpi/2.1.1-ib

------------------------------ /apps/usr/modules/apps -------------------------------
abaqus/2017            gromacs/5.1.2(default)      poy/5.1.2-ib(default)
abaqus/6.10-2          gromacs/5.1.2-avx2          qiime/1.9.1
abaqus/6.13-4(default) gromacs/5.1.2-cuda          quickflash/1.0.0
abyss/1.9.0            gromacs/5.1.2-cuda-avx2     quickflash/1.0.0-ib
abyss/1.9.0-ib         gromacs/5.1.2-mpi           R/3.1.1
allpathslg/52488       gromacs/5.1.2-mpi-avx2      R/3.2.3(default)
ansa/13.1.3            gromacs/5.1.2-mpi-cuda      R/3.3.1
art/03.19.15           gromacs/5.1.2-mpi-cuda-avx2 raxml/7.4.2
asciidoc/8.6.9         gromacs/5.1.2-mpi-ib        raxml/7.4.2-mpi
augustus/3.2.3         gromacs/5.1.2-mpi-ib-avx2   raxml/8.2.4(default)
bamtools/2.4.1         itk/4.9.0                   raxml/8.2.4-mpi
bcftools/1.3.1         jellyfish/2.2.6             repdenovo/0.0
bedtools2/2.26.0       lammps/15May15              rosetta/2015.02
bioconductor/3.2       lammps/16Feb16(default)     rosetta/2016.10(default)
 .
 .
 .

$ module avail matlab
--------------------- /apps/usr/modules/apps ----------------------
matlab/R2014a      matlab/R2015b(default)     matlab/R2016b

$ module display matlab/R2015b
-------------------------------------------------------------------
/apps/usr/modules/apps/matlab/R2015b:

module-whatis MATLAB is a high-level language and interactive 
environment for numerical computation, visualization, and programming.
conflict matlab
setenv MATLAB_HOME /apps/pkg/matlab-R2015b
setenv MATLAB_DIR /apps/pkg/matlab-R2015b
prepend-path MATLABPATH /apps/pkg/matlab-R2015b/toolbox_urc/xlwrite
prepend-path CLASSPATH /apps/pkg/matlab-R2015b/toolbox_urc/xlwrite/jxl.jar:/apps/pkg/matlab-R2015b/toolbox_urc/xlwrite/MXL.jar
prepend-path PATH /apps/pkg/matlab-R2015b/bin
prepend-path LD_LIBRARY_PATH /apps/pkg/matlab-R2015b/bin/glnxa64:/apps/pkg/matlab-R2015b/runtime/glnxa64
prepend-path LM_LICENSE_FILE 1700@adm-lic2.uncc.edu
-------------------------------------------------------------------

$ module load matlab/R2015b

$ module list
Currently Loaded Modulefiles:
  1) pymods/2.7.5      2) perlmods/5.16.3   3) matlab/R2015b

 

How the Modules are Organized and Grouped

The modules are organized into “categories”, which include /apps/usr/modules/mpi, /apps/usr/modules/compilers, /apps/usr/modules/apps, and /apps/sys/Modules/3.2.6/modulefiles. Under each category, you will see “groups” of applications: openmpi, intel, pgi, to name a few. Within each group, there may be several versions to choose from. The group and version are separated with a slash (/).

Default Modules

You probably noticed some modules listed above are suffixed with a “(default)”. The “default” module is the one that will get loaded if you do not specify a version number. For example, we can load the “intel/16.0.0” module simply by loading “intel” and omitting the version number:

$ module load intel

$ module list
Currently Loaded Modulefiles:
  1) pymods/2.7.5      2) perlmods/5.16.3   3) intel/16.0.0
Note: If you plan to load a version of a module that is not the default, then you must specify the version in the module load command.

Conflicts and Prerequisites

Some modules conflict with others, and some modules are prerequisites of others. Environment Modules handles both scenarios.

The following is an example of trying to load a module that is dependent upon another:

$ module display gromacs/4.6.7-cuda
-------------------------------------------------------------------
/apps/usr/modules/apps/gromacs/4.6.7-cuda:

module-whatis GROMACS is a versatile package to perform molecular dynamics,
i.e. simulate the Newtonian equations of motion for systems with hundreds
to millions of particles. It is primarily designed for biochemical molecules
like proteins, lipids and nucleic acids that have a lot of complicated bonded
interactions, but since GROMACS is extremely fast at calculating the nonbonded
interactions, many groups are also using it for research on non-biological
systems, e.g. polymers.
conflict gromacs
prereq cuda
setenv GROMACS /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda
setenv GMXBIN /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/bin
setenv GMXLDLIB /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/lib
setenv GMXDATA /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/share
setenv GMXMAN /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/share/man
setenv GMXLIB /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/share/gromacs/top
prepend-path PATH /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/bin
prepend-path MANPATH /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/share/man
prepend-path LD_LIBRARY_PATH /apps/pkg/gromacs-4.6.7/rhel7_u2-x86_64/gnu-cuda/lib
-------------------------------------------------------------------
$ module load gromacs/4.6.7-cuda
gromacs/4.6.7-cuda(12):ERROR:151: Module 'gromacs/4.6.7-cuda' depends on one of the module(s) ''
gromacs/4.6.7-cuda(12):ERROR:102: Tcl command execution failed: prereq cuda

To resolve the above error, simply load the “prereq” module first, then load the original module. First you might want to see what “cuda” modules are available:

$ module avail cuda
---------------------- /apps/usr/modules/gpu ----------------------
cuda/7.5(default)    cuda/8.0

Select the one that you would like to load to satisfy GROMACS’s requirement. You can do this in a single command:

$ module load cuda/8.0 gromacs/4.6.7-cuda

More information

You can find more information about Environment Modules on SourceForge.net:
http://modules.sourceforge.net/

Introduction

Torque is an Open Source scheduler based on the old PBS scheduler code. The following is a set of directions to assist a user in learning to use Torque to submit jobs to the URC cluster(s).  It is tailored specifically to the URC environment and is by no means comprehensive. 

Details not found here can be found online at:

http://docs.adaptivecomputing.com/torque/6-0-1/help.htm

Note:
Some of the sample scripts displayed in the text are not complete so that the reader can focus specifically on the item being discussed.  Full, working examples of scripts and commands are provided in the Examples section at the end of this document.

Submitting a Job

To submit a job to the Copperhead cluster, you must first SSH into the Research Computing submit host, hpc.uncc.edu. Scheduling a job in Torque requires creating a file that describes the job (in this case a shell script); that file is then given as an argument to the Torque command “qsub”, which queues the job for execution.

First of all, here is a sample shell script (my_script.sh) describing a simple job to be submitted:

#! /bin/bash

# ==== Main ======
/bin/date

This script simply runs the ‘date’ command.  To submit it to the scheduler for execution, we use the Torque qsub command:

$ qsub -N "MyJob" -q "copperhead" -l procs=1 my_script.sh

This will cause the script (and hence the date command) to be scheduled on the cluster. In this example, the “-N” switch gives the job a name, the “-q” switch is used to route the job to the “copperhead” queue, and the “-l” switch is used to tell Torque (PBS) how many processors your job requests.

Many of the command line options to qsub can also be specified in the shell script itself using Torque (PBS) directives. Using the previous example, our script (my_script.sh) could look like the following:

#!/bin/sh

# ===== PBS OPTIONS =====
### Set the job name
#PBS -N "MyJob"

### Specify queue to run in
#PBS -q "copperhead"

### Specify number of CPUs for job
#PBS -l procs=1

# ==== Main ======
/bin/date

This reduces the number of command line options needed to pass to qsub. Running the command is now simply:

$ qsub my_script.sh

For the entire list of options, see the qsub man page, i.e.:

$ man qsub

Standard Output and Standard Error
In Torque, any output that would normally print to stdout or stderr is collected into two files. By default, these files are placed in the initial working directory from which you submitted the job and are named:

scriptname.ojobid for stdout
scriptname.ejobid for stderr

In our previous example (if we did not specify a job name with -N), that would translate to:

my_script.sh.oNNN
my_script.sh.eNNN

where NNN is the job ID number returned by qsub. If we named the job with -N (as above) and it was assigned job ID 801, the files would be:

MyJob.o801
MyJob.e801

Logs are written to the directory from which the job was submitted ($PBS_O_WORKDIR) unless the user specifies otherwise.
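If you want the output written somewhere else, the -o, -e, and -j options to qsub control this; a brief sketch using directives (the paths are placeholder assumptions):

### Send stdout and stderr to specific files...
#PBS -o /users/username/logs/MyJob.out
#PBS -e /users/username/logs/MyJob.err

### ...or merge stderr into the stdout stream instead:
#PBS -j oe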

Monitoring a Job

Monitoring a Torque job is done primarily using the Torque command “qstat.” For instance, to see a list of available queues:

$ qstat -q

To see the status of a specific queue:

$ qstat "queuename"

To see the full status of a specific job:

$ qstat -f  jobid

where jobid is the unique identifier for the job returned by the qsub command.

Deleting a Job

To delete a Torque job after it has been submitted,  use the qdel command:

$ qdel jobid

where jobid is the unique identifier for the job returned by the qsub command.

Monitoring Compute Nodes

To see the status of the nodes associated with a specific queue, use the Torque command pbsnodes(1) (also referred to as qnodes):

$ pbsnodes :queue_name

where queue_name is the name of the queue prefixed by a colon (:). For example:

$ pbsnodes :copperhead

would display information about all of the nodes associated with the “copperhead” queue. The output includes, for each node, the number of cores available (np=). If there are jobs running on the node, each one is listed in the (jobs=) field, which shows how many of the available cores are actually in use.

Parallel (MPI) Jobs

Parallel jobs are submitted to Torque in the manner described above, except that you must first ask Torque to reserve the number of processors (cores) you are requesting for your job. This is accomplished using the -l switch to the qsub command.

For example:

$ qsub -q copperhead -l procs=16 my_script.sh

would submit the script requesting 16 processors (cores) from the “copperhead” queue. The script (my_script.sh) would look something like the following:

#! /bin/bash
module load openmpi
mpirun -hostfile $PBS_NODEFILE my_mpi_program

If you need to specify a specific number of processors (cores) per compute host, you can append a colon (:) and the number of processors per host (ppn=) to the requested number of nodes. For example, to request 16 total processors (cores) with only 4 per compute host, the syntax would be:

$ qsub -q copperhead -l nodes=4:ppn=4 my_script.sh

As described previously, options to qsub can be specified directly in the script file. For the example above, my_script.sh would look similar to the following:

#! /bin/bash

### Set the job name
#PBS -N MyJob

### Run in the queue named "copperhead"
#PBS -q copperhead
### Specify the number of cpus for your job.
#PBS -l nodes=4:ppn=4

### Load OpenMPI environment module.
module load openmpi

### execute mpirun
mpirun my_mpi_program

Examples of Torque Submit Scripts

NOTE: Additional sample scripts can be found in /apps/torque/examples on the cluster.

[1] Simple Job (1 CPU)

#! /bin/bash

#PBS -N MyJob
#PBS -q copperhead
#PBS -l procs=1

# Run program
/bin/date

[2] Parallel Job – 16 Processors (Using OpenMPI)

#! /bin/bash

#PBS -N MyJob
#PBS -q copperhead
#PBS -l procs=16

### load env for Infiniband OpenMPI
module load openmpi/1.10.0-ib

# Run the program "simplempi" with an argument of "30"
mpirun /users/joe/simplempi 30