Remote access is available to some of the Research Computing clusters. The methods are outlined below:
You can use ssh to log in to the interactive/submit host, hpc.uncc.edu, which places you on a Copperhead interactive node.
Once logged into Copperhead, you can “ssh sidewinder” to access the Sidewinder cluster.
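For example, from a terminal (replace "username" with your own cluster ID; it is a placeholder here):

```shell
# Log in to the interactive/submit host:
ssh username@hpc.uncc.edu

# From the Copperhead interactive node, hop to the Sidewinder cluster:
ssh sidewinder
```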
Yes you can. The URC cluster currently supports parallel processing using the MPI standard. These jobs are also submitted via the Condor job scheduler.
Parallel processing is supported via two separate Condor universes. The older MPI universe supports the mpich implementation of the Message Passing Interface (MPI) standard while the newer Parallel universe provides a flexible mechanism that can be adapted to almost any parallel execution methodology.
If you wish to use the MPI universe to run your own MPI code, you need to include the mpich libraries (version 1.2.4 only) when compiling/linking your program. One way to do this is to use the mpich compiler commands mpicc, mpiCC, mpif77, or mpif90 provided in /apps/mpich/bin. These commands are simply scripts that run the GNU C or C++ compiler or the Portland Group Fortran compiler with the recommended libraries and command line options. Alternatively, you can simply reference the relevant include files and libraries (found in /apps/mpich/include and /apps/mpich/lib respectively) in your project makefile or on the command line.
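As a sketch, compiling a C program either way might look like the following (source and output file names are placeholders, and -lmpich is the usual mpich link flag; check the library names under /apps/mpich/lib):

```shell
# Using the provided wrapper script, which adds the right flags for you:
/apps/mpich/bin/mpicc -o my.exe my_mpi_program.c

# Or referencing the headers and libraries directly:
gcc -I/apps/mpich/include -L/apps/mpich/lib -o my.exe my_mpi_program.c -lmpich
```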
Many open source or commercial applications also include support for parallel processing. In order for the applications to work properly in the MPI universe, they must support mpich version 1.2.4. Applications that do not support mpich 1.2.4 should be run via the parallel universe.
Once you have a properly compiled mpich executable, the process to submit a job to Condor is as described above:
condor_submit -n mpi <submit_file>
where the submit file would be as shown above with the Universe set to MPI and an additional line to specify the desired number of processors, e.g.:
Machine_count = 4
Choosing an optimal value for the number of processors often requires some experimentation since each application is likely to scale differently depending on a variety of factors like the ratio of computation to I/O. Very few applications scale well beyond 16 or 32 processors.
The additional qualifier on the condor_submit command (-n mpi) is necessary to direct the job to the correct condor scheduling host. Not all universes are managed by the same condor server.
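Putting the pieces together, a minimal MPI-universe submit file might look like the following sketch (executable and file names are placeholders):

```
universe        = MPI
executable      = my.exe
Machine_count   = 4
output          = my.out
error           = my.err
log             = my.log
queue
```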
Parallel execution in the “parallel” universe is extremely flexible in terms of the types of parallel operation that are possible. The user specifies a wrapper script in addition to an executable. The wrapper is executed on each node that is assigned to the user’s job and is passed the executable as an argument. This allows the user to construct a custom parallel environment within their assigned nodes, and then start the real executable in a fashion that is appropriate to their custom environment.
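As an illustration of what such a wrapper can do, here is a simplified sketch (this is not the installed mpi-1.2.4 script). Condor's parallel universe sets the environment variables _CONDOR_PROCNO and _CONDOR_NPROCS on each node and passes the user's executable and arguments to the wrapper:

```shell
#!/bin/bash
# Simplified parallel-universe wrapper sketch (illustration only).
# Condor runs this once per assigned node; the real executable and
# its arguments arrive as "$@".
RANK="${_CONDOR_PROCNO:-0}"     # this node's index within the job
NPROCS="${_CONDOR_NPROCS:-1}"   # total nodes assigned to the job

if [ "$RANK" -eq 0 ]; then
    # Node 0 typically acts as the master and launches the real job.
    echo "node ${RANK}/${NPROCS}: master, would launch: $*"
else
    # The remaining nodes set up their environment and wait for the master.
    echo "node ${RANK}/${NPROCS}: worker, waiting for master"
fi
```

The installed wrappers follow this pattern but additionally build an mpich machine file from the assigned nodes and invoke mpirun.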
For users of the old mpi queue, the changes required to switch to the parallel queue are minimal. In the condor_submit file change these lines:
universe = mpi
executable = my.exe
arguments = -arg1 -arg2
with these lines:
universe = parallel
executable = /apps/condor/scripts/mpi-1.2.4
arguments = my.exe -arg1 -arg2
The universe changes from mpi to parallel, the user’s executable becomes the first argument, and the submit file’s executable is now a standard wrapper script installed as part of the Condor software package.
The resulting submit file is then submitted using:
condor_submit -n parallel <submit file name>
In this example the user’s job will run using version 1.2.4 of mpich, just as it does in the mpi queue. For users who wish to use a later version of mpich, a separate wrapper script, mpi-1.2.5, is also available under /apps/condor/scripts. Additional wrapper scripts will be added whenever new versions of mpich are installed.
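A complete parallel-universe submit file, following the changes above, might look like this sketch (executable name, arguments, and file names are placeholders):

```
universe        = parallel
executable      = /apps/condor/scripts/mpi-1.2.4
arguments       = my.exe -arg1 -arg2
Machine_count   = 4
output          = my.out
error           = my.err
log             = my.log
queue
```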
Additional wrapper scripts have been developed to support ABAQUS and Matlab DCE, and will be developed as needed to support other applications (e.g., LAM MPI).
If you would like to utilize the NVIDIA GPUs on the cluster for your compute job, the following tips will help.

Request a GPU as part of your resource list:

#PBS -l nodes=1:ppn=1:gpus=1,mem=16GB

Load the CUDA module and verify that the compiler is available:

module load cuda/8.0
module list
Currently Loaded Modulefiles:
  1) pymods/2.7.5   2) perlmods/5.16.3   3) cuda/8.0
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

If your job also needs cuDNN, load its module as well:

module load cudnn/6.0-cuda8
module list
Currently Loaded Modulefiles:
  1) pymods/2.7.5   2) perlmods/5.16.3   3) cuda/8.0   4) cudnn/6.0-cuda8

To request a specific GPU model, add its feature tag to the nodes specification:

### If you prefer an NVIDIA GeForce GTX 1080 Ti, specify the "gtx1080ti" feature tag:
#PBS -l nodes=1:ppn=1:gpus=1:gtx1080ti
### If you prefer an NVIDIA Tesla K80, specify the "k80" feature tag:
#PBS -l nodes=1:ppn=1:gpus=1:k80
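Putting the tips together, a GPU job script might look like this sketch (the program name is a placeholder; adjust the resource requests to your job):

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=1:gpus=1,mem=16GB
#PBS -l walltime=8:00:00
#PBS -q copperhead

# Set up the CUDA toolchain (and cuDNN if your code needs it):
module load cuda/8.0
module load cudnn/6.0-cuda8

cd "$PBS_O_WORKDIR"     # Torque sets this to the submission directory
./my_gpu_program        # placeholder for your CUDA executable
```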
The Copperhead Cluster can be accessed via SSH to “hpc.uncc.edu.” This will connect the user to one of the Interactive/Submit nodes located in the Copperhead Cluster pool. From those nodes, a user can submit jobs requesting the following resources:
General Compute Nodes (63 x 16 cores ea + 2 x 32 cores ea = 1072 CPU cores)
GPU Compute Nodes (4 x 2 NVIDIA K80 GPUs and 16 cores ea = 8 GPUs and 64 CPU cores)
Large Memory Compute Nodes (16 CPU cores and 768GB RAM)
Jobs should always be submitted to the “copperhead” queue unless directed otherwise by URC Support.
To make more efficient use of the resources, user jobs are now submitted with a set of default resource requests which can be overridden on the qsub command line or in the job submit script via qsub directives. If not specified by the user, the following defaults are set:
#PBS -l walltime=8:00:00 (Max Job Run time is 8 hours)
#PBS -l mem=2GB (Allow up to 2GB of Memory per job)
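These defaults can be overridden at submission time; for example (the script name is a placeholder):

```shell
# Request 24 hours and 32 GB instead of the 8-hour / 2 GB defaults:
qsub -q copperhead -l walltime=24:00:00,mem=32GB myjob.sh
```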
See the discussions below for more details.
Nodes= vs Procs=
In the older URC clusters (i.e., viper, cobra, python, etc.), if a job needed a particular number of processors (procs) and did not care how they were distributed among the compute nodes, the following syntax was allowed:
#PBS -l nodes=16
This would reserve the first 16 procs available regardless of where they were located. On Copperhead, this would actually cause the job to attempt to reserve 16 nodes with 1 proc each which may not be desirable. To make more efficient use of resources and for clarification of the request, this syntax is no longer valid. Instead use the following:
If you really want Y nodes with X procs on each node:
#PBS -l nodes=Y:ppn=X
If you just want X procs and don’t care which or how many nodes they are on, use:
#PBS -l procs=X
This will allow the scheduler to better utilize available resources and may help your job get scheduled quicker.
walltime – Determines the actual amount of time a job will be allowed to run. If not specified by the user, the default value is now 8 hours. Requesting this value (or less) automatically labels a job as a “short” job, which therefore has more potential nodes on which it can run. Jobs requiring longer than 8 hours are considered “long” jobs and are restricted as to the nodes they can run on. In most cases, the longer the requested walltime, the lower the priority a job will have in competition with other, shorter jobs.
Example: #PBS -l walltime=8:00:00 # 8 hours, 0 minutes, 0 seconds
mem – The amount of memory to be allocated to a job. If not set by the user, it defaults to roughly 2 GB per requested core. “mem” applies to the entire job and is therefore a separate resource request, which can be specified either as part of the nodes specification (separated by a comma)
Example: #PBS -l nodes=2:ppn=2,mem=8GB
or as a separate directive
Example: #PBS -l nodes=2:ppn=2
#PBS -l mem=8GB
These two examples are equivalent and request a total of 8 GB of memory spread across 4 cores on 2 nodes.
Jobs whose memory requirements will not allow them to fit on a regular compute node will automatically be scheduled on a large memory compute node. Regular compute nodes have 128 GB of memory and 16 cores.
Jobs that exceed the requested memory will be terminated by the scheduler.
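A complete submit script combining these requests might look like the following sketch (the echo stands in for a real program):

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=2,mem=8GB
#PBS -l walltime=8:00:00
#PBS -q copperhead

# Torque sets PBS_O_WORKDIR to the submission directory on the cluster;
# fall back to "." so the sketch also runs outside a job.
cd "${PBS_O_WORKDIR:-.}"

msg="job started on $(hostname)"
echo "$msg"
# ./my_program would go here (placeholder for your real executable)
```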
GPUs are requested like procs. Currently there are 4 Copperhead nodes, each of which contains 2 addressable GPUs. Note that gpus= is part of the node’s properties, so you separate it with a colon (like :ppn=), as opposed to a separate resource request (like ,mem=), which is separated by a comma.
Example: #PBS -l nodes=1:ppn=1:gpus=1 # (1 node with 1 cpu and 1 gpu)
Note that you cannot use the following on Copperhead:
#PBS -l gpus=N
While Torque will accept this by defaulting to nodes=1, it becomes confusing if you attempt to ask for gpus=3 on Copperhead, since there is a maximum of 2 GPUs per node. If you need more than 2 GPUs for a job, the following are examples of valid requests:
#PBS -l nodes=3:ppn=1:gpus=1 # 3 Nodes, 1 gpu each (3 GPUs)
#PBS -l nodes=2:ppn=1:gpus=2 # 2 Nodes, 2 gpus each (4 GPUs)
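Inside a running job, Torque typically records the assigned GPUs in the file named by $PBS_GPUFILE, one "host-gpuN" line per GPU; this is a Torque convention worth verifying with URC Support. A sketch that counts the job's GPUs:

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=1:gpus=2

# Count the GPUs Torque assigned to this job. Outside a job the
# variable is unset, so fall back to /dev/null (zero lines).
ngpus=$(wc -l < "${PBS_GPUFILE:-/dev/null}")
echo "assigned ${ngpus} GPU(s)"
```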