Torque lets users run scripts before and/or after each job executes: a prologue runs before the job starts and an epilogue runs after it finishes. With these scripts, users can get more information about how their job ran, such as which node (or nodes) it ran on and how much CPU time and RAM it used.
**NOTE** If your submit script redirects the job’s logs by capturing STDOUT and STDERR itself, it is incompatible with the prologue and epilogue, because the header and footer they produce are written to the job’s output file. Please check whether your submit script has the following PBS directives:
#PBS -o /dev/null
#PBS -e /dev/null
and the following (or similar) line in the body:
exec 1>$PBS_O_WORKDIR/$PBS_JOBNAME-$SHORT_JOBID.out 2>$PBS_O_WORKDIR/$PBS_JOBNAME-$SHORT_JOBID.err
If you have the above lines in your submit script, please remove them before continuing with the following steps.
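The check described in this note can be automated with a small shell function. This is only a sketch: the name `check_submit_script` is hypothetical, and its patterns match lines written like the examples above (PBS `-o`/`-e` directives pointing at /dev/null, and `exec` redirections of STDOUT/STDERR), so differently written redirections will slip through. Review your script by eye as well.

```shell
#!/bin/bash
# check_submit_script FILE  (hypothetical helper)
# Prints, with line numbers, any lines that discard the job's output via
# "#PBS -o /dev/null" / "#PBS -e /dev/null" or redirect STDOUT/STDERR with exec.
# Returns 1 if any were found, 0 otherwise.
check_submit_script() {
  local file=$1 found=0
  # -n prefixes each match with its line number; -E enables extended regexes.
  if grep -nE '^#PBS[[:space:]]+-[oe][[:space:]]+/dev/null' "$file"; then
    found=1
  fi
  # Matches "exec >file", "exec 1>file", "exec 2>file", ...
  if grep -nE '^[[:space:]]*exec[[:space:]]+[12]?>' "$file"; then
    found=1
  fi
  if [ "$found" -eq 1 ]; then
    echo "Remove the lines above before adding the prologue/epilogue." >&2
    return 1
  fi
  echo "No conflicting redirections found in $file."
}
```

For example, `check_submit_script myjob.sh` (with your own script name) lists any offending lines and exits non-zero if it finds them.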
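Follow these steps to set up a prologue and epilogue in your PBS submit script:
1. Copy the prologue and epilogue scripts from /apps/torque/logs/ into $HOME/torque:
   rsync -avz /apps/torque/logs/ $HOME/torque
2. Add the following PBS directives to your submit script, replacing “username” with your own user name:
   #PBS -l prologue=/users/username/torque/prologue.sh
   #PBS -l epilogue=/users/username/torque/epilogue.sh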
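Putting the two steps together, a minimal submit script might look like the sketch below. Only the prologue/epilogue directives come from the steps above. The job name, resource requests, and workload are hypothetical placeholders. The script deliberately contains no `-o /dev/null`, `-e /dev/null`, or `exec` redirections, so the header and footer land in the job's output file.

```shell
#!/bin/bash
# Minimal example submit script (placeholders: job name, resources, workload).
# Replace "username" with your own user name.
#PBS -N example_job
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:05:00
#PBS -l prologue=/users/username/torque/prologue.sh
#PBS -l epilogue=/users/username/torque/epilogue.sh

# Torque starts jobs in your home directory; move to the directory qsub was
# run from (falls back to the current directory when run outside Torque).
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Placeholder workload: replace with your real program.
echo "Running on $(hostname) in $(pwd)"
```

Submit it as usual, e.g. `qsub example_job.sh`, and the header and footer should appear at the top and bottom of the job's output file.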
The top of your job’s output file should have some “header” text with information similar to this example:
========================================================================
Start Time: Thu Jul 27 10:34:40 EDT 2017
User/Group: jehalter / jehalter
Job ID: 318469.cph-m1.uncc.edu
Job Name: intake2
Job Details: epilogue=/users/jehalter/torque/epilogue.sh,neednodes=2:ppn=6,nodes=2:ppn=6,pmem=2500mb,prologue=/users/jehalter/torque/prologue.sh,walltime=00:05:00
Queue: copperhead
Nodes: cph-c25.uncc.edu cph-c26.uncc.edu
========================================================================
Likewise, the bottom of your job’s output file should have a “footer” with information similar to this example:
========================================================================
End Time: Thu Jul 27 10:35:23 EDT 2017
Resources: cput=00:03:06,energy_used=0,mem=1110628kb,vmem=9890428kb,walltime=00:00:43
Exit Value: 0
========================================================================
This information can help you plan future job submissions. For example, if you request 64GB of RAM for your job and the epilogue information shows that it used less than 32GB, you can lower the request in future submissions, which may allow your jobs to be scheduled sooner.
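To make that comparison, you need the memory figure from the footer's “Resources:” line in a readable unit. The sketch below extracts one value by key and converts the `mem` figure to GiB. It assumes that Torque reports memory with a `kb` suffix, as in the footer example above, and that 1 kb = 1024 bytes. The helper names `resource_value` and `kb_to_gib` are hypothetical.

```shell
#!/bin/bash
# resource_value LINE KEY -> value of KEY (e.g. mem, vmem, cput, walltime)
# Strips the "Resources:" label, puts each key=value pair on its own line,
# then prints the value whose key matches.
resource_value() {
  printf '%s\n' "$1" \
    | sed 's/^Resources:[[:space:]]*//' \
    | tr ',' '\n' \
    | awk -F= -v key="$2" '$1 == key { print $2 }'
}

# kb_to_gib VALUE -> VALUE (e.g. 1110628kb) in GiB, rounded to two decimals
kb_to_gib() {
  awk -v kb="${1%kb}" 'BEGIN { printf "%.2f\n", kb / 1048576 }'
}

line='Resources: cput=00:03:06,energy_used=0,mem=1110628kb,vmem=9890428kb,walltime=00:00:43'
mem=$(resource_value "$line" mem)
echo "mem used: $mem = $(kb_to_gib "$mem") GiB"   # prints: mem used: 1110628kb = 1.06 GiB
```

On a real output file you could feed it the last footer line, e.g. `resource_value "$(grep '^Resources:' example_job.o318469 | tail -n 1)" mem` (the file name here is hypothetical).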
For more information, please visit the “Prologue and Epilogue Scripts” section on Adaptive Computing’s documentation website.
Access to the cluster requires a URC account. Accounts are available on request for UNC Charlotte faculty and for graduate students working on faculty-sponsored research projects. If you are interested in requesting a URC account, contact any member of the URC staff.
In general, the user account name is the same as the official campus login ID (also used for Novell, 49er Express, and Mosaic); however, the account password is unique to the URC cluster and is not synced with any other campus authentication system.
Most of the cluster resources are on a private internal network, so the cluster is accessed through one (or a few) nodes that are also connected to the campus’ public network. Each of our clusters has a submit host, and some also have a separate interactive host. Please refer to the host list on the “Job Scheduling with Torque” page to determine which host you need to log into.
The supported methods for connecting to these hosts are SSH, SCP, and SFTP.
The interactive hosts that you can SSH into are:
Most Unix and Linux operating systems (including Mac OS X) come with command-line clients for these programs. For more information, refer to the UNIX man pages for ssh, scp, and sftp.
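As a sketch of everyday use from a Linux or Mac OS X machine, the hypothetical wrapper functions below show the basic ssh, scp, and sftp invocations. `URC_USER` and `URC_HOST` are placeholders: set them to your URC account name and to a host from the host list on the “Job Scheduling with Torque” page.

```shell
#!/bin/bash
# Placeholders (hypothetical): override with your account and a real host name.
URC_USER="${URC_USER:-your_account_name}"
URC_HOST="${URC_HOST:-cluster-host.example}"

# Open an interactive shell on the cluster.
urc_login() { ssh "${URC_USER}@${URC_HOST}"; }

# Copy local files/directories into your cluster home directory
# (a remote path of just "host:" means the home directory).
urc_push() { scp -r "$@" "${URC_USER}@${URC_HOST}:"; }

# Copy a file/directory from the cluster into the current local directory.
urc_pull() { scp -r "${URC_USER}@${URC_HOST}:$1" .; }

# Start an interactive SFTP session.
urc_sftp() { sftp "${URC_USER}@${URC_HOST}"; }
```

For example, `urc_push results/` copies a local results directory to the cluster, and `urc_pull torque/epilogue.sh` fetches a file back. Each call prompts for your password and, where enabled, for DUO.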
There are also various commercial and shareware versions of these commands available for MS Windows. One popular free client is PuTTY SSH. For securely transferring files to/from Windows, WinSCP is a popular client.
When logged in to one of the interactive nodes, you can download (PULL) data onto the cluster using any of several protocols: HTTP(80), HTTPS(443), RSYNC(873), and SCP/SFTP(22).
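A sketch of one pull command per protocol, wrapped in hypothetical helper functions, is shown below. Run them on an interactive node. The tools used (curl, rsync, scp) are standard clients for these protocols, but their availability on a given node is an assumption, and every URL and host in the usage examples is a placeholder.

```shell
#!/bin/bash
# HTTP(80) / HTTPS(443): -f fails on HTTP errors, -L follows redirects,
# -o writes to the named output file.
pull_http() { curl -fL -o "$2" "$1"; }

# RSYNC(873): fetch from a remote rsync daemon using an rsync:// URL.
pull_rsync() { rsync -av "$1" "$2"; }

# SCP/SFTP(22): copy from another SSH server you have an account on.
pull_scp() { scp -r "$1" "$2"; }
```

Usage with placeholder sources: `pull_http https://data.example/dataset.tar.gz dataset.tar.gz`, `pull_rsync rsync://mirror.example/module/path/ ./path/`, or `pull_scp user@remote.example:/data/file.csv .`.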
If you are on your workstation or laptop and would like to transfer data to or from the cluster, you can use any of the various command-line or GUI SCP/SFTP clients. We have found that some third-party GUI clients do not work well (or at all) with DUO two-factor authentication, but others work fine with DUO; some of these are listed below.
WinSCP is a popular SCP/SFTP GUI for Windows. It works well with DUO two-factor auth, using the default settings.
FileZilla, another popular SFTP GUI, is available for Windows, Mac OS X, and Linux. It also works well with DUO two-factor auth; however, you must choose some non-default options (outlined below) to get the best experience with your file transfers.
Cyberduck is a popular SFTP GUI for Windows and Mac OS X. For a smooth experience with DUO two-factor auth, set the File Transfer setting to "Use browser connection" so that you do not have to authenticate each time you transfer a file.
If you are having trouble getting your third-party SCP/SFTP GUI client to authenticate to our clusters, please contact us.
NEVER modify the permissions on your /home or /scratch directory. If you need assistance, please contact us.
If you have received an email from the batch scheduler stating that you’ve had a “moab job resource violation,” your job was cancelled because it used more resources than it requested in your submit script. The most common violation is using more RAM than your job requested, which is reported in the body of the email, like so:
job 1193978 exceeded MEM usage soft limit
If you did NOT specify an amount of RAM in your job’s submit script and you received the “MEM usage soft limit” email, your job ran with the default amount of RAM, and that default was not enough for it to complete. Currently, the default amount of RAM given to a job on Copperhead compute nodes is 2GB/core.
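Because the default is per core, a job's total default RAM scales with the cores it requests. The sketch below does that arithmetic for a `nodes=N:ppn=P` request, using the 2GB/core figure quoted above (which may change). `default_job_ram_gb` is a hypothetical helper name. The commented directive shows one way to request more memory explicitly with the `pmem` (per-process memory) resource that also appears in the job header example. The 4gb value is only an illustration.

```shell
#!/bin/bash
# Copperhead default when no memory is requested (value from this page).
DEFAULT_GB_PER_CORE=2

# default_job_ram_gb NODES PPN -> total default RAM in GB for nodes=NODES:ppn=PPN
default_job_ram_gb() {
  local nodes=$1 ppn=$2
  echo $(( nodes * ppn * DEFAULT_GB_PER_CORE ))
}

# A nodes=2:ppn=6 job with no memory request gets 2 * 6 * 2 GB in total.
default_job_ram_gb 2 6   # prints: 24

# To ask for more, add an explicit request to the submit script, e.g.:
#   #PBS -l nodes=2:ppn=6,pmem=4gb
```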
If you would like to know how much RAM your compute job uses on the cluster, consider adding the prologue/epilogue to your submit script; the footer they add to your job output log reports the job’s memory usage.