This fall semester of 2020, we are migrating from PBS (Torque/Moab) to a new scheduler, Slurm Workload Manager, for all of our Research Computing Clusters. Here are some additional details regarding the rollout of Slurm in URC:
- Week of September 14: A new Research Cluster named Starlight will be available for general use. This cluster and its primary queue (Orion) will contain more than 1,300 new compute cores purchased this summer. Starlight will be available exclusively using the Slurm scheduler.
- Week of September 21: Move 25% of the existing Copperhead cluster cores to Starlight.
- Move another 50% of the existing Copperhead cores to Starlight.
- Migrate the remaining 25% of the existing Copperhead cores to Starlight and remove the old scheduler.
- The educational cluster will begin migrating to Slurm.
This guide provides information on how you can migrate your scripts and jobs from PBS to Slurm. There are two main aspects to the migration: learning the new commands for job submission, and converting your job scripts. The concepts are the same in both schedulers, but the syntax of the commands, directives, and environment variables differs.
Slurm provides equivalents for the commonly used PBS commands; the command names and options are detailed in the following table.
|Task|PBS Command|Slurm Command|
|---|---|---|
|Submit a Job|qsub [job-submit-script]|sbatch [job-submit-script]|
|Delete a Job|qdel [job-id]|scancel [job-id]|
|Queue Info|qstat -q [queue]|scontrol show partition [partition]|
|Node List|pbsnodes -a [:queue]|scontrol show nodes|
|Node Details|pbsnodes [node]|scontrol show node [node]|
|Job Status (by job)|qstat [job-id]|squeue -j [job-id]|
|Job Status (by user)|qstat -u [user]|squeue -u [user]|
|Job Status (detailed)|qstat -f [job-id]|scontrol show job -d [job-id]|
|Show Expected Start Time|showstart [job-id]|squeue -j [job-id] --start|
For a comprehensive list of Slurm commands, please download this Command Reference PDF on SchedMD's Website.
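To illustrate, a typical job lifecycle under Slurm looks like this (the job ID shown is illustrative; Slurm assigns the real one at submission time):

```shell
# Submit the batch script; Slurm replies with the assigned job ID
sbatch submit-script.slurm
# Submitted batch job 123456

# Check the job's status and its expected start time
squeue -j 123456
squeue -j 123456 --start

# Cancel the job if necessary
scancel 123456
```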
Existing PBS batch scripts can be readily migrated for use under the Slurm resource manager, with some minor changes to the directives and referenced environment variables. The most commonly used directives and their Slurm equivalents are outlined below.
|Setting|PBS Directive (qsub)|Slurm Directive (sbatch)|
|---|---|---|
|Job name|-N [name]|--job-name=[name]|
|Queue / Partition|-q [queue]|--partition=[queue]|
|Wall time limit|-l walltime=[hh:mm:ss]|--time=[hh:mm:ss]|
|Node count|-l nodes=[count]|--nodes=[count]|
|CPU count per node|-l ppn=[count]|--ntasks-per-node=[count]|
|Memory size|-l mem=[limit] (*per job)|--mem=[limit] (*per node)|
|Memory per CPU|-l pmem=[limit]|--mem-per-cpu=[limit]|
|Standard output file|-o [filename]|--output=[filename]|
|Standard error file|-e [filename]|--error=[filename]|
|Combine stdout/stderr|-j oe (to stdout)|(default)|
|Copy environment|-V|--export=ALL (default)|
|Copy env variable|-v [var]|--export=[var]|
|Job dependency|-W depend=[state:jobid]|--dependency=[state:jobid]|
|Event notification|-m abe|--mail-type=[events]|
|Email address|-M [address]|--mail-user=[address]|
For a full list of directives, please consult SchedMD's sbatch Webpage.
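As a sketch of how these directives fit together, here is a minimal Slurm submit script; the job name, resource amounts, email address, and program name are illustrative placeholders (Orion is the Starlight cluster's primary partition):

```shell
#!/bin/bash
#SBATCH --job-name=my-analysis        # illustrative job name
#SBATCH --partition=Orion             # Starlight's primary partition
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=04:00:00               # wall time limit (hh:mm:ss)
#SBATCH --mem=64G                     # memory per node
#SBATCH --output=my-analysis-%j.out   # %j expands to the job ID
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=you@example.edu   # illustrative address

srun ./my_program                     # illustrative program
```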
Environment Variable Comparison
|Description|PBS Environment Variable|Slurm Environment Variable|
|---|---|---|
|Node List|cat $PBS_NODEFILE|$SLURM_JOB_NODELIST|
|Job Array Index|$PBS_ARRAYID|$SLURM_ARRAY_TASK_ID|
|Number of Nodes|$PBS_NUM_NODES|$SLURM_NNODES|
|Number of Procs|$PBS_NP|$SLURM_NTASKS|
|Procs per Node|$PBS_NUM_PPN|$SLURM_CPUS_ON_NODE|
For a full list of environment variables, please consult the Environment Variables Section on SchedMD's sbatch Webpage.
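For example, a Slurm submit script can read these variables directly where a PBS script used their $PBS_ equivalents (the resource requests here are illustrative):

```shell
#!/bin/bash
#SBATCH --nodes=2             # illustrative resource request
#SBATCH --ntasks-per-node=4

# Where a PBS script would read $PBS_NODEFILE, $PBS_NP, and so on,
# Slurm exposes the same information directly in these variables:
echo "Allocated nodes:   $SLURM_JOB_NODELIST"
echo "Number of nodes:   $SLURM_NNODES"
echo "Total tasks:       $SLURM_NTASKS"
echo "CPUs on this node: $SLURM_CPUS_ON_NODE"
```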
Tips on Converting Submit Scripts
We have created a utility called “p2s” (PBS-to-Slurm), which is available on our Interactive/Submit hosts in your standard $PATH. Simply pass it the name of the script you would like to convert (give the full path if the script is not in your current directory), and it will convert it from a PBS to a Slurm submit script. For example, to convert a PBS submit script named "submit-script.pbs", issue the following command on the Interactive/Submit host:
p2s submit-script.pbs
This will output the changes directly to STDOUT, so that you can view them right in your SSH session. If everything looks good and you would like to save the converted script to a new file, simply redirect the output to a file name of your choice (we recommend using a NEW file name, not the existing file name):
p2s submit-script.pbs > submit-script.slurm
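For comparison, here is what a simple hand conversion looks like, with the original PBS directives shown as comments above their Slurm equivalents (the job name, queue, and resource values are illustrative):

```shell
#!/bin/bash
# PBS:  #PBS -N myjob
#SBATCH --job-name=myjob
# PBS:  #PBS -q copperhead
#SBATCH --partition=Orion          # jobs move to Starlight's Orion partition
# PBS:  #PBS -l nodes=1:ppn=8
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
# PBS:  #PBS -l walltime=02:00:00
#SBATCH --time=02:00:00

srun ./my_program                  # illustrative program
```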
There are many other conversion scripts available online, and you are welcome to download and try them out in our environment, or you can convert your scripts manually using the directives and environment variables listed above. Once converted, you may need to make some small tweaks or edits before a script is fully ready to submit to Slurm. Two codes come to mind that require some manual editing even after running the p2s conversion utility: StarCCM+ and ABAQUS.
We also have a collection of example Slurm submit scripts that you can copy and use as a template: /apps/slurm/examples
For more information about the Slurm Workload Manager, please check out the Slurm Documentation on SchedMD's Website.