Slurm

Description

Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.

As a cluster workload manager, Slurm has three key functions:

It allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work.
It provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes.
It arbitrates contention for resources by managing a queue of pending work.

Usage

Slurm base commands

Slurm uses a series of base commands to execute programs and monitor the submission queues for workload tracking. Below is a list of basic commands used.

Command	Action
sinfo	Displays status and info of open nodes Example: `sinfo`
sbatch	Submits the file to the submission pool Example: `sbatch -options file`
sacct	Displays accounting data for all jobs Example: `sacct -options`
squeue	Displays information about jobs in the queue Example: `squeue -options`
scancel	Cancels a submitted Slurm job Example: `scancel -options jobID`
savail	View the available resources on all of the nodes (custom to UWEC) Example: `savail`
myjobs	View your current jobs in the queue (custom to UWEC) Example: `myjobs`
myjoblogin	Log into a compute node under your job's existing allocation (custom to UWEC) Example: `myjoblogin jobID` or `myjoblogin jobID nodeName`
myaccounts	View all your accounts and your default account (custom to UWEC) Example: `myaccounts`
myqos	View Quality of Service groups and QoS limits associated with your accounts (custom to UWEC) Example: `myqos`
sinteract	Interactively run commands on a compute node. Defaults to one CPU core for eight hours. (custom to UWEC) Example: `sinteract --ntasks=4 --mem=10G`
sdevelop	Interactively run commands on our development node. Defaults to 16 CPU cores for eight hours. (custom to UWEC) Example: `sdevelop --ntasks=16 --mem=10G`

Script syntax

Slurm submissions require submission scripts. The general format for a Slurm submission script is:

#!/bin/bash
#
#SBATCH <OPTION>
#SBATCH <OPTION>
.
.
.
#SBATCH <OPTION>

<COMMAND TO BE EXECUTED TO RUN THE DESIRED PROGRAM>

Script SBATCH Flags

The #SBATCH directives are used to specify different options unique to the submitted job's needs.

Command	Action	Syntax Example
#SBATCH --partition	Specifies partition to use	#SBATCH --partition=yourPartition
#SBATCH --time	Sets maximum runtime limit for job	#SBATCH --time=dd-hh:mm:ss
#SBATCH --nodes	Sets the number of requested nodes.	#SBATCH --nodes=numberOfNodes
#SBATCH --ntasks-per-node	Specifies number of processors to use per node	#SBATCH --ntasks-per-node=coresPerNode
#SBATCH --mem	Sets the memory limit (in MB). DO NOT USE WITH mem-per-cpu	#SBATCH --mem=memoryLimit
#SBATCH --gpus	Specifies the number of GPUs requested (BOSE-only and required for GPU use)	#SBATCH --gpus=numberOfGPUs
#SBATCH --job-name	Sets the name of the job during runtime.	#SBATCH --job-name="YourJobName"
#SBATCH --output	Sets the name of the output file	#SBATCH --output=outputFileName
#SBATCH --error	Sets the name of the error file.	#SBATCH --error=errorFileName
#SBATCH --exclude	Exclude nodes by node name. These are comma delimited	#SBATCH --exclude=nodeA,nodeB,nodeC
#SBATCH --nodelist	Use specific nodes. These are comma delimited	#SBATCH --nodelist=nodeA,nodeB,nodeC
#SBATCH --mail-user	Sets the email address that receives notifications. Defaults to your UWEC email address.	#SBATCH --mail-user=user@email.mail
#SBATCH --mail-type	Sets when the user receives an email (Options: NONE, ALL, BEGIN, END, FAIL, REQUEUE)	#SBATCH --mail-type=ALL
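
The dd-hh:mm:ss value for --time is easy to get wrong by hand. As a minimal sketch, the helper below converts a number of seconds into that format using only bash arithmetic. Note that format_slurm_time is a hypothetical name chosen for illustration; it is not a Slurm or UWEC command.

```shell
#!/bin/bash
# format_slurm_time SECONDS
# Hypothetical helper (not part of Slurm): prints SECONDS as d-hh:mm:ss,
# the days-hours:minutes:seconds form used with #SBATCH --time.
format_slurm_time() {
    local total=$1
    local days=$(( total / 86400 ))
    local hours=$(( (total % 86400) / 3600 ))
    local minutes=$(( (total % 3600) / 60 ))
    local seconds=$(( total % 60 ))
    printf '%d-%02d:%02d:%02d\n' "$days" "$hours" "$minutes" "$seconds"
}

format_slurm_time 30       # prints 0-00:00:30
format_slurm_time 90061    # prints 1-01:01:01
```

The first result matches the 30-second limit used in the example script further down.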

Temporary Overrides

Besides including these in your Slurm script, you can also set these on demand when you submit your job.

When set this way, they apply to that single job only and override what's specified in your script.

sbatch --ntasks=32 --mem=20G my-script.sh

The above command submits a job that temporarily requests 32 cores and 20 GB of memory rather than what's listed inside my-script.sh.
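
One possible use of on-demand options is submitting the same script at several sizes without editing it. The sketch below assumes a script named my-script.sh as in the command above; the function name submit_scaling_runs and the "scale-N" job-name pattern are made up for illustration.

```shell
#!/bin/bash
# submit_scaling_runs SCRIPT NTASKS...
# Hypothetical helper: submits SCRIPT once per task count, overriding the
# --ntasks and --job-name values from inside the script for that job only.
submit_scaling_runs() {
    local script=$1
    shift
    local n
    for n in "$@"; do
        sbatch --ntasks="$n" --job-name="scale-$n" "$script"
    done
}

# Usage (on a login node where sbatch is available):
#   submit_scaling_runs my-script.sh 8 16 32
```

Each submission is an independent job, so the usual per-user job limits still apply.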

Example Script

Below is an example Slurm script named submit.sh that prints "Hello from (your computer host name)".

submit.sh

#!/bin/bash

#SBATCH --partition=week              #Partition to submit to
#SBATCH --time=0-00:00:30             #Time limit for this job
#SBATCH --nodes=1                     #Nodes to be used for this job during runtime. Use MPI jobs with multiple nodes.
#SBATCH --ntasks-per-node=1           #Number of CPUs. Cannot be greater than number of CPUs on the node.
#SBATCH --mem=512                     #Total memory for this job
#SBATCH --job-name="Slurm Sample"     #Name of this job in work queue
#SBATCH --output=ssample.out          #Output file name
#SBATCH --error=ssample.err           #Error file name
#SBATCH --mail-type=END               #Email notification type (BEGIN, END, FAIL, ALL). For multiple types, use a comma-separated list, e.g. END,FAIL.

# Job Commands Below
echo "Hello from $(hostname)"
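
Because every #SBATCH line starts with #, bash treats the directives as ordinary comments. That makes it possible to check a script's commands before submitting it. The sketch below writes a cut-down copy of the example to a temporary file (the temporary file is for illustration only), checks its syntax with bash -n, and then runs it directly. No Slurm resources are requested when a script is run this way, so this is only suitable for lightweight commands.

```shell
#!/bin/bash
# Write a cut-down copy of the example script to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH --time=0-00:00:30
#SBATCH --ntasks-per-node=1
echo "Hello from $(hostname)"
EOF

# bash -n parses the script without executing it and fails on syntax errors.
bash -n "$script" && echo "syntax OK"

# Run outside Slurm: the #SBATCH lines are skipped as comments.
result=$(bash "$script")
echo "$result"

rm -f "$script"
```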

Example process for script submission

Create the script.sh file using a text editor, ensuring the file follows the format guidelines noted above.

Submit script.sh with the following command. It will be assigned a job number which will be printed out upon submission:

sbatch script.sh

Once your job completes, any output and error files you specified will have been written, and any email notifications you requested will be sent. To check the progress of a submitted job, enter the following command:

sacct -j yourJobID
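
The submit-then-check steps above can also be scripted. The sketch below is one possible approach, not a site-provided tool: submit_and_wait is a hypothetical name, the 10-second default polling interval is an arbitrary choice, and the list of "still active" job states is an assumption that may not cover every case. It relies on sbatch --parsable (prints just the job ID), squeue -h -j (lists the job without a header line), and sacct -j (reports the job's accounting record).

```shell
#!/bin/bash
# submit_and_wait SCRIPT [POLL_SECONDS]
# Hypothetical helper: submits SCRIPT, waits until the job leaves the queue,
# then prints a short accounting summary.
submit_and_wait() {
    local script=$1
    local interval=${2:-10}
    local active="PENDING,CONFIGURING,RUNNING,SUSPENDED,COMPLETING"
    local jobid

    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
    jobid=$(sbatch --parsable "$script") || return 1
    jobid=${jobid%%;*}
    echo "Submitted job $jobid"

    # squeue prints a line for the job while it is in one of the active states.
    while [ -n "$(squeue -h -j "$jobid" -t "$active" 2>/dev/null)" ]; do
        sleep "$interval"
    done

    # Summarize the finished job from the accounting data.
    sacct -j "$jobid" --format=JobID,JobName,State,Elapsed,ExitCode
}

# Usage (on a login node): submit_and_wait script.sh 30
```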

Instructional Videos

If you are still unsure how to use Slurm to submit jobs on our cluster, Slurm's website has several instructional videos that cover the basics.

Please see the link here: Training Videos

Common Errors / Statuses

Below is a list of common errors / statuses we've seen that cause a job to be stuck pending in the queue or to fail. These are typically found when running squeue or myjobs.

(QOSMaxJobsPerUserLimit)

QOSMaxJobsPerUserLimit: Quality of Service: You hit the max number of jobs you can run at any one time and the job will be able to start once your other jobs finish. Some groups or classes may also have their own custom restrictions applied. Run myqos to see a list of settings applied to your user/group.

(QOSMaxGRESPerUser)

QOSMaxGRESPerUser: Quality of Service: You hit the max number of GPUs you can use at any one time and the job will be able to start once your other jobs finish. Some groups or classes may also have their own custom restrictions applied. Run myqos to see a list of settings applied to your user/group.

(MaxNodePerAccount)

MaxNodePerAccount: Quality of Service: Your group has hit the max number of nodes you can use at one time. Certain limited partitions such as highmemory or GPU may have their own limitations. Run myqos to see a list of settings applied to your user/group.

(Resources)

Resources: Your job cannot currently be accommodated on any node because not enough resources are free. This may be due to the partition or nodes you selected, and the job will have to wait until another job completes. You can see the full list of available resources on all the nodes by using the savail command.

(Priority)

Priority: If multiple jobs are pending in the queue (squeue), your job may have a lower priority than the others or may have been submitted after them. It may have to wait until those jobs complete before it is next in line.

CONFIGURING (CF)

CONFIGURING (CF): When your job is marked as configuring, this means that the node you are going to use was in its power saving mode and is in the process of booting up. Usually after a few minutes your job will automatically start running.
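
The reasons listed above appear in the last column of squeue's default output. As a short sketch, the wrapper below lists only pending jobs together with their reasons. The name pending_reasons is hypothetical; the format string uses squeue's %i (job ID), %j (job name), and %r (reason) fields.

```shell
#!/bin/bash
# pending_reasons [USER]
# Hypothetical wrapper: prints "jobID jobName reason" for each pending job
# belonging to USER (defaults to the current user).
pending_reasons() {
    local user=${1:-$(id -un)}
    squeue -h -u "$user" -t PENDING -o "%i %j %r"
}

# Usage (on a login node): pending_reasons
```

A job waiting on free nodes would show Resources in the last field, while one that hit a QoS limit would show the matching QOS reason.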