SLURM User Guide
Ver. 2022/1/9 - by Yuejia Zhang, Xiang Li
Introduction
The Simple Linux Utility for Resource Management (SLURM) is used for submission, scheduling, execution, and monitoring of jobs on our cluster. SLURM is a free open-source resource manager and scheduler. It is a modern, extensible batch system that is widely deployed around the world on clusters of various sizes.
This guide explains how to use SLURM to run jobs on our cluster.
Basic SLURM commands
SLURM offers a variety of user commands for all the necessary actions concerning jobs. These commands provide a rich interface for allocating resources, querying job status, controlling jobs and managing accounting information, along with utility commands that simplify everyday work.
- `sbatch`: submit a batch script that will be executed on the allocated compute node(s). The job's working directory is the directory from which `sbatch` was invoked. Within the script, one or more `srun` commands can be used to create job steps and execute parallel applications (see the sketch after this list).
- `sinfo`: show information about all partitions and nodes managed by SLURM, as well as the general system state.
- `squeue`: query the list of pending and running jobs. By default it reports the list of pending jobs sorted by priority and the list of running jobs sorted separately according to job priority. The most relevant job states are running (R), pending (PD), completing (CG), completed (CD) and cancelled (CA). The TIME field shows the actual job execution time. The NODELIST(REASON) field indicates on which nodes the job is running, or the reason why the job is pending. Typical reasons for pending jobs are waiting for resources to become available (Resources) and queuing behind a job with higher priority (Priority).
- `scancel`: cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
- `salloc` (BANNED): request interactive jobs/allocations. When the job starts, a shell (or another program specified on the command line) is started on the submission host (login node). From this shell you would use `srun` to interactively start a parallel application. The allocation is released when the user exits the shell.
- `srun`: initiate parallel job steps within a job, or start an interactive job.
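For illustration, here is a minimal sketch of a batch script that uses `srun` to launch two job steps; the programs `./step1` and `./step2` are placeholders, not real programs on our cluster:

```bash
#!/bin/bash
#SBATCH --ntasks=4            # request 4 tasks for the whole allocation

srun -n 4 ./step1             # job step 1: run step1 with 4 parallel tasks
srun -n 1 ./step2             # job step 2: run step2 with a single task
```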
A very simple rookie-level guide to submitting a job
STEP 1: Write a bash script
Use your favorite editor to create the script `test.sbatch`. The first line must be `#!/bin/bash`. From the second line on, list all the commands you want to run on the server. For example, if you want to run a Python 3 program called `helloworld.py`, then your `test.sbatch` only needs to look like this:

```bash
#!/bin/bash
python3 helloworld.py
```
Or if you want to run a binary file `myTest/test` generated from `myTest/test.c`, then your `test.sbatch` should look like this:

```bash
#!/bin/bash
myTest/test
```
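If you have not built the binary yet, you can compile it first (a minimal sketch, assuming `gcc` is available on the login node):

```bash
gcc -O2 -o myTest/test myTest/test.c   # compile test.c into the binary myTest/test
```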
If `myTest/test` is a CUDA program, remember to load the environment and request GPU resources (second line):

```bash
#!/bin/bash
#SBATCH --gpus=1
module load CUDA
myTest/test
```
If `myTest/test` is an MPI program, remember to load the environment:

```bash
#!/bin/bash
source /opt/intel/oneapi/setvars.sh
mpirun -n 10 myTest/test
```
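Here is a minimal sketch of the same job with an explicit task request, so that SLURM reserves one CPU per MPI rank (whether `mpirun` picks up the SLURM allocation automatically depends on the MPI installation):

```bash
#!/bin/bash
#SBATCH --ntasks=10                      # reserve 10 tasks, one per MPI rank
source /opt/intel/oneapi/setvars.sh      # load the Intel oneAPI environment
mpirun -n 10 myTest/test                 # launch 10 MPI processes
```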
If you run a MATLAB program such as `testMatlab.m`, remember to load the environment. Also remember to `cd` to the directory containing the MATLAB file before submitting your job.

```bash
#!/bin/bash
module load MATLAB
matlab -batch "testMatlab"
```
STEP 2: Submit your job
```bash
sbatch test.sbatch
```

It will return `Submitted batch job ###`, where ### is your job id.
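If you want to use the job id in shell scripts, `sbatch --parsable` prints only the id; a minimal sketch:

```bash
jobid=$(sbatch --parsable test.sbatch)   # capture just the job id
echo "Submitted job $jobid"
squeue -j "$jobid"                       # query only this job
```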
STEP 3: View running progress and results
When your program has finished, its output can be found in the file `slurm-###.out` in the directory you submitted from, where ### is your job id. You can open it with your favorite editor, or run

```bash
cat slurm-###.out
```

to see it in the terminal.
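In fact the output file is written while the job runs, so once the job has started you can also follow the output live:

```bash
tail -f slurm-###.out     # replace ### with your job id; press Ctrl-C to stop following
```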
While waiting for your program to execute, you can check its status with `squeue`. For example, the following command lists all the jobs submitted by `username`:

```bash
squeue -u username
```

where `username` is your username (e.g. yjzhang). You will see your job with its job id, job name, status and time spent, like the following:
```
JOBID PARTITION NAME    USER    ST TIME NODES NODELIST(REASON)
001   bigMem    python3 yjzhang PD 0:00 10    (PartitionNodeLimit)
002   bigMem    python3 yjzhang R  0:56 1     bigMem0
```
Typical job states from submission until completion are PENDING (PD) and RUNNING (R). If you no longer see your job, it has most likely finished.
If you don't want to run a job anymore, cancel it using the following command:

```bash
scancel ###
```

where ### is the job id (e.g. 001). After you cancel it, you will no longer see it in the queue.
The rest of this document may be a little difficult for beginners; you can skip it on a first reading.
Advanced usage of sbatch
The usual way to allocate resources and execute a job is to write a batch script and submit it to SLURM with the `sbatch` command. The batch script is a shell script consisting of two parts: resource requests and job steps. Resource requests are specifications such as the number of nodes needed to execute the job, the time duration of the job, etc. Job steps are the user's tasks that must be executed. The resource requests and other SLURM submission options are prefixed by `#SBATCH` directives and must precede any executable commands in the batch script.
Below is an example sbatch script:

```bash
#!/bin/bash
#SBATCH --job-name=python_job_test   # Job name
#SBATCH --nodes=1                    # Number of nodes to use
#SBATCH --ntasks=1                   # Number of tasks (MPI processes)
#SBATCH --cpus-per-task=1            # Number of threads (logical cores) per task (OpenMP)
#SBATCH --time=00:05:00              # Time limit hrs:min:sec
#SBATCH --output=python_%j.log       # Standard output and error log

pwd; hostname; date
echo "Running python on the server"
python3 helloworld.py
```
where `helloworld.py` simply reads the local file `helloworld.txt` and prints its contents:

```python
with open("helloworld.txt", "r") as f:
    s = f.readline()
    print(s)
```
Run `sbatch test.sbatch` and check the log file:

```
[yjzhang@loginNode ~]$ sbatch test.sbatch
Submitted batch job 235
[yjzhang@loginNode ~]$ cat python_235.log
/home/yjzhang
bigMem0
Wed Dec 15 13:39:07 CST 2021
Running python on the server
Hello File!
```
The following table describes the most common or required allocation and submission options that can be defined in a batch script (short options are listed in parentheses):
sbatch option | default value | description
---|---|---
`--nodes=<number>` (`-N <number>`) | 1 | Number of nodes for the allocation
`--ntasks=<number>` (`-n <number>`) | 1 | Number of tasks (MPI processes). Can be omitted if `--nodes` and `--ntasks-per-node` are given
`--ntasks-per-node=<num>` | 1 | Number of tasks per node. If the keyword is omitted the default value is used, but there are still 48 CPUs available per node for the current allocation (if not shared)
`--cpus-per-task=<number>` (`-c <number>`) | 1 | Number of threads (logical cores) per task. Used for OpenMP or hybrid jobs
`--output=<path>/<file pattern>` (`-o <path>/<file pattern>`) | slurm-%j.out (%j = JobID) | Standard output file
`--error=<path>/<file pattern>` (`-e <path>/<file pattern>`) | slurm-%j.out (%j = JobID) | Standard error file
`--time=<walltime>` (`-t <walltime>`) | 3 days | Requested walltime limit for the job. Possible time formats: [hours:]minutes[:seconds], e.g. 20, 01:20, 01:20:30; days-hours[:minutes][:seconds], e.g. 2-0, 1-5:20, 1-5:20:30
`--partition=<name>` (`-p <name>`) | bigMem | Partition to run the job in
`--job-name=<jobname>` (`-J <jobname>`) | job script's name | Job name
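To illustrate how these options combine, here is a minimal sketch of an OpenMP job built from the table above; the program `myTest/testomp` is a placeholder:

```bash
#!/bin/bash
#SBATCH --job-name=omp_test
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8                     # 8 logical cores for the single task
#SBATCH --time=00:10:00
#SBATCH --partition=bigMem

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # match OpenMP threads to the allocation
myTest/testomp
```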
To see more usage of `sbatch`, refer to the websites below:
- Official introduction of sbatch
- More examples: 1, 2 and 3
SLURM Command Examples
Below are some examples of SLURM query commands.
List all jobs submitted to SLURM:

```bash
squeue
```

List all jobs submitted by you:

```bash
squeue -u $USER
```

Check available partitions and nodes:

```bash
sinfo
```
The sinfo command reports the states of the partitions and the nodes. A partition may be in state UP, DOWN or INACTIVE. The UP state means that the partition accepts new submissions and its jobs will be scheduled. The DOWN state allows submissions to the partition, but the jobs will not be scheduled. The INACTIVE state means that submissions are not allowed. The nodes can also be in various states, such as alloc (allocated), comp (completing), down, idle, maint, resv (reserved), etc. A description of all node states can be found in the sinfo man page.
List the partition state summary:

```bash
sinfo -s
```
The NODES(A/I/O/T) column shows the number of nodes in the states "allocated/idle/other/total" for each SLURM partition.
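As an illustration, the summary might look like this (a hypothetical sketch: the node counts, time limit and node list are invented; only the bigMem partition name comes from this guide):

```
PARTITION AVAIL  TIMELIMIT   NODES(A/I/O/T) NODELIST
bigMem*      up 3-00:00:00          1/1/0/2 bigMem[0-1]
```

Here `bigMem*` marks the default partition.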
Cancel the job with SLURM JobId 4711:

```bash
scancel 4711
```
Cancel all your jobs:

```bash
scancel -u $USER
```
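scancel can also filter by job state or job name, and can send a signal instead of cancelling; a few sketches (the job name is a placeholder):

```bash
scancel -t PENDING -u $USER     # cancel only your pending jobs
scancel -n python_job_test      # cancel your jobs with this job name
scancel -s USR1 4711            # send signal USR1 to job 4711 instead of cancelling it
```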
Reference
https://www.dkrz.de/up/systems/mistral/running-jobs/slurm-introduction