Slurm is used for cluster management and job scheduling. Slurm has three key functions: it allocates access to compute nodes, it provides a framework for starting, executing, and monitoring work on those nodes, and it arbitrates contention for resources by managing a queue of pending jobs.
sbatch submits a bash script to Slurm, which will schedule it according to the arguments presented in the script.
$ sbatch script.sh
Script example:
#! /bin/bash
#################### Batch Headers ####################
#SBATCH -p drcluster        # Get it? DRC cluster ;)
#SBATCH -J hello_world      # Custom name
#SBATCH -o results-%j.out   # stdout/stderr redirected to file
#SBATCH -N 3                # Number of nodes
#SBATCH -n 1                # Number of tasks
#######################################################
python hello_world.py
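Any #SBATCH header can also be given on the sbatch command line, where it takes precedence over the value in the script. A sketch (the job name and node count here are arbitrary):

```shell
$ sbatch -J quick_test -N 1 script.sh
```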
Check a job's status with sacct or squeue. In the output filename, %j is replaced by the job number. Environment variables: inside the job, Slurm sets variables such as SLURM_JOB_ID, SLURM_JOB_NODELIST, and SLURM_NTASKS.
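A minimal sketch of a script body that reads a few of the variables Slurm sets for every job (the ${VAR:-fallback} form is only there so the snippet also runs outside of a job):

```shell
#!/bin/bash
# Inside a Slurm job these variables describe the allocation;
# outside of Slurm the fallbacks are printed instead.
echo "Job ID:    ${SLURM_JOB_ID:-not running under Slurm}"
echo "Node list: ${SLURM_JOB_NODELIST:-not running under Slurm}"
echo "Tasks:     ${SLURM_NTASKS:-1}"
```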
scancel is used to cancel a job.
$ scancel <jobID>
scontrol allows you to view or alter a job's details.
View job details:
$ scontrol show job 14
Suspend a job:
$ sudo scontrol suspend 14
Resume a suspended job:
$ scontrol resume 14
Release a held job:
$ scontrol release 14
sinfo provides information about the cluster.
$ sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
drcluster*    up   infinite      3   idle node[02-04]
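sinfo can also report per-node details or be restricted to one partition; a couple of common variations (output will differ on your cluster):

```shell
$ sinfo -N -l        # node-oriented, long format: one line per node
$ sinfo -p drcluster # show only the given partition
```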
squeue displays all submitted jobs.
$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   22 drcluster hostname    drc14 PD       0:00      3 (PartitionConfig)
A list of state codes can be found in the squeue man page (JOB STATE CODES section).
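squeue output can be filtered; a few common options (the username and job ID here are placeholders):

```shell
$ squeue -u <username>   # only jobs belonging to one user
$ squeue -j <jobID>      # a single job
$ squeue -t PD           # only pending jobs
```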
srun allows you to run parallel jobs directly from the command line; it accepts the same arguments as the #SBATCH headers shown for sbatch above.
$ srun --nodes=3 hostname
node03
node02
node04
To run an interactive srun session:
$ srun --pty /bin/bash
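The interactive session takes the same resource arguments as any other job; for example, a shell on a single node with one task (a sketch):

```shell
$ srun -N 1 -n 1 --pty /bin/bash
```

Exiting the shell ends the job and releases the allocation.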