SLURM (Simple Linux Utility for Resource Management)
Examples of scripts
Basic commands
- sbatch run_script is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
- scancel job_id is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
- squeue reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
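The three commands above can be tried with a minimal job script. The sketch below writes one to run_script.sh; the job name, the partition (AMG-short, taken from the node listing on this page), the time limit and the output file name are illustrative assumptions, so adjust them to your job.

```shell
# Write a minimal job script to run_script.sh.
# All #SBATCH values are placeholders, not site defaults.
cat > run_script.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=AMG-short
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=slurm-%j.out

# Launch the task under Slurm's control.
srun hostname
EOF

# On the cluster, submit, inspect and cancel the job:
# sbatch run_script.sh     # prints "Submitted batch job <job_id>"
# squeue -u "$USER"        # show only your own jobs
# scancel <job_id>         # cancel the job by the id printed by sbatch
```

The %j in the output file name is replaced by the job id, so each submission writes its own log file.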
For more information, see the man pages of these commands (man sbatch, man scancel, man squeue).
Squeue tips
How to see the path to a job folder
Add this alias to your ~/.bashrc. It prints a cd command for the folder that contains each job's script:
alias qp="squeue -o '%o' | awk -F / '{\$(NF--)} {gsub(\" \",FS)}; \$0=\"cd \"\$0 '"
Output
cd /home/a.dembitskiy/project_template/NaGPO4F/phonons_minimum_121
cd /home/a.boev/vasp/surseg_tem//polaron_seg//LCO.104.7.is.Ti.o_coord.1ULC_g
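The same idea can be written as a small shell function, which may be easier to read and modify than the quoted awk one-liner. This is only a sketch: the function name paths_to_cd is hypothetical, and it assumes that each input line is a single script path without arguments, as in the output above.

```shell
# paths_to_cd: read job command paths (one per line) on stdin and
# print a "cd <directory>" line for each. Hypothetical helper name.
paths_to_cd() {
    while IFS= read -r path; do
        # Skip empty lines.
        [ -n "$path" ] || continue
        # dirname drops the last path component (the script name).
        printf 'cd %s\n' "$(dirname -- "$path")"
    done
}

# Offline demonstration with a made-up path:
printf '%s\n' '/home/user/project/run.sh' | paths_to_cd

# On the cluster (-h suppresses the header line of squeue):
# squeue -h -u "$USER" -o '%o' | paths_to_cd
```

The squeue man page also lists a %Z field for the job's working directory, which may be a simpler source for the folder if your Slurm version supports it.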
How to see the details of all the nodes you can use
scontrol show node
Output
NodeName=node-amg01 Arch=x86_64 CoresPerSocket=8
   CPUAlloc=4 CPUTot=16 CPULoad=9.63
   AvailableFeatures=CEST,sm,e5-2630,haswell,hdd
   ActiveFeatures=CEST,sm,e5-2630,haswell,hdd
   Gres=(null)
   NodeAddr=node-amg01 NodeHostName=node-amg01 Version=18.08
   OS=Linux 3.10.0-862.14.4.el7.x86_64 #1 SMP Fri Sep 21 09:07:21 UTC 2018
   RealMemory=122880 AllocMem=8192 FreeMem=119484 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=50 Owner=N/A MCS_label=N/A
   Partitions=AMG,AMG-medium,AMG-long,AMG-short
   BootTime=2019-10-08T13:00:43 SlurmdStartTime=2021-02-15T16:33:51
   CfgTRES=cpu=16,mem=120G,billing=16
   AllocTRES=cpu=4,mem=8G
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
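The full node record is long, so a short summary per node can be handy. The sketch below assumes the one-line-per-node form printed by scontrol show node -o and extracts only the NodeName, CPUAlloc, CPUTot and State fields seen in the output above; the function name summarize_nodes is hypothetical.

```shell
# summarize_nodes: read one-line node records (KEY=VALUE pairs separated
# by spaces) on stdin and print "<node> <allocated>/<total> <state>".
summarize_nodes() {
    awk '{
        name = alloc = tot = state = ""
        for (i = 1; i <= NF; i++) {
            # Split each token at "=": kv[1] is the key, kv[2] the value.
            split($i, kv, "=")
            if (kv[1] == "NodeName") name  = kv[2]
            if (kv[1] == "CPUAlloc") alloc = kv[2]
            if (kv[1] == "CPUTot")   tot   = kv[2]
            if (kv[1] == "State")    state = kv[2]
        }
        if (name != "") printf "%s %s/%s %s\n", name, alloc, tot, state
    }'
}

# Offline demonstration with a shortened record from this page:
printf '%s\n' 'NodeName=node-amg01 Arch=x86_64 CPUAlloc=4 CPUTot=16 State=MIXED' \
    | summarize_nodes

# On the cluster:
# scontrol show node -o | summarize_nodes
```

For the sample record this reports 4 of 16 CPUs allocated on node-amg01, which matches State=MIXED (the node is only partly in use).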
How to see jobs' info including number of nodes and cores
Add the %C format specifier (number of CPUs) to the squeue output format; %D prints the number of nodes. For instance:
squeue -o"%.7i %.9P %.8j %.8u %.2t %.10M %.6D %C"
Output
  JOBID PARTITION     NAME     USER ST       TIME  NODES CPUS
 197736  AMG-long      clc v.logvin PD       0:00      1 1
 197737  AMG-long      clc v.logvin PD       0:00      1 1
 197735  AMG-long      clc v.logvin  R       3:50      1 1
 197734  AMG-long      clc v.logvin  R      40:40      1 1
 197732  AMG-long      clc v.logvin  R      47:56      1 1
 197696 AMG-mediu heat_100 a.dembit  R    5:55:25      1 8
 197695 AMG-mediu heat_800 a.dembit  R    5:58:56      1 8
 197697 AMG-mediu heat_120 a.dembit  R    5:38:49      1 8
 197739 AMG-mediu heat_600 a.dembit  R      12:24      1 16
 197675  AMG-long Na2Fe2C6 o.kovaly  R   10:40:39      1 8
 197738  AMG-long Fe2C6N6. o.kovaly  R      14:51      1 8
 197667 AMG-mediu lvp.010.  a.burov  R   18:18:39      1 8
 197666 AMG-mediu lvp.010.  a.burov  R 1-21:55:06      1 8
 197731 AMG-mediu llzo_int  a.burov  R    1:49:13      1 8
 197730 AMG-mediu llzo_int  a.burov  R    1:49:34      1 8
 197663  AMG-long lvp.na.o  a.burov  R 1-21:58:36      1 16
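The %C column can also be aggregated, for example to see how many CPUs each user currently holds. The sketch below assumes input lines of the form "<user> <cpus>", as produced by squeue with the format '%u %C' and no header; the function name cpus_per_user is hypothetical.

```shell
# cpus_per_user: read "<user> <cpus>" lines on stdin and print the
# total number of CPUs per user, sorted by user name.
cpus_per_user() {
    awk '{ cpus[$1] += $2 } END { for (u in cpus) print u, cpus[u] }' | sort
}

# Offline demonstration with a few rows shaped like the output above:
printf '%s\n' 'a.burov 8' 'a.burov 16' 'v.logvin 1' | cpus_per_user

# On the cluster, counting running jobs only:
# squeue -h -t RUNNING -o '%u %C' | cpus_per_user
```

Restricting the query to running jobs matters because pending jobs also report a CPU count in %C, which would otherwise be added to the totals.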
linux/slurm.txt · Last modified: 2025/02/11 15:03 by d.aksenov