SLURM (Simple Linux Utility for Resource Management)

Basic commands

  1. sbatch run_script is used to submit a job script for later execution. The script will typically contain one or more srun commands to launch parallel tasks.
  2. scancel job_id is used to cancel a pending or running job or job step. It can also be used to send an arbitrary signal to all processes associated with a running job or job step.
  3. squeue reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
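
The three commands above are usually tied together through the job ID. As a minimal sketch (the helper name parse_job_id is hypothetical, and it assumes sbatch prints its usual "Submitted batch job <id>" confirmation), the ID can be captured and reused like this:

```shell
# Hypothetical helper: extract the job ID from sbatch's confirmation message,
# which is assumed to look like "Submitted batch job 197736".
parse_job_id() {
    awk '/^Submitted batch job/ {print $NF}'
}

# Typical use on a cluster (run_script is your own job script):
#   jobid=$(sbatch run_script | parse_job_id)
#   squeue -j "$jobid"     # report the state of that job only
#   scancel "$jobid"       # cancel it if needed
```

sbatch also has a --parsable option that prints only the job ID (followed by the cluster name, if any), which avoids the parsing step.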

For more information, see the SLURM documentation or the man pages (man sbatch, man scancel, man squeue).

Squeue tips

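How to see the path to a job's folder

Add the following alias to your ~/.bashrc. It prints a ready-to-paste cd command for the folder that contains each job's script: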

alias qp="squeue -h -o '%o' | awk -F / '{\$(NF--)} {gsub(\" \",FS)};  \$0=\"cd \"\$0 '"

Output

cd /home/a.dembitskiy/project_template/NaGPO4F/phonons_minimum_121
cd /home/a.boev/vasp/surseg_tem//polaron_seg//LCO.104.7.is.Ti.o_coord.1ULC_g
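
The alias rebuilds the path by swapping spaces for slashes, so it would mangle any folder name that contains a space. A possible alternative, sketched here under the assumption that the %o field holds only the script path (no arguments), is a small shell function based on dirname. The name job_dirs is hypothetical:

```shell
# Hypothetical helper: read script paths (one per line) from standard input
# and print a "cd <folder>" line for each, keeping spaces in paths intact.
job_dirs() {
    while IFS= read -r path; do
        [ -n "$path" ] || continue          # skip empty lines
        printf 'cd %s\n' "$(dirname -- "$path")"
    done
}

# Typical use on a cluster (-h suppresses the header line):
#   squeue -h -o '%o' | job_dirs
```

Like the alias, this leaves doubled slashes inside a path untouched.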

How to see the details of all the nodes you can use

scontrol show node

Output

NodeName=node-amg01 Arch=x86_64 CoresPerSocket=8 
   CPUAlloc=4 CPUTot=16 CPULoad=9.63
   AvailableFeatures=CEST,sm,e5-2630,haswell,hdd
   ActiveFeatures=CEST,sm,e5-2630,haswell,hdd
   Gres=(null)
   NodeAddr=node-amg01 NodeHostName=node-amg01 Version=18.08
   OS=Linux 3.10.0-862.14.4.el7.x86_64 #1 SMP Fri Sep 21 09:07:21 UTC 2018 
   RealMemory=122880 AllocMem=8192 FreeMem=119484 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=50 Owner=N/A MCS_label=N/A
   Partitions=AMG,AMG-medium,AMG-long,AMG-short 
   BootTime=2019-10-08T13:00:43 SlurmdStartTime=2021-02-15T16:33:51
   CfgTRES=cpu=16,mem=120G,billing=16
   AllocTRES=cpu=4,mem=8G
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
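
The full listing is long when the cluster has many nodes. As a rough sketch, the output can be condensed to one line per node showing allocated and total CPUs. The function name node_cpus is hypothetical, and it assumes the NodeName, CPUAlloc and CPUTot fields appear in the order shown in the output above:

```shell
# Hypothetical helper: condense "scontrol show node" output (read from
# standard input) to "<node> <allocated CPUs>/<total CPUs>" lines.
node_cpus() {
    awk '{
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")              # kv[1] is the key, kv[2] the value
            if (kv[1] == "NodeName")      name = kv[2]
            else if (kv[1] == "CPUAlloc") alloc = kv[2]
            else if (kv[1] == "CPUTot")   printf "%s %d/%d\n", name, alloc, kv[2]
        }
    }'
}

# Typical use on a cluster:
#   scontrol show node | node_cpus
```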

How to see jobs' info including number of nodes and cores

Use the %D and %C format specifiers, which print the number of nodes and the number of CPUs (cores) for each job, for instance:

squeue -o"%.7i %.9P %.8j %.8u %.2t %.10M %.6D %C"

Output

  JOBID PARTITION     NAME     USER ST       TIME  NODES CPUS
 197736  AMG-long      clc v.logvin PD       0:00      1 1
 197737  AMG-long      clc v.logvin PD       0:00      1 1
 197735  AMG-long      clc v.logvin  R       3:50      1 1
 197734  AMG-long      clc v.logvin  R      40:40      1 1
 197732  AMG-long      clc v.logvin  R      47:56      1 1
 197696 AMG-mediu heat_100 a.dembit  R    5:55:25      1 8
 197695 AMG-mediu heat_800 a.dembit  R    5:58:56      1 8
 197697 AMG-mediu heat_120 a.dembit  R    5:38:49      1 8
 197739 AMG-mediu heat_600 a.dembit  R      12:24      1 16
 197675  AMG-long Na2Fe2C6 o.kovaly  R   10:40:39      1 8
 197738  AMG-long Fe2C6N6. o.kovaly  R      14:51      1 8
 197667 AMG-mediu lvp.010.  a.burov  R   18:18:39      1 8
 197666 AMG-mediu lvp.010.  a.burov  R 1-21:55:06      1 8
 197731 AMG-mediu llzo_int  a.burov  R    1:49:13      1 8
 197730 AMG-mediu llzo_int  a.burov  R    1:49:34      1 8
 197663  AMG-long lvp.na.o  a.burov  R 1-21:58:36      1 16
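
The same %C field can be summed to see how many CPUs each user currently holds. This is only a sketch: the function name cpus_per_user is hypothetical, and it expects "user cpus" pairs on standard input:

```shell
# Hypothetical helper: sum the CPU counts per user from "user cpus" lines
# and print the totals sorted by user name.
cpus_per_user() {
    awk 'NF >= 2 {cpus[$1] += $2} END {for (u in cpus) print u, cpus[u]}' | sort
}

# Typical use on a cluster (-h drops the header, -t R keeps running jobs only):
#   squeue -h -t R -o '%u %C' | cpus_per_user
```
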
linux/slurm.txt · Last modified: 2023/12/20 18:47 by admin
