====== SLURM (Simple Linux Utility for Resource Management) ======

===== Basic commands =====

  - //**sbatch** run_script// submits a job script for later execution. The script typically contains one or more ''srun'' commands that launch parallel tasks.
  - //**scancel** job_id// cancels a pending or running job or job step. It can also send an arbitrary signal to all processes associated with a running job or job step.
  - //**squeue**// reports the state of jobs or job steps. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.

For more information, see the [[https://slurm.schedmd.com/quickstart.html|Slurm Quick Start User Guide]].

===== Squeue tips =====

=== How to see path to job folder ===

Add this alias to your ''~/.bashrc'':

  alias qp="squeue -o '%o' | awk -F / '{\$(NF--)} {gsub(\" \",FS)}; \$0=\"cd \"\$0 '"

Running ''qp'' prints a ''cd'' command for the folder of each job's script.

Output:

  cd /home/a.dembitskiy/project_template/NaGPO4F/phonons_minimum_121
  cd /home/a.boev/vasp/surseg_tem//polaron_seg//LCO.104.7.is.Ti.o_coord.1ULC_g

=== How to see the details of all the nodes you can use ===

  scontrol show node

Output:

  NodeName=node-amg01 Arch=x86_64 CoresPerSocket=8
     CPUAlloc=4 CPUTot=16 CPULoad=9.63
     AvailableFeatures=CEST,sm,e5-2630,haswell,hdd
     ActiveFeatures=CEST,sm,e5-2630,haswell,hdd
     Gres=(null)
     NodeAddr=node-amg01 NodeHostName=node-amg01 Version=18.08
     OS=Linux 3.10.0-862.14.4.el7.x86_64 #1 SMP Fri Sep 21 09:07:21 UTC 2018
     RealMemory=122880 AllocMem=8192 FreeMem=119484 Sockets=2 Boards=1
     State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=50 Owner=N/A MCS_label=N/A
     Partitions=AMG,AMG-medium,AMG-long,AMG-short
     BootTime=2019-10-08T13:00:43 SlurmdStartTime=2021-02-15T16:33:51
     CfgTRES=cpu=16,mem=120G,billing=16
     AllocTRES=cpu=4,mem=8G
     CapWatts=n/a
     CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
     ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

=== How to see jobs' info including number of nodes and cores ===

Use the ''%C'' format specifier (number of CPUs) alongside ''%D'' (number of nodes), for
instance:

  squeue -o"%.7i %.9P %.8j %.8u %.2t %.10M %.6D %C"

Output:

    JOBID PARTITION     NAME     USER ST       TIME  NODES CPUS
   197736  AMG-long      clc v.logvin PD       0:00      1 1
   197737  AMG-long      clc v.logvin PD       0:00      1 1
   197735  AMG-long      clc v.logvin  R       3:50      1 1
   197734  AMG-long      clc v.logvin  R      40:40      1 1
   197732  AMG-long      clc v.logvin  R      47:56      1 1
   197696 AMG-mediu heat_100 a.dembit  R    5:55:25      1 8
   197695 AMG-mediu heat_800 a.dembit  R    5:58:56      1 8
   197697 AMG-mediu heat_120 a.dembit  R    5:38:49      1 8
   197739 AMG-mediu heat_600 a.dembit  R      12:24      1 16
   197675  AMG-long Na2Fe2C6 o.kovaly  R   10:40:39      1 8
   197738  AMG-long Fe2C6N6. o.kovaly  R      14:51      1 8
   197667 AMG-mediu lvp.010.  a.burov  R   18:18:39      1 8
   197666 AMG-mediu lvp.010.  a.burov  R 1-21:55:06      1 8
   197731 AMG-mediu llzo_int  a.burov  R    1:49:13      1 8
   197730 AMG-mediu llzo_int  a.burov  R    1:49:34      1 8
   197663  AMG-long lvp.na.o  a.burov  R 1-21:58:36      1 16
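Because ''%C'' is the last column in this format, the listing is easy to post-process. The sketch below sums the CPUs of running jobs per user. The function name ''running_cpus_per_user'' is hypothetical, and the sketch assumes the exact column order shown above (user in field 4, state in field 5, CPUs last) and job names without spaces.

```shell
# Hypothetical helper: sum the CPUS column per user for running jobs.
# Reads squeue-style text on stdin; assumes the column order
# JOBID PARTITION NAME USER ST TIME NODES CPUS and no spaces in job names.
running_cpus_per_user() {
    awk '
        # Skip the header line and keep only jobs in state R (running).
        NR > 1 && $5 == "R" { cpus[$4] += $NF }
        # Print one "user total" pair per user.
        END { for (u in cpus) print u, cpus[u] }
    ' | sort
}
```

It could be used as ''squeue -o"%.7i %.9P %.8j %.8u %.2t %.10M %.6D %C" | running_cpus_per_user''. User names are truncated to 8 characters by ''%.8u'', so two long names sharing a prefix would be merged; widening that field avoids this.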