These sample scripts perform the following steps:

1. Loads your bash profile and the modules required by the computational code.
2. Creates a scratch directory dedicated to the job, uniquely identified by the SLURM job ID, and creates a symlink to the scratch directory for convenience. (This is especially useful if the job terminates unexpectedly during execution.)
3. Copies input files to the scratch directory.
4. Initiates the calculation by running a Python script (presumably, `run.py`).
5. Stops the job at 90% of the maximum run time to ensure enough time remains to copy files from the scratch directory back to the submission directory.
6. Cleans up the scratch directory.
7. Logs the completion of the job in `~/job.log` in your home directory.

Additionally, the scripts print debugging information that may be useful for identifying issues with running jobs (e.g., resource information, job ID, etc.).
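For example, assuming you saved the VASP script as `vasp.sh` in the directory containing your input files, a typical workflow might look like this (the file names are illustrative):

```bash
# Submit the job from the directory containing the input files
sbatch vasp.sh

# While the job runs, monitor intermediate output through the convenience symlink
ls -l scratch_dir/

# After the job finishes, inspect the completion record
tail -n 1 ~/job.log
```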
```bash title="samples/slurm/vasp.sh" linenums="1" hl_lines="86"
#!/bin/bash
#SBATCH --account=def-samiras
#SBATCH --job-name=JOB_NAME
#SBATCH --mem-per-cpu=1000MB
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --time=23:00:00
#SBATCH --mail-user=SFU_ID@sfu.ca
#SBATCH --mail-type=BEGIN,END,FAIL,TIME_LIMIT,TIME_LIMIT_90

echo " "
echo "### Setting up shell environment ..."
echo " "
if test -e "/etc/profile"; then
    source "/etc/profile"
fi
if test -e "$HOME/.bash_profile"; then
    source "$HOME/.bash_profile"
fi
unset LANG
module purge
module load vasp
module load python/3.11.9
# Replace "$COMP_CHEM_ENV" with the path to your Python virtual environment
source "$COMP_CHEM_ENV"
export LC_ALL="C"
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1
ulimit -s unlimited

echo " "
echo "### Printing basic job infos to stdout ..."
echo " "
echo "START_TIME = $(date '+%y-%m-%d %H:%M:%S %s')"
echo "HOSTNAME = ${HOSTNAME}"
echo "USER = ${USER}"
echo "SLURM_JOB_NAME = ${SLURM_JOB_NAME}"
echo "SLURM_JOB_ID = ${SLURM_JOB_ID}"
echo "SLURM_SUBMIT_DIR = ${SLURM_SUBMIT_DIR}"
echo "SLURM_JOB_NUM_NODES = ${SLURM_JOB_NUM_NODES}"
echo "SLURM_NTASKS = ${SLURM_NTASKS}"
echo "SLURM_NODELIST = ${SLURM_NODELIST}"
echo "SLURM_JOB_NODELIST = ${SLURM_JOB_NODELIST}"
if test -f "${SLURM_JOB_NODELIST}"; then
    echo "SLURM_JOB_NODELIST (begin) ----------"
    cat "${SLURM_JOB_NODELIST}"
    echo "SLURM_JOB_NODELIST (end) ------------"
fi
echo "--------------- ulimit -a -S ---------------"
ulimit -a -S
echo "--------------- ulimit -a -H ---------------"
ulimit -a -H
echo "----------------------------------------------"

echo " "
echo "### Creating TMP_WORK_DIR directory and changing to it ..."
echo " "
if test -e "$HOME/scratch"; then
    TMP_WORK_DIR="$HOME/scratch/${SLURM_JOB_ID}"
elif test -e /scratch/"${SLURM_JOB_ID}"; then
    TMP_WORK_DIR=/scratch/${SLURM_JOB_ID}
else
    TMP_WORK_DIR="$(pwd)"
fi
TMP_BASE_DIR="$(dirname "$TMP_WORK_DIR")"
JOB_WORK_DIR="$(basename "$TMP_WORK_DIR")"
echo "TMP_WORK_DIR = ${TMP_WORK_DIR}"
echo "TMP_BASE_DIR = ${TMP_BASE_DIR}"
echo "JOB_WORK_DIR = ${JOB_WORK_DIR}"

# Create a symbolic link to the temporary directory holding work files while the job is running
if ! test -e "${TMP_WORK_DIR}"; then
    mkdir "${TMP_WORK_DIR}"
fi
ln -s "${TMP_WORK_DIR}" scratch_dir
cd "${TMP_WORK_DIR}" || exit

echo " "
echo "### Copying input files for job (if required):"
echo " "
script_name="${BASH_SOURCE[0]}"
AUTOJOB_SLURM_SCRIPT="$(basename "$script_name")"
export AUTOJOB_SLURM_SCRIPT
export AUTOJOB_PYTHON_SCRIPT="{{ python_script }}"
export AUTOJOB_COPY_TO_SCRATCH="CHGCAR,*py,*cif,POSCAR,coord,*xyz,*.traj,CONTCAR,*.pkl,*xml,WAVECAR"
cp -v "$SLURM_SUBMIT_DIR"/{CHGCAR,*py,*cif,POSCAR,coord,*xyz,*.traj,CONTCAR,*.pkl,*xml,WAVECAR} "$TMP_WORK_DIR"/
echo " "

# Preemptively end job if getting close to time limit
timeline=$(grep -E -m 1 '^#SBATCH[[:space:]]*--time=' "$script_name")
timeslurm=${timeline##*=}
IFS=- read -ra day_split_time <<< "$timeslurm"
no_days_time=${day_split_time[1]}
days=${no_days_time:+${day_split_time[0]}}
no_days_time=${day_split_time[1]:-${day_split_time[0]}}
IFS=: read -ra split_time <<< "$no_days_time"
# Time formats with days: D-H, D-H:M, D-H:M:S
if [[ $days ]]; then
    slurm_days="$days"
    slurm_hours=${split_time[0]}
    slurm_minutes=${split_time[1]:-0}
    slurm_seconds=${split_time[2]:-0}
# Time formats without days: M, M:S, H:M:S
else
    slurm_days=0
    if [[ ${#split_time[*]} == 3 ]]; then
        slurm_hours=${split_time[0]}
        slurm_minutes=${split_time[1]}
        slurm_seconds=${split_time[2]}
    else
        slurm_hours=0
        slurm_minutes=${split_time[0]}
        slurm_seconds=${split_time[1]:-0}
    fi
fi
echo "Running for $(echo "$slurm_days*1" | bc)d $(echo "$slurm_hours*1" | bc)h $(echo "$slurm_minutes*1" | bc)m and $(echo "$slurm_seconds*1" | bc)s."
timeslurm=$(echo "$slurm_days*86400 + $slurm_hours*3600 + $slurm_minutes*60 + $slurm_seconds" | bc)
echo "This means $timeslurm seconds."
timeslurm=$(echo "$timeslurm * 0.9" | bc)
echo "Will terminate at ${timeslurm}s to copy back necessary files from scratch"
echo ""
echo ""

# Run the ASE calculation, terminating it early if it exceeds the reduced time limit
timeout "${timeslurm}" python3 "$AUTOJOB_PYTHON_SCRIPT"
exit_code=$?
if [ "$exit_code" -eq 124 ]; then
    echo " "
    echo "Cancelled due to time limit."
else
    echo " "
    echo "Time limit not reached."
fi

echo " "
echo "### Cleaning up files ... removing unnecessary scratch files ..."
echo " "
AUTOJOB_FILES_TO_DELETE="*.d2e *.int *.rwf *.skr *.inp EIGENVAL IBZKPT PCDAT PROCAR ELFCAR LOCPOT PROOUT TMPCAR vasp.dipcor"
# Intentionally unquoted so the glob patterns expand
rm -vf $AUTOJOB_FILES_TO_DELETE
sleep 10 # Sleep some time so potential stale NFS handles can disappear.

echo " "
echo "### Compressing results and copying back result archive ..."
echo " "
cd "${TMP_BASE_DIR}" || exit
mkdir -vp "${SLURM_SUBMIT_DIR}" # in case the user has deleted or moved the submit dir
echo " "
echo "Creating result tgz-file '${SLURM_SUBMIT_DIR}/${JOB_WORK_DIR}.tgz' ..."
echo " "
tar -zcvf "${SLURM_SUBMIT_DIR}/${JOB_WORK_DIR}.tgz" "${JOB_WORK_DIR}" \
    || { echo "ERROR: Failed to create tgz-file. Please clean up TMP_WORK_DIR $TMP_WORK_DIR on host '$HOSTNAME' manually (if not done automatically by the queueing system)."; exit 102; }

echo " "
echo "### Remove TMP_WORK_DIR ..."
echo " "
rm -rvf "${TMP_WORK_DIR}"

echo " "
echo "Extracting result tgz-file"
echo " "
cd "${SLURM_SUBMIT_DIR}" || exit
tar -xzf "${JOB_WORK_DIR}".tgz
mv "${JOB_WORK_DIR}"/* .
rm -r "${JOB_WORK_DIR}".tgz "${JOB_WORK_DIR}"
rm "${SLURM_SUBMIT_DIR}/scratch_dir"
echo "END_TIME = $(date +'%y-%m-%d %H:%M:%S %s')"

# Record job in log file
echo "${SLURM_JOB_ID}-${SLURM_JOB_NAME}" is complete: on "$(date +'%y.%m.%d %H:%M:%S')" "${SLURM_SUBMIT_DIR}" >> ~/job.log

echo " "
echo "### Exiting with exit code ${exit_code}..."
echo " "
exit "$exit_code"
```
```bash title="samples/slurm/espresso.sh" linenums="1"
#!/bin/bash
#SBATCH --account=def-samiras
#SBATCH --job-name=JOB_NAME
#SBATCH --mem-per-cpu=1000MB
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --time=23:00:00
#SBATCH --mail-user=SFU_ID@sfu.ca
#SBATCH --mail-type=BEGIN,END,FAIL,TIME_LIMIT,TIME_LIMIT_90

echo " "
echo "### Setting up shell environment ..."
echo " "
if test -e "/etc/profile"; then
    source "/etc/profile"
fi
if test -e "$HOME/.bash_profile"; then
    source "$HOME/.bash_profile"
fi
unset LANG
module --force purge
module load gentoo/2020 python/3.11.9 espresso
# Replace "$COMP_CHEM_ENV" with the path to your Python virtual environment
source "$COMP_CHEM_ENV"
export LC_ALL="C"
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1
ulimit -s unlimited

echo " "
echo "### Printing basic job infos to stdout ..."
echo " "
echo "START_TIME = $(date '+%y-%m-%d %H:%M:%S %s')"
echo "HOSTNAME = ${HOSTNAME}"
echo "USER = ${USER}"
echo "SLURM_JOB_NAME = ${SLURM_JOB_NAME}"
echo "SLURM_JOB_ID = ${SLURM_JOB_ID}"
echo "SLURM_SUBMIT_DIR = ${SLURM_SUBMIT_DIR}"
echo "SLURM_JOB_NUM_NODES = ${SLURM_JOB_NUM_NODES}"
echo "SLURM_NTASKS = ${SLURM_NTASKS}"
echo "SLURM_NODELIST = ${SLURM_NODELIST}"
echo "SLURM_JOB_NODELIST = ${SLURM_JOB_NODELIST}"
if test -f "${SLURM_JOB_NODELIST}"; then
    echo "SLURM_JOB_NODELIST (begin) ----------"
    cat "${SLURM_JOB_NODELIST}"
    echo "SLURM_JOB_NODELIST (end) ------------"
fi
echo "--------------- ulimit -a -S ---------------"
ulimit -a -S
echo "--------------- ulimit -a -H ---------------"
ulimit -a -H
echo "----------------------------------------------"

echo " "
echo "### Creating TMP_WORK_DIR directory and changing to it ..."
echo " "
if test -e "$HOME/scratch"; then
    TMP_WORK_DIR="$HOME/scratch/${SLURM_JOB_ID}"
elif test -e /scratch/"${SLURM_JOB_ID}"; then
    TMP_WORK_DIR=/scratch/${SLURM_JOB_ID}
else
    TMP_WORK_DIR="$(pwd)"
fi
TMP_BASE_DIR="$(dirname "$TMP_WORK_DIR")"
JOB_WORK_DIR="$(basename "$TMP_WORK_DIR")"
echo "TMP_WORK_DIR = ${TMP_WORK_DIR}"
echo "TMP_BASE_DIR = ${TMP_BASE_DIR}"
echo "JOB_WORK_DIR = ${JOB_WORK_DIR}"

# Create a symbolic link to the temporary directory holding work files while the job is running
if ! test -e "${TMP_WORK_DIR}"; then
    mkdir "${TMP_WORK_DIR}"
fi
ln -s "${TMP_WORK_DIR}" scratch_dir
cd "${TMP_WORK_DIR}" || exit

echo " "
echo "### Copying input files for job (if required):"
echo " "
script_name="${BASH_SOURCE[0]}"
AUTOJOB_SLURM_SCRIPT="$(basename "$script_name")"
export AUTOJOB_SLURM_SCRIPT
export AUTOJOB_PYTHON_SCRIPT="{{ python_script }}"
export AUTOJOB_COPY_TO_SCRATCH="CHGCAR,*py,*cif,POSCAR,coord,*xyz,*.traj,CONTCAR,*.pkl,*xml,WAVECAR"
cp -v "$SLURM_SUBMIT_DIR"/{CHGCAR,*py,*cif,POSCAR,coord,*xyz,*.traj,CONTCAR,*.pkl,*xml,WAVECAR} "$TMP_WORK_DIR"/
echo " "

# Preemptively end job if getting close to time limit
timeline=$(grep -E -m 1 '^#SBATCH[[:space:]]*--time=' "$script_name")
timeslurm=${timeline##*=}
IFS=- read -ra day_split_time <<< "$timeslurm"
no_days_time=${day_split_time[1]}
days=${no_days_time:+${day_split_time[0]}}
no_days_time=${day_split_time[1]:-${day_split_time[0]}}
IFS=: read -ra split_time <<< "$no_days_time"
# Time formats with days: D-H, D-H:M, D-H:M:S
if [[ $days ]]; then
    slurm_days="$days"
    slurm_hours=${split_time[0]}
    slurm_minutes=${split_time[1]:-0}
    slurm_seconds=${split_time[2]:-0}
# Time formats without days: M, M:S, H:M:S
else
    slurm_days=0
    if [[ ${#split_time[*]} == 3 ]]; then
        slurm_hours=${split_time[0]}
        slurm_minutes=${split_time[1]}
        slurm_seconds=${split_time[2]}
    else
        slurm_hours=0
        slurm_minutes=${split_time[0]}
        slurm_seconds=${split_time[1]:-0}
    fi
fi
echo "Running for $(echo "$slurm_days*1" | bc)d $(echo "$slurm_hours*1" | bc)h $(echo "$slurm_minutes*1" | bc)m and $(echo "$slurm_seconds*1" | bc)s."
timeslurm=$(echo "$slurm_days*86400 + $slurm_hours*3600 + $slurm_minutes*60 + $slurm_seconds" | bc)
echo "This means $timeslurm seconds."
timeslurm=$(echo "$timeslurm * 0.9" | bc)
echo "Will terminate at ${timeslurm}s to copy back necessary files from scratch"
echo ""
echo ""

# Run the ASE calculation, terminating it early if it exceeds the reduced time limit
timeout "${timeslurm}" python3 "$AUTOJOB_PYTHON_SCRIPT"
exit_code=$?
if [ "$exit_code" -eq 124 ]; then
    echo " "
    echo "Cancelled due to time limit."
else
    echo " "
    echo "Time limit not reached."
fi

echo " "
echo "### Cleaning up files ... removing unnecessary scratch files ..."
echo " "
AUTOJOB_FILES_TO_DELETE="*.mix* *.wfc*"
# Intentionally unquoted so the glob patterns expand
rm -vf $AUTOJOB_FILES_TO_DELETE
sleep 10 # Sleep some time so potential stale NFS handles can disappear.

echo " "
echo "### Compressing results and copying back result archive ..."
echo " "
cd "${TMP_BASE_DIR}" || exit
mkdir -vp "${SLURM_SUBMIT_DIR}" # in case the user has deleted or moved the submit dir
echo " "
echo "Creating result tgz-file '${SLURM_SUBMIT_DIR}/${JOB_WORK_DIR}.tgz' ..."
echo " "
tar -zcvf "${SLURM_SUBMIT_DIR}/${JOB_WORK_DIR}.tgz" "${JOB_WORK_DIR}" \
    || { echo "ERROR: Failed to create tgz-file. Please clean up TMP_WORK_DIR $TMP_WORK_DIR on host '$HOSTNAME' manually (if not done automatically by the queueing system)."; exit 102; }

echo " "
echo "### Remove TMP_WORK_DIR ..."
echo " "
rm -rvf "${TMP_WORK_DIR}"

echo " "
echo "Extracting result tgz-file"
echo " "
cd "${SLURM_SUBMIT_DIR}" || exit
tar -xzf "${JOB_WORK_DIR}".tgz
mv "${JOB_WORK_DIR}"/* .
rm -r "${JOB_WORK_DIR}".tgz "${JOB_WORK_DIR}"
rm "${SLURM_SUBMIT_DIR}/scratch_dir"
echo "END_TIME = $(date +'%y-%m-%d %H:%M:%S %s')"

# Record job in log file
echo "${SLURM_JOB_ID}-${SLURM_JOB_NAME}" is complete: on "$(date +'%y.%m.%d %H:%M:%S')" "${SLURM_SUBMIT_DIR}" >> ~/job.log

echo " "
echo "### Exiting with exit code ${exit_code}..."
echo " "
exit "$exit_code"
```
Reminder
This script assumes that you are using a self-compiled version of
Quantum Espresso and have created a corresponding module named
espresso. See this tutorial
for how to compile Quantum Espresso and create the necessary
modulefile.
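Before submitting, you can verify from a login shell that the module is actually visible (standard Lmod/Environment Modules commands):

```bash
module avail espresso   # list modulefiles matching "espresso"
module show espresso    # print what loading the module would change
```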
```bash title="samples/slurm/gaussian.sh" linenums="1"
#!/bin/bash
#SBATCH --account=def-samiras
#SBATCH --job-name=JOB_NAME
#SBATCH --mem-per-cpu=1000MB
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=24
#SBATCH --time=23:00:00
#SBATCH --mail-user=SFU_ID@sfu.ca
#SBATCH --mail-type=BEGIN,END,FAIL,TIME_LIMIT,TIME_LIMIT_90

echo " "
echo "### Setting up shell environment ..."
echo " "
if test -e "/etc/profile"; then
    source "/etc/profile"
fi
if test -e "$HOME/.bash_profile"; then
    source "$HOME/.bash_profile"
fi
unset LANG
module purge
module load gaussian/g16.c01
module load python/3.11.9
# Replace "$COMP_CHEM_ENV" with the path to your Python virtual environment
source "$COMP_CHEM_ENV"
export LC_ALL="C"
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1
ulimit -s unlimited

echo " "
echo "### Printing basic job infos to stdout ..."
echo " "
echo "START_TIME = $(date '+%y-%m-%d %H:%M:%S %s')"
echo "HOSTNAME = ${HOSTNAME}"
echo "USER = ${USER}"
echo "SLURM_JOB_NAME = ${SLURM_JOB_NAME}"
echo "SLURM_JOB_ID = ${SLURM_JOB_ID}"
echo "SLURM_SUBMIT_DIR = ${SLURM_SUBMIT_DIR}"
echo "SLURM_JOB_NUM_NODES = ${SLURM_JOB_NUM_NODES}"
echo "SLURM_NTASKS = ${SLURM_NTASKS}"
echo "SLURM_NODELIST = ${SLURM_NODELIST}"
echo "SLURM_JOB_NODELIST = ${SLURM_JOB_NODELIST}"
if test -f "${SLURM_JOB_NODELIST}"; then
    echo "SLURM_JOB_NODELIST (begin) ----------"
    cat "${SLURM_JOB_NODELIST}"
    echo "SLURM_JOB_NODELIST (end) ------------"
fi
echo "--------------- ulimit -a -S ---------------"
ulimit -a -S
echo "--------------- ulimit -a -H ---------------"
ulimit -a -H
echo "----------------------------------------------"

echo " "
echo "### Creating TMP_WORK_DIR directory and changing to it ..."
echo " "
if test -e "$HOME/scratch"; then
    TMP_WORK_DIR="$HOME/scratch/${SLURM_JOB_ID}"
elif test -e /scratch/"${SLURM_JOB_ID}"; then
    TMP_WORK_DIR=/scratch/${SLURM_JOB_ID}
else
    TMP_WORK_DIR="$(pwd)"
fi

# Pass memory request, cpu list, and scratch directory to Gaussian
export GAUSS_MDEF="${SLURM_MEM_PER_NODE}MB"
GAUSS_CDEF=$(taskset -cp $$ | awk -F':' '{print $2}')
export GAUSS_CDEF
export GAUSS_SCRDIR=${TMP_WORK_DIR}

TMP_BASE_DIR="$(dirname "$TMP_WORK_DIR")"
JOB_WORK_DIR="$(basename "$TMP_WORK_DIR")"
echo "TMP_WORK_DIR = ${TMP_WORK_DIR}"
echo "TMP_BASE_DIR = ${TMP_BASE_DIR}"
echo "JOB_WORK_DIR = ${JOB_WORK_DIR}"

# Create a symbolic link to the temporary directory holding work files while the job is running
if ! test -e "${TMP_WORK_DIR}"; then
    mkdir "${TMP_WORK_DIR}"
fi
ln -s "${TMP_WORK_DIR}" scratch_dir
cd "${TMP_WORK_DIR}" || exit

echo " "
echo "### Copying input files for job (if required):"
echo " "
script_name="${BASH_SOURCE[0]}"
AUTOJOB_SLURM_SCRIPT="$(basename "$script_name")"
export AUTOJOB_SLURM_SCRIPT
export AUTOJOB_PYTHON_SCRIPT="run.py"
export AUTOJOB_COPY_TO_SCRATCH="*.chk,*.py,*.traj,*.rwf"
cp -v "$SLURM_SUBMIT_DIR"/{*.chk,*.py,*.traj,*.rwf} "$TMP_WORK_DIR"/
echo " "

# Preemptively end job if getting close to time limit
timeline=$(grep -E -m 1 '^#SBATCH[[:space:]]*--time=' "$script_name")
timeslurm=${timeline##*=}
IFS=- read -ra day_split_time <<< "$timeslurm"
no_days_time=${day_split_time[1]}
days=${no_days_time:+${day_split_time[0]}}
no_days_time=${day_split_time[1]:-${day_split_time[0]}}
IFS=: read -ra split_time <<< "$no_days_time"
# Time formats with days: D-H, D-H:M, D-H:M:S
if [[ $days ]]; then
    slurm_days="$days"
    slurm_hours=${split_time[0]}
    slurm_minutes=${split_time[1]:-0}
    slurm_seconds=${split_time[2]:-0}
# Time formats without days: M, M:S, H:M:S
else
    slurm_days=0
    if [[ ${#split_time[*]} == 3 ]]; then
        slurm_hours=${split_time[0]}
        slurm_minutes=${split_time[1]}
        slurm_seconds=${split_time[2]}
    else
        slurm_hours=0
        slurm_minutes=${split_time[0]}
        slurm_seconds=${split_time[1]:-0}
    fi
fi
echo "Running for $(echo "$slurm_days*1" | bc)d $(echo "$slurm_hours*1" | bc)h $(echo "$slurm_minutes*1" | bc)m and $(echo "$slurm_seconds*1" | bc)s."
timeslurm=$(echo "$slurm_days*86400 + $slurm_hours*3600 + $slurm_minutes*60 + $slurm_seconds" | bc)
echo "This means $timeslurm seconds."
timeslurm=$(echo "$timeslurm * 0.9" | bc)
echo "Will terminate at ${timeslurm}s to copy back necessary files from scratch"
echo ""
echo ""

# Run the ASE calculation, terminating it early if it exceeds the reduced time limit
timeout "${timeslurm}" python3 "$AUTOJOB_PYTHON_SCRIPT"
exit_code=$?
if [ "$exit_code" -eq 124 ]; then
    echo " "
    echo "Cancelled due to time limit."
else
    echo " "
    echo "Time limit not reached."
fi

echo " "
echo "### Cleaning up files ... removing unnecessary scratch files ..."
echo " "
AUTOJOB_FILES_TO_DELETE="*.d2e *.int *.rwf *.skr *.inp"
# Intentionally unquoted so the glob patterns expand
rm -vf $AUTOJOB_FILES_TO_DELETE
sleep 10 # Sleep some time so potential stale NFS handles can disappear.

echo " "
echo "### Compressing results and copying back result archive ..."
echo " "
cd "${TMP_BASE_DIR}" || exit
mkdir -vp "${SLURM_SUBMIT_DIR}" # in case the user has deleted or moved the submit dir
echo " "
echo "Creating result tgz-file '${SLURM_SUBMIT_DIR}/${JOB_WORK_DIR}.tgz' ..."
echo " "
tar -zcvf "${SLURM_SUBMIT_DIR}/${JOB_WORK_DIR}.tgz" "${JOB_WORK_DIR}" \
    || { echo "ERROR: Failed to create tgz-file. Please clean up TMP_WORK_DIR $TMP_WORK_DIR on host '$HOSTNAME' manually (if not done automatically by the queueing system)."; exit 102; }

echo " "
echo "### Remove TMP_WORK_DIR ..."
echo " "
rm -rvf "${TMP_WORK_DIR}"

echo " "
echo "Extracting result tgz-file"
echo " "
cd "${SLURM_SUBMIT_DIR}" || exit
tar -xzf "${JOB_WORK_DIR}".tgz
mv "${JOB_WORK_DIR}"/* .
rm -r "${JOB_WORK_DIR}".tgz "${JOB_WORK_DIR}"
rm "${SLURM_SUBMIT_DIR}/scratch_dir"
echo "END_TIME = $(date +'%y-%m-%d %H:%M:%S %s')"

# Record job in log file
echo "${SLURM_JOB_ID}-${SLURM_JOB_NAME}" is complete: on "$(date +'%y.%m.%d %H:%M:%S')" "${SLURM_SUBMIT_DIR}" >> ~/job.log

echo " "
echo "### Exiting with exit code ${exit_code}..."
echo " "
exit "$exit_code"
```
Edit the brace expansion in the `cp -v "$SLURM_SUBMIT_DIR"/{...}` command (highlighted in the VASP script above) to change the files copied to the scratch directory.
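For example, to also copy `OUTCAR` and any `*.log` files to scratch (hypothetical additions), extend the list:

```bash
cp -v "$SLURM_SUBMIT_DIR"/{CHGCAR,*py,*cif,POSCAR,coord,*xyz,*.traj,CONTCAR,*.pkl,*xml,WAVECAR,OUTCAR,*.log} "$TMP_WORK_DIR"/
```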
Reminder
Don't forget to replace `JOB_NAME`, `SFU_ID`, and the `{{ python_script }}` placeholder with
appropriate values, in addition to setting your desired SLURM parameters.
Also, if you don't define the path to a Python virtual environment in your
`.bashrc` file, you should replace `$COMP_CHEM_ENV` with the path to
the environment's activate script (usually `path-to-environment/bin/activate`).
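For example, you could define the variable once in your `.bashrc` (the environment path below is hypothetical):

```bash
# In ~/.bashrc
export COMP_CHEM_ENV="$HOME/envs/comp-chem/bin/activate"
```

Otherwise, replace the `source "$COMP_CHEM_ENV"` line in the job script with `source path-to-environment/bin/activate` directly.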