blog/2025-11-01-slurm.md
256 additions & 3 deletions
@@ -6,7 +6,7 @@ tags: [slurm, cloud-computing]
---

:::info
-If you are using the BASIC server, you probabily don't have to care about slurm. However, if you would like to use the resources from the NCHC, you should definetely know how to use it.
+If you're only working on the BASIC Lab server, Slurm might not be necessary yet. However, if you plan to use NCHC resources, then learning Slurm is a must. All NCHC clusters are managed through Slurm.
:::

## Introduction
@@ -453,7 +453,7 @@ Job finished at: Sat Nov 01 10:31:00 2025
### Understanding the Job Script Options

-Let's break down what those #SBATCH lines actually mean:
+Let's break down what those `#SBATCH` lines actually mean:

* `--job-name`: Give your job a memorable name (shows up in `squeue`)
* `--output`: Where to save standard output (`%x` = job name, `%j` = job ID)
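For a concrete illustration (the job name `train` and the echo line are placeholders, not taken from the full script in the post), a header like this labels the job `train` in `squeue` and writes stdout to a file such as `train.12345.out`:

```bash
#!/bin/bash
#SBATCH --job-name=train        # appears in the NAME column of `squeue`
#SBATCH --output=%x.%j.out      # e.g. train.12345.out (%x = job name, %j = job ID)

echo "Running as job $SLURM_JOB_ID"
```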
@@ -556,6 +556,259 @@ We've all been there. You submit a job, feeling confident, and then... it fails
Don't panic if you see messages in the `*.err` file! Despite the name, not everything in the error file is actually an error. Many programs print normal informational messages, warnings, and progress updates to stderr, which ends up in your `*.err` file. Meanwhile, your `*.out` file might be empty or only contain your explicit `echo` statements. Therefore, always check BOTH files - `*.out` AND `*.err` - to get the full picture of what your job is doing. The `*.err` file often contains the most useful information, even when everything is working perfectly fine.
:::
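As a quick habit, look at both files after every run. The filenames below are hypothetical; substitute whatever your `--output`/`--error` patterns produce:

```bash
# Hypothetical filenames; adjust to your own --output/--error patterns.
tail -n 50 train.12345.out    # your echo statements and normal stdout
tail -n 50 train.12345.err    # warnings and progress bars often land here even on success
```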
+## Wrapper Script for Dynamic Job Submission
+
+Here's a common frustration: Slurm job scripts don't take command-line arguments in the way you might expect. You can't just do `sbatch my_job.sh --learning-rate 0.001` and have it work.
+
+**The problem**: You want to run the same experiment with different hyperparameters, seeds, or datasets, but you don't want to manually edit your job script 20 times or create 20 different files.
+**The solution**: Create a bash wrapper script that takes arguments and generates + submits the Slurm job for you!
+
+### Example
+
+Let's say you want to run training jobs with different learning rates and batch sizes. Here's a wrapper script:
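The version below is a minimal sketch rather than the exact script from the post: the wrapper name, the `python train.py` training command, and the resource requests (`--gres=gpu:1`, `--time=02:00:00`) are placeholder assumptions. The idea is simply to read the arguments, write a temporary job script, submit it, and clean up:

```bash
#!/bin/bash
# submit_train.sh (hypothetical name): generate and submit a Slurm job
# for a given learning rate and batch size.

if [ "$#" -ne 2 ]; then
    echo "Usage: $0 <learning_rate> <batch_size>" >&2
    exit 1
fi

LR=$1
BS=$2

# Write a temporary job script with the requested hyperparameters baked in.
JOB_SCRIPT=$(mktemp /tmp/train_job.XXXXXX)

cat > "$JOB_SCRIPT" <<EOF
#!/bin/bash
#SBATCH --job-name=train_lr${LR}_bs${BS}
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
#SBATCH --gres=gpu:1
#SBATCH --time=02:00:00

python train.py --learning-rate ${LR} --batch-size ${BS}
EOF

# Submit the generated script, then remove the temporary file.
sbatch "$JOB_SCRIPT"
rm "$JOB_SCRIPT"
```

With that in place, `bash submit_train.sh 0.001 64` submits a single job, and a small for-loop over learning rates and batch sizes submits a whole sweep.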
+The wrapper creates temporary job scripts that are deleted after submission. If you want to keep them for debugging, you can save them to a directory instead:
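For instance (again a sketch with placeholder names and resource requests), the variant below writes each generated script into a `generated_jobs/` directory and skips the cleanup step:

```bash
#!/bin/bash
# Hypothetical variant of the wrapper above: keep the generated scripts
# instead of deleting them, so failed jobs are easier to debug.

LR=$1     # learning rate (argument checking as in the wrapper above)
BS=$2     # batch size

SCRIPT_DIR=generated_jobs                      # assumed directory name
mkdir -p "$SCRIPT_DIR"
JOB_SCRIPT="$SCRIPT_DIR/train_lr${LR}_bs${BS}_$(date +%Y%m%d_%H%M%S).sh"

cat > "$JOB_SCRIPT" <<EOF
#!/bin/bash
#SBATCH --job-name=train_lr${LR}_bs${BS}
#SBATCH --output=%x.%j.out
#SBATCH --error=%x.%j.err
#SBATCH --gres=gpu:1
#SBATCH --time=02:00:00

python train.py --learning-rate ${LR} --batch-size ${BS}
EOF

sbatch "$JOB_SCRIPT"   # no rm here: the script stays in generated_jobs/
```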