
Why doesn't SLURM provide the nodes as requested?

My cluster consists of 3 PCs (Raspbian with Slurm 18), all connected together, with shared file storage mounted as /storage.

The task file is /storage/multiple_hello.sh:

#!/bin/bash
#SBATCH --ntasks-per-node=1
#SBATCH --nodes=3 
#SBATCH --ntasks=3
cd  /storage
srun echo "Hello World from $(hostname)" >> ./"$SLURM_JOB_ID"_$(hostname).txt

It is run with sbatch /storage/multiple_hello.sh, and the expected outcome is 3 files created in /storage, named 120_node1.txt, 120_node2.txt and 120_node3.txt (120 being an arbitrary job number; all tasks of one job share the same job ID), since:

  • 3 nodes were requested
  • 3 tasks were requested
  • a limit of 1 task per node was set

Actual output: only one file is created, 120_node1.txt

How can I make it work as intended?

Oddly enough, srun --nodes=3 hostname works as expected and returns:

node1
node2
node3

Araneus0390 asked Jan 23 '26 07:01


1 Answer

To get the expected result, change the last line to:

srun bash -c 'echo "Hello World from $(hostname)" >> ./"$SLURM_JOB_ID"_$(hostname).txt'

Bash parses the line differently from what you expect. First, $(hostname) and $SLURM_JOB_ID are expanded by the shell that runs the submission script, on the first node of the allocation. Then srun is run, and the output it collects from its tasks is appended to that single file, because the >> redirection also belongs to the submission script's shell and not to the tasks. You need to make the redirection explicitly part of the command that srun runs. With the above solution, the variable and command expansions, as well as the redirection, are done on each node.
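
The difference in expansion timing can be reproduced without a cluster. Below is a minimal sketch, assuming a hypothetical function run_elsewhere that stands in for srun and a hypothetical variable WHO that plays the role of $(hostname); neither is part of Slurm.

```shell
#!/bin/bash
# Hypothetical stand-in for srun: runs its arguments with WHO set to "remote",
# mimicking a task that starts on another node.
run_elsewhere() { WHO=remote "$@"; }

# Plays the role of $(hostname) on the node running the submission script.
export WHO=local

# Double-quoted form: the calling shell expands $WHO before run_elsewhere starts.
early=$(run_elsewhere echo "Hello from $WHO")

# Wrapped form: the single quotes defer expansion to the child shell.
late=$(run_elsewhere bash -c 'echo "Hello from $WHO"')

echo "$early"   # Hello from local
echo "$late"    # Hello from remote
```

The first call corresponds to the original last line of the script, where everything is resolved before srun starts; the second corresponds to the bash -c version, where each task resolves it for itself.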

damienfrancois answered Jan 25 '26 04:01


