Given that a single node has multiple GPUs, is there a way to automatically limit CPU and memory usage depending on the number of GPUs requested?
In particular, if the user's job script requests 2 GPUs, then the job should automatically be restricted to 2*BaseMEM and 2*BaseCPU, where BaseMEM = TotalMEM/numGPUs and BaseCPU = numCPUs/numGPUs, both defined on a per-node basis.
Is it possible to configure SLURM this way? If not, can one alternatively "virtually" split a multi-GPU machine into multiple nodes with the appropriate CPU and MEM count?
On the command line:
--cpus-per-gpu $BaseCPU --mem-per-gpu $BaseMEM
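For example, on a hypothetical node with 4 GPUs, 64 CPUs, and 512 GB of RAM, BaseCPU = 64/4 = 16 and BaseMEM = 512/4 = 128 GB, so a 2-GPU job script could look like the following (node specs and program name are made up for illustration):

#!/bin/bash
#SBATCH --gpus=2                 # request 2 GPUs
#SBATCH --cpus-per-gpu=16        # BaseCPU, so 2*16 = 32 CPUs in total
#SBATCH --mem-per-gpu=128G       # BaseMEM, so 2*128 GB = 256 GB in total

srun ./my_gpu_program            # placeholder for the actual workload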
In slurm.conf:
DefMemPerGPU=1234
DefCpuPerGPU=1
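These defaults can also be set per partition. A sketch, assuming a partition named gpu backed by nodes with the layout above (node names are placeholders; DefMemPerGPU is in megabytes):

PartitionName=gpu Nodes=gpunode[01-04] DefCpuPerGPU=16 DefMemPerGPU=128000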
Since you can't use variables in slurm.conf, you would need to write a little bash command to calculate $BaseCPU and $BaseMEM, along the lines of the sketch below.
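A minimal sketch of such a wrapper, assuming the node name is known and the GPU count can be read from the node's Gres string (the exact Gres format varies between sites, so the parsing may need adjusting):

#!/bin/bash
# Derive per-GPU CPU and memory shares for one node, then submit a job.
NODE=gpunode01        # placeholder node name
REQ_GPUS=2            # GPUs the job will request

CPUS=$(scontrol show node "$NODE" | grep -oP 'CPUTot=\K[0-9]+')
MEM=$(scontrol show node "$NODE" | grep -oP 'RealMemory=\K[0-9]+')   # in MB
NGPUS=$(scontrol show node "$NODE" | grep -oP 'Gres=gpu:\K[0-9]+')

BaseCPU=$(( CPUS / NGPUS ))
BaseMEM=$(( MEM / NGPUS ))

sbatch --gpus=$REQ_GPUS \
       --cpus-per-gpu=$BaseCPU \
       --mem-per-gpu=${BaseMEM}M \
       job.sh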