I am running an MPI + CUDA HPC code on an OpenPBS system with multiple nodes, each of which has 8 NVIDIA GPUs. On a SLURM cluster I can use "pinning" to bind each MPI rank to the CPU cores that are closest to its GPU.
For simplicity, assume a node with 4 GPUs and 16 CPU cores, and that we want to pin 4 MPI tasks so that each task is associated with one GPU and the 4 cores closest to it. Here is a simplified version of how I would do it with SLURM:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:4

# Bind each task to 4 cores and map task i to GPU i
srun --cpu-bind=cores --gpu-bind=map_gpu:0,1,2,3 ./my_application
An alternative approach (which, however, does not minimize CPU-GPU latency) is to use CUDA_VISIBLE_DEVICES, like so:
#!/bin/bash
#SBATCH --ntasks=4
#SBATCH --gres=gpu:4
# Single quotes so the local rank is expanded per process, not by the batch shell
mpirun -np 4 bash -c 'CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK ./my_application'
How can I achieve the same CPU-GPU pinning under PBS so as to maximise the performance of the code?
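For reference, this is the kind of PBS skeleton I would expect to start from. It is only a minimal sketch: the ngpus resource name and the chunk layout are assumptions that depend on how the cluster is configured, and what is missing is precisely the part that binds each rank to the cores nearest its GPU.
#!/bin/bash
#PBS -l select=1:ncpus=16:mpiprocs=4:ngpus=4
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR

# Each rank selects a GPU from its Open MPI local rank (as in the SLURM example above),
# but nothing here pins the rank to the cores closest to that GPU.
mpirun -np 4 bash -c 'CUDA_VISIBLE_DEVICES=$OMPI_COMM_WORLD_LOCAL_RANK ./my_application'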