Specify which GPU to be used in vnode

Ans · July 22, 2021, 6:32am

Hi All,

I am having 2 GPUs (Titan and V100) in my system with PBS 19 and CentOS 7 installed.

So i have created 2 vnodes, 1 vnode with V100 and to be part of gpuq and another vnode to be part of cpuq. But when we run the job in the gpuq, the jobs are using both the gpus.

So i have used CUDA_VISIBLE_DEVICES=1 in the job submission script and everything is working fine.

How can we update this conf to the vnode creation it self so that users will get rid of setting the cuda option in the script.

Can any one help me on this.

Thanks,
Ans.

adarsh · July 22, 2021, 7:36am

Please check
5.14.7.2 Advanced GPU Scheduling from this https://www.altair.com/pdfs/pbsworks/PBSAdminGuide2021.1.pdf

Ans · July 22, 2021, 9:51am

Hi Adarsh,

Thank you for the update.

But after updating the parameters accordingly and when we are trying to run the job we are still observing that the job is trying to utilize both the GPUs.

pbsnodes -v c08[1]
c08[1]
Mom = c08
Port = 15002
pbs_version = 19.1.3
ntype = PBS
state = free
pcpus = 8
resources_available.arch = linux
resources_available.gpu_id = gpu1

In the job log we are able to find the below
On host c08 2 GPUs selected for this run.
Mapping of GPU IDs to the 6 GPU tasks in the 6 ranks on this node:
PP:0,PP:0,PP:0,PP:1,PP:1,PP:1

Can you please have a look into and let us know how can we restrict the gpu0.

Thanks,
Ans.

adarsh · July 22, 2021, 12:29pm

Could you please share the job script that you are trying to run ?
In the above configuration you would need to make a bit more customisation in the Mom hooks , finding out to with gpu_id (which gpu device is being used with nvidia-smi) the job has landed onto, then then set the enviroment variable CUDA_VISIBLE_DEVICE=0 or 1 , based on the ngpus request in qsub select statement
Did you check this 16.5.5.1 Managing GPUs via Cgroups
https://www.altair.com/pdfs/pbsworks/PBSAdminGuide2021.1.pdf

Ans · July 22, 2021, 12:51pm

Hi Adarsh,

Please find the job script as below
#!/bin/bash
#PBS -l vnode=c08[0]
#PBS -N gmx_test
#PBS -q gpuq
#PBS -j oe

cd $PBS_O_WORKDIR
gmx mdrun -ntmpi 6 -ntomp 1 -s new.tpr -g md.log -nb gpu >& log

Also to set the environment variable each time, can we set this according to the queue, to use device 1 for gpuq and nothing for cpuq.

Referring to the Cgroup hook, we are not having any option like “nvidia-smi” present in the file. Can we have any sample referring to the same if possible.

If we implement the hooks now, do we need to delete the vnodes again.

Thanks,
Ans.

adarsh · July 23, 2021, 9:13am

You are directly requesting that specific node in your jobs script.
It is better if we make PBS to choose that specific gpu resource via customer resources.
Also, i do not see you requesting cpu, gpu, mem resources from that system, this means the job can use as much as it wa

To be clear on my understanding:

You have one compute node with two GPU cards

Titan
V100
by any chance you know setting which CUDA_VISIBLE_DEVICE would use V100 and Titan respectively ?
I think we do not have create vnodes , instead we can use it as one natural node and based on the request by the user , set the enviroment variable (Variable_List) in the hook

qsub -l select=1:ncpus=1:ngpus=1 # if this is the request , the queuejob hook event will know that there is a gpu being requested, and hence it will set CUDA_VISIBLE_DEVICE to 0 or 1 ( based on QNo1)

qsub -l select=1:ncpus=1 # if this is the request, then the queuejob hook event will not set any CUDA_VISIBLE_DEVICE enviroment variable

If you can share the above details , then it is easy to handle this requirement.

Ans · July 23, 2021, 12:54pm

Hi Adarsh,

Thank you for the response.

when i issue nvidia-smi then the Titan is 0 and V100 is 1.

As suggested i have run my job using qsub -l select=1:ncpus=1:ngpus=1 but still it is taking both GPUs as below:-
On host c08 2 GPUs selected for this run.
Mapping of GPU IDs to the 6 GPU tasks in the 6 ranks on this node:
PP:0,PP:0,PP:0,PP:1,PP:1,PP:1

but when the CUDA_VISIBLE_DEVICES is used then the job is using only GPU
On host c08 1 GPU selected for this run.
Mapping of GPU IDs to the 6 GPU tasks in the 6 ranks on this node:
PP:0,PP:0,PP:0,PP:0,PP:0,PP:0

Can you please share any of the sample cgroup hook which is working fine.

Thanks,
Ans.

adarsh · July 23, 2021, 8:45pm

Cgroup hook
openpbs/pbs_cgroups.PY at master · openpbs/openpbs · GitHub
Cgroup configuration file:
openpbs/pbs_cgroups.CF at master · openpbs/openpbs · GitHub
Pleasefolow this section 16.5.5.1 Managing GPUs via Cgroups in
https://www.altair.com/pdfs/pbsworks/PBSAdminGuide2021.1.pdf

Topic		Replies	Views
How to configure GPU resource within PBSPro Users/Site Administrators	13	11146	January 7, 2020
Advanced GPU Scheduling Developers	8	78	July 15, 2025
GPU Access Limited by CGroup Users/Site Administrators	14	8400	June 13, 2018
GPU memory as a custom resource Users/Site Administrators	6	3118	January 15, 2018
Any updates on GPU support since 2010? Users/Site Administrators	4	1914	July 17, 2016

Specify which GPU to be used in vnode

Related topics