What is the use case for setting the CUDA_VISIBLE_DEVICES environment variable in the pbs_cgroups hook? We ask because, for a multi-node MPI-type job, the CUDA_VISIBLE_DEVICES value set on the first node gets propagated to the other nodes, where it makes no sense and causes failures.
So far in our testing, we have had no issues when the variable is not set at all. The NVIDIA tools and CUDA applications appear to run fine, and only on the GPUs enabled by the device cgroup on each node. Is there some use case we are missing?
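For reference, here is a minimal sketch (not part of the hook, just an illustrative check we compile with nvcc and launch on each node of the job) that prints whatever GPUs the CUDA runtime can enumerate; on our nodes it reports only the devices allowed by the device cgroup, regardless of whether CUDA_VISIBLE_DEVICES is set:

```cpp
// Illustrative check: list the GPUs the CUDA runtime can see on the node
// where it runs. In our tests this shows only the devices permitted by the
// pbs_cgroups device cgroup, whether or not CUDA_VISIBLE_DEVICES is set.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    // Report what the environment says, if anything.
    const char *cvd = getenv("CUDA_VISIBLE_DEVICES");
    printf("CUDA_VISIBLE_DEVICES = %s\n", cvd ? cvd : "(unset)");

    // Ask the runtime how many devices it can actually enumerate.
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("visible device count = %d\n", count);

    // Print the name and PCI address of each visible device.
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s (PCI %04x:%02x:%02x)\n",
               i, prop.name, prop.pciDomainID, prop.pciBusID, prop.pciDeviceID);
    }
    return 0;
}
```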
Thanks.