What is the use case for setting the CUDA_VISIBLE_DEVICES environment variable in the pbs_cgroups hook? We ask because, for a multi-node MPI-type job, the CUDA_VISIBLE_DEVICES value set on the first node gets propagated to the other nodes, where it makes no sense and causes failures.
So far in our testing, we have had no issues when the variable is not set at all. The NVIDIA tools and CUDA applications appear to run fine, and only on the GPUs enabled by the device cgroup on each node. Is there some use case we are missing?
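For reference, here is a minimal sketch (not part of the hook, just an illustrative check we compile with nvcc and launch on each node of the job) that prints whatever GPUs the CUDA runtime can enumerate; on our nodes it reports only the devices allowed by the device cgroup, regardless of whether CUDA_VISIBLE_DEVICES is set:

```cpp
// Illustrative check: list the GPUs the CUDA runtime can see on the node
// where it runs. In our tests this shows only the devices permitted by the
// pbs_cgroups device cgroup, whether or not CUDA_VISIBLE_DEVICES is set.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    // Report what the environment says, if anything.
    const char *cvd = getenv("CUDA_VISIBLE_DEVICES");
    printf("CUDA_VISIBLE_DEVICES = %s\n", cvd ? cvd : "(unset)");

    // Ask the runtime how many devices it can actually enumerate.
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("visible device count = %d\n", count);

    // Print the name and PCI address of each visible device.
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s (PCI %04x:%02x:%02x)\n",
               i, prop.name, prop.pciDomainID, prop.pciBusID, prop.pciDeviceID);
    }
    return 0;
}
```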
Thanks.