Hi all,
I have started working with OpenPBS with GPU allocations. Resources are configured with cgroups. I have 4 GPUs on my server.
The problem is that PBS allocates only GPU 1 and GPU 3. It correctly prevents more than 4 GPU processes from running in parallel, but the GPUs assigned to the jobs (through the environment variable 'CUDA_VISIBLE_DEVICES') are wrong.
E.g. if I run 5 jobs with ngpus=1, PBS allocates GPU 1, GPU 3, GPU 1, GPU 3 and then waits for one of the jobs to end before it starts the fifth job with the available GPU.
By the way, if I run with ngpus=2 or ngpus=4, the allocation is correct.
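A minimal sketch of how the assignment could be checked from inside each job, assuming the job script runs Python and that PBS sets PBS_JOBID in the job environment. The function names here are hypothetical, chosen only for illustration:

```python
import os


def visible_gpus(env=None):
    """Return the entries of CUDA_VISIBLE_DEVICES as a list of strings."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    # Entries may be indices or UUIDs, so they are kept as strings.
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


def report(env=None):
    """Build a one-line summary of the job ID and the GPUs it can see."""
    env = os.environ if env is None else env
    job_id = env.get("PBS_JOBID", "unknown")
    gpus = visible_gpus(env)
    return f"job {job_id}: CUDA_VISIBLE_DEVICES={','.join(gpus) or '(unset)'}"


if __name__ == "__main__":
    print(report())
```

Running this in each of the 5 jobs and comparing the output lines should show whether two running jobs really receive the same GPU.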
Thank you in advance