GPU allocation problems

Hi all,

I have started working with GPU allocations in OpenPBS. Resources are configured with cgroups. I have 4 GPUs on my server.

The problem is that PBS only ever allocates GPU 1 and GPU 3. It does not allow more than 4 GPU processes to run in parallel, which is correct, but the GPUs it assigns to them (through the environment variable 'CUDA_VISIBLE_DEVICES') are not distributed properly.

For example, if I run 5 jobs with ngpus=1, PBS allocates GPU 1, GPU 3, GPU 1, GPU 3 and then waits for one of the jobs to end before it starts the fifth job on the GPU that was freed.
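
A minimal sketch for spotting this kind of overlap, assuming each running job records the CUDA_VISIBLE_DEVICES value it sees (for example by echoing it from the job script). The function name find_shared_gpus and the job IDs are made up for illustration; the device values mirror the pattern described above.

```python
from collections import defaultdict


def find_shared_gpus(job_devices):
    """Return the GPUs that are visible to more than one job.

    job_devices maps a job ID to the CUDA_VISIBLE_DEVICES string seen
    inside that job. Device entries are treated as opaque strings, so
    both indices ("1") and UUID-style entries work.
    """
    jobs_by_gpu = defaultdict(list)
    for job_id, visible in job_devices.items():
        for dev in visible.split(","):
            dev = dev.strip()
            if dev:  # skip empty entries such as a trailing comma
                jobs_by_gpu[dev].append(job_id)
    # Keep only the GPUs that two or more jobs were given
    return {gpu: jobs for gpu, jobs in jobs_by_gpu.items() if len(jobs) > 1}


# Hypothetical values: four concurrent ngpus=1 jobs, as in the example above
running = {"job1": "1", "job2": "3", "job3": "1", "job4": "3"}
print(find_shared_gpus(running))
# prints {'1': ['job1', 'job3'], '3': ['job2', 'job4']}
```

An empty result would mean every running job has its own GPU, which is what I would expect with 4 GPUs and 4 single-GPU jobs.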

By the way, if I run with ngpus=2 or ngpus=4, the GPUs are allocated correctly.

Thank you in advance.