How to get allocated GPUs on each node

Hello!

In a PBS script, how can I find out how many GPUs are allocated on each node? Is there an analog of Torque's PBS_GPUFILE?

I need to run distributed deep learning with the Horovod framework, and it needs a file with the nodes and the number of GPUs on each,
like this:
node1 slots=1
node2 slots=2

Thanks.

Hi,

Please submit a multi-node job as below

qsub -l select=2:ncpus=2:mpiprocs=2:ngpus=1 -l place=scatter -I
# the last -I is a capital I ("I" as in ice cream); press return and, if the job runs, you will get a console on the remote node
echo $PBS_NODEFILE
cat $PBS_NODEFILE
#you can copy the contents of this file to another file and use it for horovod framework

qsub -l select=1:ncpus=1:ngpus=1+1:ncpus=1:ngpus=2 -I
# same as above
echo $PBS_NODEFILE
cat $PBS_NODEFILE
#you can play with the contents and create a format required for horovod framework
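
For example, a minimal sketch of that conversion, assuming each chunk requests mpiprocs equal to ngpus so that every hostname in the nodefile repeats once per allocated GPU (hostfile.txt is just an example name):

# count repetitions of each host in the nodefile and emit "host slots=N"
sort "$PBS_NODEFILE" | uniq -c | awk '{print $2 " slots=" $1}' > hostfile.txt
cat hostfile.txt

Depending on your Horovod version, something like "horovodrun -np <total slots> -hostfile hostfile.txt python train.py" should then pick it up.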

Hope this helps; otherwise, please share your script.

Hi, and thank you for the answer!
It is very useful, but I already know about the $PBS_NODEFILE variable. That variable contains only the nodes.
I am looking for a way to explicitly get the list of allocated GPUs.

You might need to write a script to extract the information you need from the output of the commands below (a sketch follows the list):

  1. pbsnodes -aSjv
  2. pbsnodes -av | grep -e Mom -e resources_available.ngpus -e resources_assigned.ngpus
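
For instance, a minimal sketch based on command 2, assuming the usual pbsnodes -av layout in which resources_available.ngpus appears before resources_assigned.ngpus in each node's block (note this reports GPUs assigned on each node across all jobs, not just yours, and nodes with nothing assigned may not print a line):

# print "<host> assigned=<n> of <n>" for every node reporting assigned GPUs
pbsnodes -av | awk '
    /Mom = /                       { host  = $3 }
    /resources_available.ngpus = / { avail = $3 }
    /resources_assigned.ngpus = /  { print host, "assigned=" $3, "of", avail }
'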

Thank you

Please download the PBSPro 2020.1 guides (since they contain the most recent documentation of the cgroup hook) and the cgroup hook from the OpenPBS GitHub repository. That lets you discover which sockets the GPUs are on, make vnodes for each socket (by enabling vnode_per_numa_node), and publish how many GPUs there are; the hook will assign GPUs and will also populate CUDA_VISIBLE_DEVICES for each process (even differently for different hosts) for processes spawned via the Task Management API (i.e. spawned by MoM). Other processes can read a ".env" file in the same directory as the nodefile to see the setting.
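
As a rough sketch of how such a process could look the setting up (the env file name here is an assumption; I am guessing it is the job ID with a ".env" suffix in the nodefile's directory, so check what your hook version actually writes):

# locate the hook's env file next to the nodefile (the name is an assumption)
ENVFILE="$(dirname "$PBS_NODEFILE")/${PBS_JOBID}.env"
# show which GPUs the hook assigned on this host
[ -r "$ENVFILE" ] && grep CUDA_VISIBLE_DEVICES "$ENVFILE"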

It is even possible to ensure device isolation, i.e. that any process attached to the job's cgroups only sees the relevant GPUs (even nvidia-smi will not see the "wrong" GPUs). It's a bit tricky to set up, because you have to list all the other devices the job may also need that are NOT the GPUs.

Note: there’s a typo in the documentation; vntype files (should you use them to do different things on different node types) are not in $PBS_HOME but in $PBS_HOME/mom_priv.


Thank you very much for the information, it is really very useful.
But I work with PBSPro ver. 14.1. Can I use the hook from the repo with PBSPro 14.1?

The current hook in “master” was, iirc, meant to be backward compatible (even though it only became officially part of PBSPro OSS in 18.x).

In its current state it should be compatible with older Python 2.x versions except for a few constructs like the syntax for naming exceptions (which is compatible only with Python 2.7 and later).

Even if you build against a Python 2.5-based PBSPro version, fixing these is going to be easier than fixing all the corner cases in older hooks.

IIRC, you may also need to remove support for some hook events that were added after 14.x was released.

Now I try to run a job with:

#PBS -l select=vnode=node1[1]:ngpus=1+vnode=node2[0]:ngpus=1+vnode=node2[1]:ngpus=1+vnode=node3[0]:ngpus=1+vnode=node3[1]:ngpus=1+vnode=node4[1]:ngpus=1+vnode=node5[1]:ngpus=1

and the job completes successfully.

But if I run a job with:

#PBS -l select=7:ngpus=1

an error appears (segmentation fault).

In both cases the same hosts were assigned in $PBS_NODEFILE.

Are these two commands equivalent?

(There is already one job running on node1.)

If a vnode has status “job-busy” but a GPU on it is free, is it possible (with cgroups?) to run a job on this GPU?

What daemon gives a “segmentation fault”? If you use gdb to get a backtrace, where is the segmentation fault?
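
For reference, a rough sketch of getting a backtrace from a core dump; the binary and core file paths are only examples, so point gdb at whichever daemon actually crashed and at wherever your system writes core files:

# open the crashed daemon's binary together with its core file (paths are examples)
gdb /opt/pbs/sbin/pbs_mom /var/spool/pbs/mom_priv/core.12345
# then, at the gdb prompt, print the backtrace:
(gdb) bt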

Why would you want to do that? If the vnode is job-busy, that means there are no free CPUs local to the vnode that the GPU is on, which makes it impossible to run a GPU job at a decent speed.