GPU Access Limited by CGroup

Phhere · October 27, 2017, 12:38pm

Hello,
we have some servers with 10 GPUs (GTX 1080 Ti) and a custom resource ngpus. Now we would like to use a hook to limit jobs so that they only see their own GPUs and not all GPUs in the system.
I hoped that it would work with the cgroup hook, but I did not find an option to do it.
Do you have any idea?

Kind regards
Philipp Rehs

pcebull · November 6, 2017, 11:05pm

Did you set up each GPU in its own vnode (i.e., 10 vnodes, plus the natural vnode) as described in the Admin Guide under “Configuring PBS for Advanced GPU Scheduling”? I would assume the Cgroup hook would just handle the GPUs properly in that case, but I’ve not done it myself. We recently purchased a server with multiple GPUs, so I’m just now starting to look into the issue you describe.

JohnH · November 7, 2017, 6:10am

Philipp, I have servers with 4 GPUs. The cgroups hook is installed and
being used.
When a user runs a job they are assigned GPUs (i.e. the /dev/nvidiaN
devices are put into the devices cgroup for the relevant job) and the
CUDA_VISIBLE_DEVICES environment variable is set.

What happens in your case?
Would you share your JSON configuration file for the cgroups hook please?

Also are you using the latest cgroups hook?
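
A quick way to answer "what happens in your case?" is to run a small check from inside a job. The following is only a minimal sketch, not part of the hook: it assumes that a device excluded by the devices cgroup still shows up under /dev but cannot be opened, and the function names are made up for illustration.

```python
import glob
import os


def parse_visible_devices(value):
    """Split a CUDA_VISIBLE_DEVICES value into a list of entries."""
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def can_open(path):
    """Return True if the device node can be opened read/write.

    Assumption: a node blocked by the devices cgroup fails to open,
    even though the file itself is still listed in /dev.
    """
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        return False
    os.close(fd)
    return True


def report_gpu_access(environ=None):
    """Collect the env var entries and the openable /dev/nvidiaN nodes."""
    environ = os.environ if environ is None else environ
    visible = parse_visible_devices(environ.get("CUDA_VISIBLE_DEVICES", ""))
    # Matches /dev/nvidia0, /dev/nvidia1, ...; not nvidiactl or nvidia-uvm.
    nodes = sorted(glob.glob("/dev/nvidia[0-9]*"))
    openable = [node for node in nodes if can_open(node)]
    return visible, openable


if __name__ == "__main__":
    visible, openable = report_gpu_access()
    print("CUDA_VISIBLE_DEVICES entries:", visible)
    print("GPU device nodes this process can open:", openable)
```

If the second list contains every GPU in the node although the job requested fewer, the devices cgroup is probably not being applied to the job.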

Phhere · November 7, 2017, 3:55pm

Thank you for both of your answers.

Until now we just had Basic GPU Scheduling but I will switch to Advanced now.

I do not have a json configuration yet. @JohnH could you share your configuration?

Kind regards
Philipp

JohnH · November 7, 2017, 4:14pm

Philipp, you should have a JSON file which configures the cgroups hook.
Please read this carefully - I had to read it very carefully myself
https://pbspro.atlassian.net/wiki/spaces/PD/pages/11599882/PP-325+Support+Cgroups

So you need to enable the devices stanza, i.e. set enabled to true in it.

mic/scif refers to Intel Xeon Phi, so you can safely ignore that.

It is up to you how you handle the lines which exclude vntypes or
run_only_on_hosts.
If you have just a few GPU-equipped hosts, then set the list of hosts in the
run_only_on_hosts line.
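
To double-check these settings before importing the configuration, the JSON can be loaded and inspected. This is a rough sketch only: the host names are hypothetical, and the way `host_is_selected` combines `exclude_hosts` and `run_only_on_hosts` is an assumption about how the hook reads them, not taken from the hook source.

```python
import json

# Hypothetical, trimmed-down configuration; host names are placeholders.
SAMPLE_CONFIG = """
{
    "exclude_hosts": [],
    "exclude_vntypes": [],
    "run_only_on_hosts": ["gpunode01", "gpunode02"],
    "cgroup": {
        "devices": {
            "enabled": true,
            "exclude_hosts": [],
            "exclude_vntypes": []
        }
    }
}
"""


def devices_enabled(config):
    """True if the devices stanza exists and has enabled set to true."""
    devices = config.get("cgroup", {}).get("devices", {})
    return bool(devices.get("enabled", False))


def host_is_selected(config, hostname):
    """Assumed reading: excluded hosts are skipped, and a non-empty
    run_only_on_hosts list restricts the hook to the hosts it names."""
    if hostname in config.get("exclude_hosts", []):
        return False
    only = config.get("run_only_on_hosts", [])
    return not only or hostname in only


if __name__ == "__main__":
    # json.loads also catches plain syntax errors in the file.
    config = json.loads(SAMPLE_CONFIG)
    print("devices enabled:", devices_enabled(config))
    print("gpunode01 selected:", host_is_selected(config, "gpunode01"))
    print("login01 selected:", host_is_selected(config, "login01"))
```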

Phhere · November 7, 2017, 4:22pm

I think I have read the config file correctly, but currently I am failing at importing the hook:

[root@hpc-batch14 cgroup]# qmgr
Max open servers: 49
Qmgr: c hook cgroup
Qmgr: import hook cgroup application/x-python default pbs_cgroups.PY
Qmgr: s hook cgroup event=exechost_periodic,exechost_startup,execjob_attach,execjob_begin,execjob_end,execjob_epilogue,execjob_launch
qmgr: Syntax error

My current config looks like this:

{
    "cgroup_prefix"         : "pbspro",
    "exclude_hosts"         : [],
    "exclude_vntypes"       : [],
    "run_only_on_hosts"     : ["hilbert210","hilbert211","hilbert212","hilbert213"],
    "periodic_resc_update"  : true,
    "vnode_per_numa_node"   : false,
    "online_offlined_nodes" : true,
    "use_hyperthreads"      : false,
    "cgroup" : {
        "cpuacct" : {
            "enabled"         : true,
            "exclude_hosts"   : [],
            "exclude_vntypes" : []
        },
        "cpuset" : {
            "enabled"         : true,
            "exclude_hosts"   : [],
            "exclude_vntypes" : []
        },
        "devices" : {
            "enabled"         : true,
            "exclude_hosts"   : [],
            "exclude_vntypes" : [],
            "allow"           : [
                "b *:* rwm",
                "c *:* rwm",
                ["nvidiactl", "rwm", "*"],
                ["nvidia-uvm", "rwm"]
            ]
        },
        "hugetlb" : {
            "enabled"         : false,
            "exclude_hosts"   : [],
            "exclude_vntypes" : [],
            "default"         : "0MB",
            "reserve_percent" : "0",
            "reserve_amount"  : "0MB"
        },
        "memory" : {
            "enabled"         : true,
            "exclude_hosts"   : [],
            "exclude_vntypes" : [],
            "soft_limit"      : false,
            "default"         : "256MB",
            "reserve_percent" : "0",
            "reserve_amount"  : "10GB"
        },
        "memsw" : {
            "enabled"         : false,
            "exclude_hosts"   : [],
            "exclude_vntypes" : ["grey_node"],
            "default"         : "256MB",
            "reserve_percent" : "0",
            "reserve_amount"  : "1GB"
        }
    }
}

JohnH · November 7, 2017, 4:34pm

Philipp,
what does qmgr -c "list hook cgroup" tell you?

pcebull · November 7, 2017, 4:37pm

Philipp,

I think you need to surround the multiple event types with quotes. That’s the cause of your syntax error.

Qmgr: s hook cgroup event="exechost_periodic,exechost_startup,execjob_attach,execjob_begin,execjob_end,execjob_epilogue,execjob_launch"

Peter
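
When this is scripted rather than typed at the Qmgr prompt, the quoting is easy to lose again. Below is a minimal sketch that builds the same command for `qmgr -c`; the helper names are made up for illustration, and passing the command as a single argument in an argv list is a design choice here so that no shell gets a chance to strip the inner double quotes.

```python
import shlex

# The event list from the command above.
EVENTS = [
    "exechost_periodic",
    "exechost_startup",
    "execjob_attach",
    "execjob_begin",
    "execjob_end",
    "execjob_epilogue",
    "execjob_launch",
]


def build_set_event_command(hook_name, events):
    """Build the qmgr directive with the comma-separated list in double quotes."""
    return 'set hook {} event="{}"'.format(hook_name, ",".join(events))


def build_qmgr_argv(hook_name, events):
    """Argument vector suitable for subprocess.run(argv, check=True)."""
    return ["qmgr", "-c", build_set_event_command(hook_name, events)]


if __name__ == "__main__":
    argv = build_qmgr_argv("cgroup", EVENTS)
    # Print a shell-safe rendering instead of running it.
    print(" ".join(shlex.quote(arg) for arg in argv))
```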

Phhere · November 10, 2017, 3:03pm

Now importing worked, thank you!

But I think my hook configuration is still wrong: it does not set up vnodes and I cannot start jobs. I can see that it creates two NUMA nodes but does not add any GPUs to them.

NUMA nodes: {0: {'MemTotal': '134106580k', 'cpus': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'devices': [], 'HugePages_Total': '0'}, 1: {'MemTotal': '134217728k', 'cpus': [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], 'devices': [], 'HugePages_Total': '0'}}

But it does find these GPUs:

11/10/2017 16:00:03;0800;pbs_python;Hook;pbs_python;__discover_meminfo: Method called
11/10/2017 16:00:03;0800;pbs_python;Hook;pbs_python;Discover meminfo: {'SwapTotal': '0', 'MemTotal': '264044644k', 'HugePages_Rsvd': 0, 'Hugepagesize': '2048k', 'HugePages_Total': 0}
11/10/2017 16:00:03;0800;pbs_python;Hook;pbs_python;__discover_numa_nodes: Method called
11/10/2017 16:00:03;0800;pbs_python;Hook;pbs_python;__discover_numa_nodes: {0: {'MemTotal': '134106580k', 'cpus': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'devices': [], 'HugePages_Total': '0'}, 1: {'MemTotal': '134217728k', 'cpus': [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], 'devices': [], 'HugePages_Total': '0'}}
11/10/2017 16:00:03;0800;pbs_python;Hook;pbs_python;__discover_devices: Method called
11/10/2017 16:00:03;0800;pbs_python;Hook;pbs_python;__discover_gpus: Method called
11/10/2017 16:00:03;0800;pbs_python;Hook;pbs_python;NVIDIA SMI command: ['/usr/bin/nvidia-smi', '-q', '-x']
11/10/2017 16:00:04;0800;pbs_python;Hook;pbs_python;root.tag: nvidia_smi_log
11/10/2017 16:00:04;0100;pbs_python;Hook;pbs_python;GPUs: {'nvidia4': '00000000:08:00.0', 'nvidia5': '00000000:0B:00.0', 'nvidia6': '00000000:0C:00.0', 'nvidia7': '00000000:0D:00.0', 'nvidia0': '00000000:04:00.0', 'nvidia1': '00000000:05:00.0', 'nvidia2': '00000000:06:00.0', 'nvidia3': '00000000:07:00.0', 'nvidia8': '00000000:0E:00.0', 'nvidia9': '00000000:0F:00.0'}

Any ideas?

Phhere · November 13, 2017, 9:47am

Hello,
I found the problem. We use Supermicro servers with an onboard PCI switch, so the PCI IDs are “wrong”.

GPUs: {'nvidia4': '00000000:08:00.0', 'nvidia5': '00000000:0B:00.0', 'nvidia6': '00000000:0C:00.0', 'nvidia7': '00000000:0D:00.0', 'nvidia0': '00000000:04:00.0', 'nvidia1': '00000000:05:00.0', 'nvidia2': '00000000:06:00.0', 'nvidia3': '00000000:07:00.0', 'nvidia8': '00000000:0E:00.0', 'nvidia9': '00000000:0F:00.0'}

But the devices have other IDs in the PCI device list:
card3': {'realpath': '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:0c.0/0000:06:00.0/drm/card3', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card3', 'bus_id': '0000:00:02.0', 'minor': 3}, 'card2': {'realpath': '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:08.0/0000:05:00.0/drm/card2', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card2', 'bus_id': '0000:00:02.0', 'minor': 2}, 'card1': {'realpath': '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:04.0/0000:04:00.0/drm/card1', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card1', 'bus_id': '0000:00:02.0', 'minor': 1}, 'card0': {'realpath': '/sys/devices/pci0000:00/0000:00:1c.7/0000:11:00.0/0000:12:00.0/drm/card0', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card0', 'bus_id': '0000:00:1c.7', 'minor': 0}, 'card7': {'realpath': '/sys/devices/pci0000:00/0000:00:03.0/0000:09:00.0/0000:0a:08.0/0000:0c:00.0/drm/card7', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card7', 'bus_id': '0000:00:03.0', 'minor': 7}, 'card6': {'realpath': '/sys/devices/pci0000:00/0000:00:03.0/0000:09:00.0/0000:0a:04.0/0000:0b:00.0/drm/card6', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card6', 'bus_id': '0000:00:03.0', 'minor': 6}, 'card5': {'realpath': '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:14.0/0000:08:00.0/drm/card5', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card5', 'bus_id': '0000:00:02.0', 'minor': 5}, 'card4': {'realpath': '/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0/0000:03:10.0/0000:07:00.0/drm/card4', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card4', 'bus_id': '0000:00:02.0', 'minor': 4}, 'card10': {'realpath': '/sys/devices/pci0000:00/0000:00:03.0/0000:09:00.0/0000:0a:14.0/0000:0f:00.0/drm/card10', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card10', 'bus_id': '0000:00:03.0', 'minor': 10}, 'card8': 
{'realpath': '/sys/devices/pci0000:00/0000:00:03.0/0000:09:00.0/0000:0a:0c.0/0000:0d:00.0/drm/card8', 'major': 226, 'type': 'c', 'numa_node': 0, 'device': '/dev/dri/card8', 'bus_id': '0000:00:03.0', 'minor': 8},

Maybe it is a bug in nvidia-smi, but I think I need to find a workaround inside the hook.
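
To make the mismatch concrete: in the dump above, bus_id is always the first PCI address in realpath (the port in front of the switch), while nvidia-smi reports the GPU's own address, which is the last PCI address in realpath. The sketch below shows one possible way to match on that last address instead. It is not the hook's actual code, the function names are hypothetical, and only a few entries from the output above are used as sample data.

```python
import re

# A PCI address as it appears in sysfs paths, e.g. "0000:08:00.0".
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def normalize_bus_id(bus_id):
    """Convert an nvidia-smi style ID ('00000000:0B:00.0') to the
    sysfs style ('0000:0b:00.0'): 4-digit domain, lowercase hex."""
    domain, bus, dev_fn = bus_id.strip().lower().split(":")
    return "{:04x}:{}:{}".format(int(domain, 16), bus, dev_fn)


def leaf_pci_address(realpath):
    """Return the last PCI address in a sysfs path, or None."""
    addresses = [part for part in realpath.split("/") if PCI_ADDR.match(part)]
    return addresses[-1] if addresses else None


def match_gpus_to_devices(gpus, devices):
    """Map GPU names to device names whose leaf PCI address matches."""
    by_leaf = {}
    for name, info in devices.items():
        leaf = leaf_pci_address(info["realpath"])
        if leaf is not None:
            by_leaf[leaf] = name
    matches = {}
    for gpu, bus_id in gpus.items():
        key = normalize_bus_id(bus_id)
        if key in by_leaf:
            matches[gpu] = by_leaf[key]
    return matches


# Sample entries copied from the output above.
GPUS = {"nvidia4": "00000000:08:00.0", "nvidia1": "00000000:05:00.0"}
DEVICES = {
    "card5": {"realpath": "/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0"
                          "/0000:03:14.0/0000:08:00.0/drm/card5",
              "bus_id": "0000:00:02.0"},
    "card2": {"realpath": "/sys/devices/pci0000:00/0000:00:02.0/0000:02:00.0"
                          "/0000:03:08.0/0000:05:00.0/drm/card2",
              "bus_id": "0000:00:02.0"},
}

if __name__ == "__main__":
    print(match_gpus_to_devices(GPUS, DEVICES))
```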

mkaro · November 13, 2017, 3:17pm

Hello Philipp,

Thank you for bringing this to our attention. I’m attending SC17 this week, but will try to have a look at the problem you described in my “free time” here at the conference. Please pardon the inevitable (albeit brief) delay in addressing this.

Thanks,

Mike

Phhere · November 13, 2017, 3:19pm

Hello,
yes, I know that it is SC17, and I would like to be there too.
We talked at this year's ISC. Maybe I can send you the output of nvidia-smi from our board and some other information / logs.

Phil

mkaro · November 13, 2017, 3:22pm

That would be very helpful! It’s difficult to support hardware that’s not readily available (to me, at least). We could definitely use your help.

Feel free to post to the community or send email with the output.

Thanks,

Mike

Phhere · November 13, 2017, 3:43pm

I hope I have added all required information.

It seems that the bus_id field in devices is not correctly matched against the GPU IDs from nvidia-smi.
Which devices should be matched at this point? drm/card? I ask because the /dev/nvidia devices have major 195 and drm/card has 226.
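
For anyone who wants to compare the numbers on their own board, the major and minor of a device node can be read with the standard library. This is a small sketch with hypothetical helper names; the paths in the usage part are only examples and may not exist on a given system.

```python
import os
import stat


def device_numbers(rdev):
    """Split a raw device number (st_rdev) into (major, minor)."""
    return os.major(rdev), os.minor(rdev)


def describe_device(path):
    """Return a string such as 'c 195:0' for a character or block device."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "c"
    elif stat.S_ISBLK(st.st_mode):
        kind = "b"
    else:
        raise ValueError("{} is not a device node".format(path))
    major, minor = device_numbers(st.st_rdev)
    return "{} {}:{}".format(kind, major, minor)


if __name__ == "__main__":
    # Example paths only; adjust to the nodes present on the host.
    for path in ("/dev/nvidia0", "/dev/nvidiactl", "/dev/dri/card1"):
        try:
            print(path, "->", describe_device(path))
        except (OSError, ValueError) as err:
            print(path, "-> skipped:", err)
```

The "type major:minor" form is the same notation used by the entries in the allow list of the devices stanza (for example "c *:* rwm").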

mkaro · June 13, 2018, 1:14am

Hello again @Phhere. Many improvements have gone into the cgroups hook since your last post, including a number of changes (and a great deal of testing and validation) in areas pertaining to GPUs. Unfortunately, the changes didn’t make it in time for the 18.1 release cutoff, but you can still install 18.1 and grab the cgroup hook bits from mainline here: https://github.com/PBSPro/pbspro/tree/master/src/hooks/cgroups

You can then import the latest hook (pbs_cgroups.PY) and check out some of the additional parameters available in the hook configuration file (pbs_cgroups.CF). Please ensure all of the events configured in pbs_cgroups.HK are enabled. Those three files are all you need to care about. Please let us know if we can be of assistance.
