CGroups hook: vnode_per_numa_node when NUMA bisects sockets

rhofmann · December 13, 2019, 10:59pm

After backporting the cgroups hook from 19.1.2 to 18.1.4, and having great success using it to manage allocation and usage on our nodes, I’ve run into a node with an unusual division of NUMA nodes that the cgroups hook is having difficulty with.

The node has 32 cores (16 per socket), 4 GPUs, and 64 GB of memory. However, when I enable vnode_per_numa_node in the cgroups hook config, the node is divided into 2 vnodes, each with 8 cores, 2 GPUs, and 32 GB of memory. As a result, an entire socket's worth of cores is simply gone as far as PBS is concerned. This seems to be due to the layout of this node, which I have provided a highly technical diagram of below:
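One way to see what the hook is likely working from is to count logical CPUs and physical cores per NUMA node directly from Linux sysfs. The sketch below is a standalone diagnostic, not part of the PBS cgroups hook; it assumes a Linux node exposing the standard `/sys/devices/system/node/node*/cpulist` and `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` files. CPUs that share a sibling list are hardware threads of one physical core, so a NUMA node whose logical count is double its core count has SMT (hyperthreading) turned on.

```python
"""Count logical CPUs vs. physical cores per NUMA node (Linux sysfs sketch).

Diagnostic only; not code from the PBS cgroups hook.
"""
import glob
import os


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-7,16-23' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:  # memory-only NUMA nodes have an empty cpulist
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def summarize(numa_cpus, siblings):
    """Return {numa_node: (logical_cpus, physical_cores)}.

    numa_cpus maps NUMA node id -> set of logical CPU ids.
    siblings maps logical CPU id -> frozenset of its hardware-thread siblings;
    each distinct sibling set corresponds to one physical core.
    """
    return {
        node: (len(cpus), len({siblings[c] for c in cpus if c in siblings}))
        for node, cpus in numa_cpus.items()
    }


def read_sysfs(root="/sys/devices/system"):
    """Collect NUMA -> CPUs and CPU -> sibling set from sysfs."""
    numa_cpus, siblings = {}, {}
    for path in glob.glob(os.path.join(root, "node", "node[0-9]*", "cpulist")):
        node = int(os.path.basename(os.path.dirname(path))[len("node"):])
        with open(path) as f:
            numa_cpus[node] = parse_cpulist(f.read())
    for cpus in numa_cpus.values():
        for cpu in cpus:
            sib = os.path.join(root, "cpu", f"cpu{cpu}", "topology",
                               "thread_siblings_list")
            if os.path.exists(sib):  # offline CPUs may lack a topology dir
                with open(sib) as f:
                    siblings[cpu] = frozenset(parse_cpulist(f.read()))
    return numa_cpus, siblings


if __name__ == "__main__":
    for node, (logical, cores) in sorted(summarize(*read_sysfs()).items()):
        note = "  (SMT appears to be on)" if logical > cores else ""
        print(f"NUMA node {node}: {logical} logical CPUs, {cores} cores{note}")
```

On a node where the vnodes come out at half the expected size, comparing these two counts per NUMA node quickly shows whether hardware threads are being mistaken for cores.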

A less technical diagram, from $ nvidia-smi topo -m:

Has anyone encountered a node like this? For the sake of using the GPUs efficiently, I’m considering leaving these as 8-core vnodes, since the hook is managing the reported resources very well. However, I really don’t want to have an unallocatable socket. I’d be grateful for any suggestions.

I can also provide the backported Python script to anyone who would like it, though I’m not sure what the best way to share it is. The backporting just involved commenting out a handful of lines.

Thanks,

Russell

mkaro · December 17, 2019, 7:52pm

Kudos on the highly technical diagram! Just one question… what value are you using for use_hyperthreads in the hook configuration? It is false by default.

rhofmann · December 17, 2019, 9:13pm

"use_hyperthreads"      : false,
"ncpus_are_cores"       : true,

We do not have hyperthreading enabled on any nodes.

Edit: Actually, I was wrong. The node configuration for this node enables hyperthreading, which should not have happened. Thank you for pointing this out; I think that solves the issue.
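A simplified, hypothetical model of the arithmetic (not the hook's actual code) shows why hyperthreading would produce exactly the vnode sizes reported above. One reading consistent with the numbers in this thread, assumed here, is that the OS sees 32 logical CPUs made up of 16 physical cores with two threads each. With use_hyperthreads set to false, only one thread per core is counted, leaving 16 ncpus to split across 2 NUMA nodes, or 8 per vnode. The function name and parameters are illustrative only.

```python
"""Hypothetical model of ncpus per NUMA vnode with and without SMT.

Illustrates the arithmetic discussed in this thread; it is not taken from
the PBS cgroups hook.
"""


def ncpus_per_vnode(logical_cpus, threads_per_core, numa_nodes, use_hyperthreads):
    """Return the ncpus each per-NUMA vnode would expose under this model.

    When use_hyperthreads is False, only one hardware thread per physical
    core is counted; the remaining CPUs are assumed to split evenly across
    the NUMA nodes.
    """
    if logical_cpus % (threads_per_core * numa_nodes):
        raise ValueError("expected CPUs to split evenly across NUMA nodes")
    counted = logical_cpus if use_hyperthreads else logical_cpus // threads_per_core
    return counted // numa_nodes


# SMT on: 32 logical CPUs = 16 cores x 2 threads, 2 NUMA nodes
print(ncpus_per_vnode(32, 2, 2, use_hyperthreads=False))  # 8
# SMT off: 32 logical CPUs = 32 cores, 2 NUMA nodes
print(ncpus_per_vnode(32, 1, 2, use_hyperthreads=False))  # 16
```

Under this model, disabling hyperthreading on the node (or confirming the core count with a sysfs check) is what restores 16 ncpus per vnode.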
