CGroups hook: vnode_per_numa_node when NUMA bisects sockets

After backporting the cgroups hook from 19.1.2 to 18.1.4, and having great success using it to manage allocation and usage on our nodes, I've run into a node with an unusual division of NUMA nodes that the cgroups hook is having difficulty with.

The node has 32 cores (16 per socket), 4 GPUs, and 64 GB of memory. However, when vnode_per_numa_node is enabled in the cgroups hook config, the node is divided into 2 vnodes with 8 cores, 2 GPUs, and 32 GB of memory each. As a result, an entire socket is simply gone as far as PBS is concerned. This seems to be due to the layout of this node, of which I have provided a highly technical diagram below:

A less technical diagram, from $ nvidia-smi topo -m:
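To spell out the arithmetic using the numbers above: the GPUs and memory divide cleanly across the two reported vnodes, but half the cores vanish, exactly one 16-core socket's worth. A quick sanity check:

```python
# Totals on the node, as described above
total_cores, total_gpus, total_mem_gb = 32, 4, 64

# What the hook reports with vnode_per_numa_node enabled:
# 2 vnodes, each with 8 cores, 2 GPUs, and 32 GB of memory
vnodes = 2
cores_per_vnode, gpus_per_vnode, mem_per_vnode = 8, 2, 32

visible_cores = vnodes * cores_per_vnode          # 16
missing_cores = total_cores - visible_cores       # 16 -- one full socket
print(f"GPUs accounted for:   {vnodes * gpus_per_vnode}/{total_gpus}")
print(f"Memory accounted for: {vnodes * mem_per_vnode}/{total_mem_gb} GB")
print(f"Cores missing:        {missing_cores}")
```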

Has anyone encountered a node like this? For the sake of using the GPUs efficiently, I'm considering leaving these as 8-core vnodes, since the hook is managing the reported resources very well. However, I really don't want an unallocatable socket. I'd be grateful for any suggestions.

I can also provide the backported Python script for anyone who would like it; I'm not sure of the best way to share it. The backport just involved commenting out a handful of lines.



Kudos on the highly technical diagram! Just one question: what value are you using for use_hyperthreads in the hook configuration? It defaults to false.

"use_hyperthreads"      : false,
"ncpus_are_cores"       : true,
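For completeness, here is the surrounding fragment of our hook config as a sketch, showing only the keys discussed in this thread; everything else is omitted:

```json
{
    "vnode_per_numa_node"   : true,
    "use_hyperthreads"      : false,
    "ncpus_are_cores"       : true
}
```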

We do not have hyperthreading enabled on any nodes.

Edit: Actually, I am wrong… the node config for this node is enabling hyperthreading, which should not have happened. Thank you for pointing this out; I think that solves the issue.
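For anyone who wants to double-check whether hyperthreading is actually on before trusting the node config: on Linux, each CPU exposes a thread_siblings_list in sysfs (e.g. "0,16" or "0-1" when SMT is on, just "0" when it is off). A minimal sketch, assuming that standard sysfs layout:

```python
from pathlib import Path

def parse_siblings(text: str) -> list[int]:
    """Parse a sysfs CPU list such as '0,16' or '0-1' into a list of CPU ids."""
    out: list[int] = []
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            out.extend(range(int(lo), int(hi) + 1))
        else:
            out.append(int(part))
    return out

def siblings(cpu: int, base: str = "/sys/devices/system/cpu") -> list[int]:
    """Read the hardware-thread siblings of one CPU from sysfs."""
    path = Path(f"{base}/cpu{cpu}/topology/thread_siblings_list")
    return parse_siblings(path.read_text())

def hyperthreading_enabled(cpus: range) -> bool:
    # SMT is on if any CPU shares its core with another hardware thread
    return any(len(siblings(c)) > 1 for c in cpus)
```

With hyperthreading off, every siblings list contains a single CPU id, so hyperthreading_enabled(range(ncpus)) returns False.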