Auto vnodes on NUMA system?

aflyhorse · April 21, 2017, 11:16am

Recently I got a NUMA system (32 sockets, but not an SGI UV). I noticed that my MPI job, which is small enough to fit within a single NUMA node, gets split across several NUMA nodes even though some nodes are still empty, and performance suffers. My questions are:

lscpu and numastat both report 16 NUMA nodes. Does the scheduler take this into account?
The PBSPro BigBook refers to cpusets. Is that the same thing as the Linux cpuset (/dev/cpuset)?
Should I set up 16 vnodes? If so, how do I do that?
(Running CentOS 7.3, with PBS Pro built from the 14.1.0 SRPM without modification.)

bhroam · April 21, 2017, 7:20pm

The scheduler places jobs based on the PBS vnodes of the system. From your post, it sounds like you have only one vnode, so the scheduler sees the entire system as one large pool of resources rather than as smaller chunks.
On SGI systems, we create vnodes from a topology file that the system provides. Your system doesn't have one, so you'll have to create the vnodes yourself: write a vnode definition file with one vnode per NUMA node.
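
As a rough illustration of the "one vnode per NUMA node" advice, here is a minimal Python sketch that generates the text of such a file. It assumes the version 2 vnode definition format (`$configuration_version 2` header, then `vnode: attribute = value` lines) and the convention of zeroing the CPUs and memory on the natural vnode so they are only counted on the per-NUMA vnodes; check both against the administrator's guide for your PBS version. The hostname `bignode`, the helper name `build_vnode_def`, and the per-node core and memory counts are hypothetical.

```python
def build_vnode_def(hostname, numa_nodes):
    """Return the text of a vnode definition file (assumed version 2 format).

    numa_nodes maps a NUMA node index to a (ncpus, mem_kb) tuple.
    """
    lines = ["$configuration_version 2"]
    # Natural vnode: zero its CPUs and memory so that resources are only
    # counted on the per-NUMA-node vnodes (assumed convention).
    lines.append(f"{hostname}: resources_available.ncpus = 0")
    lines.append(f"{hostname}: resources_available.mem = 0")
    for index in sorted(numa_nodes):
        ncpus, mem_kb = numa_nodes[index]
        vnode = f"{hostname}[{index}]"  # one vnode per NUMA node
        lines.append(f"{vnode}: resources_available.ncpus = {ncpus}")
        lines.append(f"{vnode}: resources_available.mem = {mem_kb}kb")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical topology: 16 NUMA nodes, 8 cores and 64 GiB each.
    # Take the real values from lscpu / numastat on the machine.
    topology = {i: (8, 64 * 1024 * 1024) for i in range(16)}
    print(build_vnode_def("bignode", topology), end="")
```

The generated file is then handed to MoM; as far as I recall this is done with `pbs_mom -s insert <name> <file>` followed by a MoM restart, but treat that as an assumption and confirm it in the documentation.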

You can go further if you want and create placement sets. This will allow the scheduler to place the chunks of a job on vnodes that are close together.
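
To make the idea concrete, here is a small conceptual sketch in Python of what grouping vnodes into placement sets buys you. It is not the PBS scheduler's code: it only assumes that vnodes sharing the same value of some resource form one set, and that a small set that can hold the whole job is preferred. The resource name `board`, the vnode names, and both helper functions are hypothetical.

```python
from collections import defaultdict


def group_by_resource(vnodes, key):
    """Group vnode names by the value of one resource (one group per set)."""
    sets = defaultdict(list)
    for name, attrs in vnodes.items():
        if key in attrs:
            sets[attrs[key]].append(name)
    return dict(sets)


def smallest_fitting_set(vnodes, key, ncpus_needed):
    """Return (value, total_ncpus, members) for the smallest set that can
    hold the job, or None if no single set is large enough."""
    best = None
    for value, members in group_by_resource(vnodes, key).items():
        total = sum(vnodes[m]["ncpus"] for m in members)
        if total >= ncpus_needed and (best is None or total < best[1]):
            best = (value, total, members)
    return best


if __name__ == "__main__":
    # Hypothetical layout: three vnodes spread over two boards.
    vnodes = {
        "bignode[0]": {"ncpus": 8, "board": "b0"},
        "bignode[1]": {"ncpus": 8, "board": "b0"},
        "bignode[2]": {"ncpus": 8, "board": "b1"},
    }
    print(smallest_fitting_set(vnodes, "board", 12))
```

In PBS itself the grouping is, to my knowledge, driven by a resource set on each vnode together with the server attributes `node_group_enable` and `node_group_key`; verify the exact steps in the administrator's guide before relying on them.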

I hope this helps,
Bhroam
