PBS cgroups and Numa Nodes

SpencerLeith · February 11, 2022, 10:06am

Hi All,

I was hoping to get some advice on the logic OpenPBS uses when it places a job.

The scenario we have is a job submitted with an allocated amount of memory (e.g. 50 GB).
The job is then placed in a cpuset (NUMA node) that has less than 50 GB available. For info, the overall available memory on the server is more than 50 GB.

The job is then killed by OOM killer with the constraint CONSTRAINT_CPUSET.
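
To make the scenario concrete, here is a minimal Python sketch that checks whether any single NUMA node has enough free memory for a request. It reads the standard Linux per-node files under /sys/devices/system/node. The function names are made up for illustration, and this is not how PBS itself decides placement. MemFree also ignores reclaimable page cache, so treat the result as a rough indication only.

```python
import re
from pathlib import Path

# Matches lines such as "Node 0 MemFree:  1234567 kB" in a per-node meminfo file.
_MEMFREE = re.compile(r"Node\s+(\d+)\s+MemFree:\s+(\d+)\s+kB")


def parse_node_memfree(text):
    """Return (node_id, free_bytes) parsed from one nodeN/meminfo file."""
    match = _MEMFREE.search(text)
    if match is None:
        raise ValueError("no MemFree line found")
    return int(match.group(1)), int(match.group(2)) * 1024


def read_free_by_node(root="/sys/devices/system/node"):
    """Map each NUMA node id to its free memory in bytes (Linux only)."""
    free = {}
    for path in Path(root).glob("node[0-9]*/meminfo"):
        node, nbytes = parse_node_memfree(path.read_text())
        free[node] = nbytes
    return free


def nodes_that_fit(free_by_node, request_bytes):
    """Return the sorted node ids whose free memory covers the request."""
    return sorted(n for n, free in free_by_node.items() if free >= request_bytes)
```

On the execution host, `nodes_that_fit(read_free_by_node(), 50 * 1024**3)` lists the nodes that could hold the 50 GB job on their own. An empty list while the server as a whole has more than 50 GB free would match the situation described here.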

My question is: does PBS check whether the cpuset a job is assigned to has enough available memory? If not, what can we do to correct the logic?

Best regards
Spencer

SpencerLeith · April 5, 2022, 9:33am

Is anyone able to help?

adarsh · April 5, 2022, 2:11pm

Did you get a chance to go through: https://openpbs.atlassian.net/wiki/spaces/PD/pages/11599882/PP-325+Support+Cgroups

Please share the PBS version and the cgroup configuration file with the community.

Is the memory subsystem enabled in the cgroup configuration?
What do the MoM log lines show just before this job is killed?
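
As a quick kernel-side check, a small Python sketch like the one below can list which cgroup controllers the kernel reports as enabled, based on the /proc/cgroups table (columns: subsys_name, hierarchy, num_cgroups, enabled). The function name is only illustrative. This does not read the PBS cgroup hook's own configuration file, which still needs to be checked separately.

```python
def enabled_controllers(proc_cgroups_text):
    """Return the set of controller names whose 'enabled' column is 1."""
    enabled = set()
    for line in proc_cgroups_text.splitlines():
        # Skip blank lines and the '#subsys_name ...' header line.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 4 and fields[3] == "1":
            enabled.add(fields[0])
    return enabled


def read_enabled_controllers(path="/proc/cgroups"):
    """Read the controller table from the running kernel (Linux only)."""
    with open(path) as handle:
        return enabled_controllers(handle.read())
```

On the execution host, `"memory" in read_enabled_controllers()` shows whether the kernel has the memory controller switched on at all, and the same test with `"cpuset"` covers the cpuset controller.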
