Hi,
I would like to request advice on the issue below, please.
The scenario is this: a job is submitted with an allocated amount of memory (e.g. 50 GB). The job is then assigned to a cpuset (NUMA node) that has less than 50 GB of memory available, even though the server as a whole has more than 50 GB free.
The job is then killed by the OOM killer with the constraint CONSTRAINT_CPUSET.
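For reference, this is roughly how we confirm the mismatch by hand: a minimal Python sketch that compares a job's requested memory against the free memory on the NUMA node(s) in its cpuset. It assumes a cgroup v1 cpuset hierarchy and that the cgroup hook places job cgroups under /sys/fs/cgroup/cpuset/pbspro/<jobid>; the "pbspro" prefix is a guess, so adjust the path to match your hook configuration.

```python
#!/usr/bin/env python3
"""Diagnostic sketch: does the job's cpuset have enough free memory?

Assumptions (adjust for your system):
  - cgroup v1 cpuset hierarchy mounted at /sys/fs/cgroup/cpuset
  - the PBS cgroup hook creates job cgroups at .../cpuset/pbspro/<jobid>
    (hypothetical path; check your hook configuration)
"""
import sys

def parse_mems(spec):
    """Expand a cpuset.mems string like '0' or '0-1,3' into node numbers."""
    nodes = []
    for part in spec.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            nodes.extend(range(int(lo), int(hi) + 1))
        elif part:
            nodes.append(int(part))
    return nodes

def node_free_kb(node):
    """Read MemFree for one NUMA node from sysfs (reported in kB)."""
    with open(f"/sys/devices/system/node/node{node}/meminfo") as f:
        for line in f:
            fields = line.split()
            if fields[2] == "MemFree:":
                return int(fields[3])
    raise RuntimeError(f"MemFree not found for node {node}")

def main(jobid, req_gb):
    # Which NUMA node(s) is this job's cpuset restricted to?
    cgroup = f"/sys/fs/cgroup/cpuset/pbspro/{jobid}/cpuset.mems"  # assumed path
    with open(cgroup) as f:
        nodes = parse_mems(f.read())
    free_kb = sum(node_free_kb(n) for n in nodes)
    req_kb = float(req_gb) * 1024 * 1024
    print(f"job {jobid}: nodes {nodes}, "
          f"free {free_kb / 1024 / 1024:.1f} GB, requested {req_gb} GB")
    if free_kb < req_kb:
        print("WARNING: cpuset memory nodes cannot satisfy the request; "
              "the OOM killer may fire with CONSTRAINT_CPUSET")

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

Run as e.g. `./check_cpuset_mem.py 1234.server 50` while the job is running; if the warning prints, the placement (not the server-wide memory) is the problem.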
My question is: does PBS check whether the cpuset a job is assigned to has enough available memory before placing the job there? If not, what can we do to correct the logic?
We are using pbs_version = 19.1.3
Note: I checked our configuration against the document below and it matches:
PP-325: Support Cgroups - Project Documentation - Confluence (atlassian.net) - Jobs killing reason - CONSTRAINT_CPUSET
Regards,
Ritika.