Memory restriction on all nodes

Hi,
I want to set a memory restriction on all nodes so that user jobs cannot use more than 300 GB. If memory usage on a node goes beyond 300 GB, the job should be killed automatically, or else a notification should be sent to the user's mail ID. Is this possible without using cgroups?

Thanks in advance,
Kunfu

With cgroups and with $enforce mem (in $PBS_HOME/mom_priv/config), you can restrict jobs to the memory limit that was requested in the qsub statement.

It seems your use case is to delete the jobs running on a compute node when total memory consumption on that node exceeds 300 GB (a threshold on the total). For this you would need to write a periodic MoM hook (an exechost_periodic hook) that regularly monitors memory consumption on the nodes and kills the jobs when the threshold is reached.
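
The threshold check that such a hook would run can be sketched in Python, the language PBS hooks are written in. This is only a minimal sketch, assuming a Linux node that exposes /proc/meminfo. The function names and the 300 GB constant are illustrative, and the hook-specific part (listing and deleting jobs through the pbs module) is deliberately left out.

```python
# Minimal sketch of the memory check a periodic hook could perform.
# Assumptions: Linux /proc/meminfo format; all names here are hypothetical.

# 300 GB expressed in the kB units that /proc/meminfo reports.
THRESHOLD_KB = 300 * 1024 * 1024


def parse_meminfo(text):
    """Return a dict of the numeric /proc/meminfo fields (most are in kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def used_kb(fields):
    """Memory in use: MemTotal minus MemAvailable (falls back to MemFree)."""
    available = fields.get("MemAvailable", fields.get("MemFree", 0))
    return fields["MemTotal"] - available


def over_threshold(text, threshold_kb=THRESHOLD_KB):
    """True when the node's used memory exceeds the threshold."""
    return used_kb(parse_meminfo(text)) > threshold_kb


def node_over_threshold(path="/proc/meminfo"):
    """Run the check against the live node."""
    with open(path) as handle:
        return over_threshold(handle.read())
```

Inside a hook, a True result from node_over_threshold() would be the point at which the jobs on the node are deleted or the user is notified.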

Hi Adarsh,
Thanks for responding.
I would like to go with cgroups, but what if mem is not requested in the qsub statement? Is there any option to have the job killed when the node runs out of memory, without requesting mem in qsub?
Can we set OOM killer protection in PBS? If so, how do we set it?

Best regards
Kunfu

If memory is not requested via qsub, a default memory request is set on the job by the server attribute or by the cgroup hook. If you are using the cgroup hook, the job runs within the boundaries of what has been requested for it.

Refer to: What would happen to a job if its memory usage exceeds its requested size? - #3 by sxy

https://www.nas.nasa.gov/hecc/support/kb/checking-if-a-pbs-job-was-killed-by-the-oom-killer_221.html
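
As a rough illustration of checking whether the OOM killer was involved, the kernel log on the node can be scanned for its kill messages. The sketch below assumes the usual Linux kernel wording ("Out of memory: Kill process ..." on older kernels, "Killed process ..." on newer ones, with a "Memory cgroup" prefix for cgroup limits). The function name is hypothetical, and the log source (for example dmesg output) is up to you.

```python
import re

# Matches both the older ("Kill process") and newer ("Killed process") wording,
# including the "Memory cgroup out of memory" variant (hence IGNORECASE).
OOM_PATTERN = re.compile(
    r"out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)",
    re.IGNORECASE,
)


def find_oom_kills(log_lines):
    """Return a (pid, command) tuple for every OOM-killer line found."""
    kills = []
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

The PIDs returned can then be compared against the processes that belonged to the job.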

Hi Adarsh,
Thanks for the information.
One thing just occurred to me: does the cgroup hook need system cgroup configuration? There is another option, where cgroups are set at the system level to restrict resources. It would be better if the cgroup hook handled system memory without touching the system cgroup configuration.

Best Regards
kunfu

Hi @kunfu, the cgroup hook depends on the cgroups.json file for customisation. To make any changes, update this file to your requirements with respect to memory and import it back into the system.
ref: https://openpbs.atlassian.net/wiki/spaces/PD/pages/11599882/PP-325+Support+Cgroups
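
Editing the memory settings in that file can be sketched as below. This is a hedged example: the key names ("cgroup", "memory", "enabled", "default") are assumptions about the layout of the hook's JSON configuration and should be checked against the file exported from your own installation. The function name is hypothetical.

```python
import json


def set_memory_default(config_text, default_mem):
    """Return new JSON text with the memory subsystem enabled and a default set.

    Assumes the memory settings live under config["cgroup"]["memory"];
    verify this against your exported configuration file.
    """
    config = json.loads(config_text)
    memory = config.setdefault("cgroup", {}).setdefault("memory", {})
    memory["enabled"] = True
    memory["default"] = default_mem
    return json.dumps(config, indent=4)
```

The returned text would be written back to the configuration file before it is re-imported into the hook.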

Yes, please refer to this document: https://www.altair.com/pdfs/pbsworks/PBSAdminGuide2021.1.pdf
See section 16.2, "Introduction to Cgroups".