PBS mom hook resetting memory values

ndusek · June 18, 2020, 2:56pm

Hello,

We are running into an issue with managing available memory on our compute nodes. In our qmgr config, we set resources_available.mem to a fixed amount below the total physical memory on each machine, so that the OS is not starved of memory. For example, if a node has 64gb of physical memory, we might do something like:

set node node0001 resources_available.mem = 54gb
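PBS size values use binary multiples (1kb = 1024 bytes), which is why the server log below reports memory in kb while the qmgr setting is written in gb. Here is a minimal Python sketch for converting between the two and checking how much a 54gb setting holds back on a 64gb node. The helper name `pbs_size_to_kb` is hypothetical, and it only handles the b/kb/mb/gb/tb suffixes.

```python
import re

# Binary multiples expressed in kilobytes (PBS treats 1kb as 1024 bytes).
_UNIT_KB = {"kb": 1, "mb": 1024, "gb": 1024 ** 2, "tb": 1024 ** 3}


def pbs_size_to_kb(size):
    """Convert a PBS-style size string such as '54gb' or '65336320kb' to kb.

    Hypothetical helper: only the b/kb/mb/gb/tb suffixes are handled.
    """
    m = re.fullmatch(r"\s*(\d+)\s*([kmgt]?b)\s*", size.lower())
    if m is None:
        raise ValueError("unrecognised size: %r" % size)
    value, unit = int(m.group(1)), m.group(2)
    if unit == "b":
        return value // 1024  # round plain bytes down to whole kb
    return value * _UNIT_KB[unit]


physical = pbs_size_to_kb("64gb")
configured = pbs_size_to_kb("54gb")
print(physical - configured)  # 10485760 kb (10gb) held back for the OS
```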

The problem is that whenever the PBS MoM process restarts (either due to a reboot or just a service restart), a MoM hook resets the available memory to the total physical memory. From the server logs:

06/18/2020 09:44:53;0100;Server@bright01;Node;node0001.thunder.ccast;Updated vnode node0001's resource resources_available.mem=65336320kb per mom hook request
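The value in that log line, 65336320kb, works out to roughly 62.3gb, more than 8gb above the intended 54gb. Here is a small Python sketch that pulls the vnode name and memory value out of a server log line of this form. The regular expression is an assumption based only on this one message format, and `parse_mem_update` is a hypothetical helper.

```python
import re

# Matches the message format seen in the server log above (assumed stable).
_MEM_UPDATE = re.compile(
    r"Updated vnode (\S+?)'s resource resources_available\.mem=(\d+)kb"
)


def parse_mem_update(line):
    """Return (vnode, mem_kb) for a hook memory update, or None."""
    m = _MEM_UPDATE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


line = ("06/18/2020 09:44:53;0100;Server@bright01;Node;node0001.thunder.ccast;"
        "Updated vnode node0001's resource resources_available.mem=65336320kb "
        "per mom hook request")
vnode, mem_kb = parse_mem_update(line)
intended_kb = 54 * 1024 * 1024  # the 54gb set with qmgr, in kb
print(vnode, round(mem_kb / 1024 ** 2, 2), "gb")     # node0001 62.31 gb
print("overwritten by", mem_kb - intended_kb, "kb")  # overwritten by 8713216 kb
```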

Can we somehow change this behavior to prevent our memory values from being overwritten?

Thank you,
Nick

vstumpf · June 18, 2020, 10:13pm

I don’t think there’s a way to prevent hooks from changing the resources on the vnode.

That said, I can think of a couple of workarounds:

edit the hook to not change memory
create a new hook that runs after the first hook to set the memory to another value
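Here is a minimal sketch of the second option, written as an exechost_startup hook in Python (the language PBS hooks use). It assumes that the hook API calls `pbs.event()`, `pbs.get_local_nodename()`, `pbs.vnode()` and `pbs.size()` behave as described in the Hooks Guide, and that giving this hook a higher order makes it run after the hook that resets memory. The 10gb reservation and the helper names `memtotal_kb` and `reserved_mem_kb` are hypothetical. It only sets mem on the natural vnode. If the cgroups hook creates per-NUMA vnodes, those vnodes would need adjusting instead. The pure helpers are kept apart from the `pbs` calls so they can be tested outside MoM.

```python
# Sketch of a site exechost_startup hook that re-applies a memory reservation
# after another hook (e.g. cgroups) has advertised full physical memory.
try:
    import pbs  # only importable inside the PBS hook interpreter
except ImportError:
    pbs = None

RESERVE_KB = 10 * 1024 * 1024  # hypothetical: hold back 10gb for the OS


def memtotal_kb(meminfo_text):
    """Return MemTotal (reported in kB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def reserved_mem_kb(total_kb, reserve_kb=RESERVE_KB):
    """Memory to advertise after subtracting the reservation."""
    if reserve_kb >= total_kb:
        raise ValueError("reservation is larger than total memory")
    return total_kb - reserve_kb


def run_hook():
    e = pbs.event()
    if e.type == pbs.EXECHOST_STARTUP:
        with open("/proc/meminfo") as f:
            mem_kb = reserved_mem_kb(memtotal_kb(f.read()))
        node = pbs.get_local_nodename()
        vnl = e.vnode_list  # dict of vnode name -> pbs.vnode
        if node not in vnl:
            vnl[node] = pbs.vnode(node)
        vnl[node].resources_available["mem"] = pbs.size("%dkb" % mem_kb)
    e.accept()


if pbs is not None:
    run_hook()
```

It would be installed with the usual qmgr hook workflow: create hook, import hook, then set its event to exechost_startup and its order above that of the hook being overridden. Treat it as untested.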

If anyone else has any ideas, please comment.

ndusek · June 19, 2020, 4:35pm

Thank you for the tips. How would I go about determining which hook is changing the resources, and then how do I change it? We’ve created and deployed custom hooks, so I know the workflow there. But this seems to be caused by a hook that’s part of the default PBS installation. Can you point me to somewhere in the documentation that describes how to do what you’re suggesting?

Nick

vstumpf · June 19, 2020, 4:51pm

The MoM logs should show the hooks being run at a higher log level (set with $logevent in the MoM config file). I'm not sure of the exact level off the top of my head, but for debugging, 0xffff always works (see 8.10, "Error Reporting and Logging", in the Hooks Guide).
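Once the log level is raised, you can pull the hook activity out of the MoM log with a short filter. Here is a minimal Python sketch. It assumes that MoM log lines use the same semicolon-separated layout as the server log line quoted earlier (timestamp; event code; daemon; object type; object name; message) and that hook messages carry the object type `Hook`. Both are assumptions, as are the helper names `parse_line` and `hook_records`.

```python
import sys
from collections import namedtuple

# Field layout of a PBS log line, as seen in the server log in this thread.
LogRecord = namedtuple(
    "LogRecord", "timestamp event_code source obj_type obj_name message"
)


def parse_line(line):
    """Split a PBS log line into its six fields; None if it does not fit."""
    parts = line.rstrip("\n").split(";", 5)  # message may itself contain ';'
    if len(parts) != 6:
        return None
    return LogRecord(*parts)


def hook_records(lines, keyword="mem"):
    """Yield records whose object type is 'Hook' and whose message mentions keyword."""
    for line in lines:
        rec = parse_line(line)
        if rec and rec.obj_type == "Hook" and keyword in rec.message:
            yield rec


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python hook_filter.py $PBS_HOME/mom_logs/20200618
    with open(sys.argv[1]) as f:
        for rec in hook_records(f):
            print(rec.obj_name, "|", rec.message)
```

The object name field would then give the name of the hook that touched mem.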

Do you have the cgroups hook enabled? If it's set to create vnodes, it will advertise all the available memory on the machine. You can change this in the cgroups hook's config file with the keys 'reserve_amount' and 'reserve_percent' (see 15.4, "Configuring Cgroups", in the Admin Guide).
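To land on a particular value such as the 54gb from the original post, you can work out the reservation from what the hook currently reports. Here is a minimal Python sketch. It assumes the hook subtracts reserve_amount from the 65336320kb it advertised in the server log and that reserve_percent is left at 0. Check the Admin Guide for how the two keys actually combine and which unit spellings the config file accepts. The helper `reserve_for_target` and the shape of the printed fragment are hypothetical.

```python
import json


def reserve_for_target(reported_kb, target_kb):
    """Reservation in whole mb (rounded up) that brings reported_kb down
    to at most target_kb; 0 if no reservation is needed."""
    if target_kb >= reported_kb:
        return 0
    shortfall_kb = reported_kb - target_kb
    return -(-shortfall_kb // 1024)  # ceiling division to whole mb


reported_kb = 65336320          # value from the server log in this thread
target_kb = 54 * 1024 * 1024    # the 54gb set with qmgr
reserve_mb = reserve_for_target(reported_kb, target_kb)  # 8509

# Hypothetical fragment of the cgroups hook config (memory keys only).
fragment = {"memory": {"reserve_amount": "%dmb" % reserve_mb,
                       "reserve_percent": 0}}
print(json.dumps(fragment, indent=4))
```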

Related topics:
Custom mom hook for Memory reporting (Developers; 0 replies, 1124 views; last activity February 1, 2017)
How to add virtual memory for each node on PBS (Developers; 15 replies, 4729 views; last activity May 4, 2017)
How can I limit the amount of memory used by a job (Users/Site Administrators; 3 replies, 593 views; last activity April 11, 2021)
VMEM VS MEM - what is difference (Users/Site Administrators; 9 replies, 6511 views; last activity December 4, 2018)
Hook;pbs_python;Server and MoM vnode names may not be consistent (Users/Site Administrators; 0 replies, 547 views; last activity October 6, 2020)
