We are running into an issue managing available memory on our compute nodes. In our qmgr config, we set resources_available.mem to a fixed amount below the node's total physical memory, in order to prevent the OS from being starved. For example, if a node has 64gb of physical memory, we might do something like:
set node node0001 resources_available.mem = 54gb
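To make the arithmetic explicit, here is a minimal Python sketch of how a value like this could be derived. The function name and the 10gb default reserve are illustrative assumptions, not anything PBS provides; it only builds the qmgr command text and does not talk to the server.

```python
def qmgr_mem_command(node: str, physical_gb: int, reserve_gb: int = 10) -> str:
    """Build a qmgr 'set node' line that holds back reserve_gb for the OS.

    Hypothetical helper: the 10gb default reserve is an assumed site policy.
    """
    if reserve_gb >= physical_gb:
        raise ValueError("reserve must be smaller than physical memory")
    available_gb = physical_gb - reserve_gb
    return f"set node {node} resources_available.mem = {available_gb}gb"


# A 64gb node with a 10gb reserve yields the 54gb setting shown above.
print(qmgr_mem_command("node0001", 64))
```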
The problem is that whenever the PBS mom process restarts (either due to a reboot or just a service restart), a mom hook resets the available memory to the total physical memory. From the server logs:
06/18/2020 09:44:53;0100;Server@bright01;Node;node0001.thunder.ccast;Updated vnode node0001's resource resources_available.mem=65336320kb per mom hook request
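In case it helps anyone spot the same thing, a rough Python sketch for detecting these overwrites in the server log might look like the following. The regular expression is written against the single log line above, so the message format is an assumption and may differ on other PBS versions; the function name is hypothetical.

```python
import re

# Pattern based only on the log line quoted above (assumed format).
LOG_PATTERN = re.compile(
    r"Updated vnode (?P<vnode>\S+)'s resource "
    r"resources_available\.mem=(?P<kb>\d+)kb per mom hook request"
)


def find_mem_overwrite(line: str):
    """Return (vnode, mem_kb) if the line records a mom hook mem update, else None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return match.group("vnode"), int(match.group("kb"))


sample = (
    "06/18/2020 09:44:53;0100;Server@bright01;Node;node0001.thunder.ccast;"
    "Updated vnode node0001's resource resources_available.mem=65336320kb "
    "per mom hook request"
)
print(find_mem_overwrite(sample))
```

Comparing the reported kb value against the value set in qmgr shows which nodes have been reset after a mom restart.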
Can we somehow change this behavior to prevent our memory values from being overwritten?