How can I limit the amount of memory used by a job?

Below is a typical node. I have users requesting huge amounts of memory, which is causing system services like sssd to die.

[root@bonobo ~]# pbsnodes compute-0-3
compute-0-3
Mom = compute-0-3.local
Port = 15002
pbs_version = 14.1.2
ntype = PBS
state = free
pcpus = 40
resources_available.arch = linux
resources_available.host = compute-0-3
resources_available.mem = 131744472kb
resources_available.ncpus = 40
resources_available.vnode = compute-0-3
resources_assigned.accelerator_memory = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.netwins = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared

  1. You can set the memory available on that node to 100GB via qmgr or the config_v2 file:
    qmgr -c "set node compute-0-3 resources_available.mem=100000000kb"

  2. You can use the cgroups hook, which has a reserve-memory feature that sets aside memory for the OS and system daemons.
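As a sketch of option 2, assuming the stock pbs_cgroups hook shipped with PBS Professional: its JSON configuration has a memory section in which an amount can be reserved for the OS and system daemons. The key names below (reserve_amount, reserve_percent) and the 8GB value are illustrative and may differ between hook versions; check the cgroups chapter of your Administrator's Guide:

```json
{
    "cgroup": {
        "memory": {
            "enabled": true,
            "reserve_amount": "8GB",
            "reserve_percent": "0"
        }
    }
}
```

The edited configuration is loaded and the hook enabled via qmgr, e.g. `qmgr -c "import hook pbs_cgroups application/x-config default pbs_cgroups.json"` followed by `qmgr -c "set hook pbs_cgroups enabled=true"`.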

There is already a value set:

resources_available.mem = 131744472kb
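For context, a quick sketch of what the suggested 100000000kb cap would reserve on this node, using the numbers from the pbsnodes output above:

```shell
# Headroom left for the OS if resources_available.mem is capped at 100000000kb
total_kb=131744472      # resources_available.mem reported by pbsnodes
capped_kb=100000000     # value suggested for qmgr
echo "total:   $(( total_kb / 1024 / 1024 )) GiB"              # → 125
echo "cap:     $(( capped_kb / 1024 / 1024 )) GiB"             # → 95
echo "reserve: $(( (total_kb - capped_kb) / 1024 / 1024 )) GiB" # → 30
```

Roughly 30GiB would then stay out of the scheduler's hands for sssd and other system services.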

Is PBS supposed to kill the job if the user asks for more?

Yes, if you set this parameter in $PBS_HOME/mom_priv/config on the compute node (requires a restart of the pbs_mom service):
$enforce mem
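After the edit, the MoM config on the node would contain the directive; the $clienthost line below is illustrative (sites differ), only the $enforce line is being added:

```
## $PBS_HOME/mom_priv/config on compute-0-3
$clienthost bonobo
$enforce mem
```

Then restart the MoM so the directive takes effect, e.g. `systemctl restart pbs` (or restart the pbs_mom process directly).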

Snapshot from the PBS Professional 2021.1 Reference Guide (image not reproduced here).