Yes. I’m not familiar with your cgroup hook, however. The fact that you ALSO have a hard limit is bizarre if you have enabled soft limits – that too seems to be a bug in the hook:
else:
    # For all the rest just pass hostresc[resc] down to set_limit
    self.set_limit(resc, hostresc[resc], jobid)
Obviously that call should not be made for hostresc['mem'] if self.cfg['cgroup']['memory']['soft_limit'] is set.
I pointed you at the code – feel free to copy the hook and add log messages to figure out what is happening.
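If it helps, this is a rough sketch of the kind of trace message you could drop in just before that set_limit call (assuming resc, jobid, hostresc and self.cfg are in scope there, as in the snippet above; the message text is just an example):

    # Illustrative only: trace what the hook is about to do for this resource
    pbs.logmsg(pbs.EVENT_DEBUG4,
               "About to set limit for %s on job %s: %s (mem soft_limit=%s)"
               % (resc, jobid, hostresc[resc],
                  self.cfg['cgroup']['memory']['soft_limit']))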
BTW, at DEBUG4 level this is also written to the logs:
pbs.logmsg(pbs.EVENT_DEBUG4,
           "Limits computed from requests/defaults: "
           "mem: %s vmem: %s" % (mem_limit, vmem_limit))
And of course if this is wrong feel free to change the hook.
In the hook I’m currently using, even though it still has the bug of setting the hard limit when soft limits are requested, the memsw limit is set correctly:
[alexis@dragon cgroups]$ qsub -I -lselect=1:mem=100m:vmem=2gb
qsub: waiting for job 9194.dragon to start
qsub: job 9194.dragon ready
[alexis@dragon ~]$ cd /sys/fs/cgroup/memory/pbs_jobs.service//jobid/9194.dragon/
[alexis@dragon 9194.dragon]$ head memory.*limit*
==> memory.kmem.limit_in_bytes <==
9223372036854771712
==> memory.kmem.tcp.limit_in_bytes <==
9223372036854771712
==> memory.limit_in_bytes <==
104857600
==> memory.memsw.limit_in_bytes <==
2147483648
==> memory.soft_limit_in_bytes <==
104857600
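For what it’s worth, those numbers line up exactly with the request – a quick sanity check, nothing hook-specific:

    # 100m and 2gb from the qsub request, expressed in bytes
    mem_request = 100 * 1024 ** 2    # 104857600  == memory.limit_in_bytes
    vmem_request = 2 * 1024 ** 3     # 2147483648 == memory.memsw.limit_in_bytes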
I fixed the bug by inserting this snippet in front of the ‘catchall, just pass this on’ limit setting to avoid a hard limit being set too low when enabling the soft limit:
elif (resc == 'mem'
      and self.cfg['cgroup']['memory']['soft_limit']):
    # Don't set hard mem limit to mem requested if soft mem
    # limits were enabled;
    # you do need to set a hard limit to the vmem limit
    # if applicable or setting the vmem limit will fail
    if 'vmem' in hostresc:
        self.set_limit(resc, hostresc['vmem'], jobid)
else:
    # For all the rest just pass hostresc[resc] down to set_limit
    self.set_limit(resc, hostresc[resc], jobid)
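As a rough illustration of what that branch ends up doing for 'mem' – a standalone sketch, not the hook code itself; the function name and the stand-alone dict are made up for the example – the hard limit passed down depends on whether soft limits are enabled and whether vmem was requested:

    # Standalone sketch of the dispatch above; names are stand-ins,
    # not the real hook objects.
    def pick_mem_hard_limit(hostresc, soft_limit_enabled):
        # Returns the value the hard 'mem' limit would be set to,
        # or None if no hard 'mem' limit should be written at all.
        if soft_limit_enabled:
            # Only raise the hard limit to vmem so memsw can be set;
            # with no vmem requested, skip the hard limit entirely.
            return hostresc.get('vmem')
        return hostresc['mem']

    # With the request from the example session above:
    pick_mem_hard_limit({'mem': 104857600, 'vmem': 2147483648}, True)
    # -> 2147483648 (hard limit raised to vmem so memory.memsw can be set)
    pick_mem_hard_limit({'mem': 104857600}, True)
    # -> None (no hard limit; only the soft limit is applied)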