I can't seem to enforce a maximum limit on the number of nodes allowed for a single job.
If I am not mistaken, this can be done at either the queue or the server level by configuring (in qmgr):
sudo qmgr -c "set queue prod resources_max.nodes = 3"
and/or
sudo qmgr -c "set server resources_max.nodes = 3"
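In case it helps, this is how I double-check that the limits are actually in place (output trimmed; the exact formatting may differ between PBS versions):
qmgr -c "list queue prod resources_max"
qmgr -c "list server resources_max"
Both report resources_max.nodes = 3 as expected.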
However, in either case (even with both limits set), the system happily starts jobs on 4 or more nodes, e.g.:
qsub -I -l nodes=4
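If I understand the PBS documentation correctly, the old nodes=N syntax is internally translated into a select statement, so the request above should be roughly equivalent to:
qsub -I -l select=4:ncpus=1 -l place=scatter
which would also explain the exec_vnode assignment in the log excerpt further down.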
I have tried to strip down the config to a bare minimum, but the problem persists.
Do any of you know if this feature is working?
From the logs, I can gather that the job requests and gets ncpus=1 on each of 4 nodes:
LOG: Job Run at request of Scheduler@bifrost1.fcoo.dk on exec_vnode (dn008:ncpus=1)+(dn101:ncpus=1)+(dn102:ncpus=1)+(dn103:ncpus=1)
And it is “counted” as just resources_used.ncpus=4 at the end:
LOG: Exit_status=0 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.ncpus=4 resources_used.vmem=0kb resources_used.walltime=00:00:02
Each vnode has ncpus=20, but even a job explicitly requesting “4 vnodes with 20 cpus each” is allowed to run:
qsub -I -l select=4:ncpus=20
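For completeness, this is how I inspect what such a job actually ends up requesting (the job id is a placeholder; the grep is only there to trim the output):
qstat -f <jobid> | grep Resource_List
If I read that output correctly, the node count is visible there as well (Resource_List.nodect = 4 for the nodes=4 job), so the server does seem to know how many nodes are involved.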
I guess I am doing something wrong, although I cannot figure out what. It certainly feels as if resources_max.nodes is somehow broken. If so, is there an alternative way to cap the number of nodes per job?
Thanks,
/Bjarne