Hi Broam,
Thank you for the detailed answer, which I have tried out; it works as described.
For reference (other users reading), the behaviour is described in AdminGuide (v13.1) §4.8.33.4.ii.
Upon submission, the sched-log spits out (example):
…;pbs_sched;Job;.;Considering job to run
…;pbs_sched;Job;.;Can’t fit in the largest placement set, and can’t span placement sets
…;pbs_sched;Job;.;Job will never run with the resources currently configured in the complex
The job status gets a comment (seen by e.g. qstat) reading:
comment = Can Never Run: can’t fit in the largest placement set, and can’t span psets
The job is still stuck in the queue, though. I will look for a setting that would automagically kick such jobs out of the queue entirely; failing that, something like the rough sketch below might do. (This is a lower-priority issue, but since none of our real jobs are started interactively, I would like them to fail automatically rather than sit in the queue.)
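For the record, a periodic cleanup along these lines should work (an untested sketch, keying off the comment text shown above; run it from cron or similar):

    #!/bin/sh
    # Delete queued jobs whose scheduler comment marks them as never able to run.
    for job in $(qselect -s Q); do
        if qstat -f "$job" | grep -q "Can Never Run"; then
            qdel "$job"
        fi
    done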
Actually, this happens to be perfectly OK in our case. Firstly, the nodes are not identical (the compute nodes are 20-core Haswell systems to be used for e.g. HPC/MPI work, while the io nodes are older 12-core Nehalem systems). Secondly, we do not have any lusers on the system; all jobs are under the control of a small handful of developers setting up operational job systems. Some jobs really must run on the compute nodes, while others could run in any group. I could probably set the selection up via two different queues (rough sketch below), but both kinds of jobs will be issued by the same operational user.
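If we ever go the two-queue route, I imagine something like this would do it (just a sketch; the queue names and the boolean host-level resource "computenode" are made up for illustration, and that resource would of course have to be defined first and added to the scheduler's resources line):

    # Tag the two node classes with a hypothetical boolean host resource.
    qmgr -c "set node cn001 resources_available.computenode = true"
    qmgr -c "set node io001 resources_available.computenode = false"
    # Steer each queue onto the matching nodes via default_chunk.
    qmgr -c "create queue compute queue_type = execution"
    qmgr -c "set queue compute default_chunk.computenode = true"
    qmgr -c "create queue io queue_type = execution"
    qmgr -c "set queue io default_chunk.computenode = false"
    # (plus enabled/started = true on both queues as usual)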
This is a cute trick, and I will keep it in mind, plus write it down on our internal howto/tricks list for PBSpro. For the foreseeable future, the group that I want to be the default happens to be the smallest.
As we are aiming to power down nodes that are not in use, I expect that we will have to explicitly tell the pbs server that those fake nodes are truly “dead” and should not be woken (and that it should not complain about them failing to wake). I'll deal with that if the issue arises; my first guess is sketched below. +1 for raising this caveat.
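My first guess would be to simply mark the fake vnodes offline so the scheduler never considers them (node name made up, and I have not checked whether this also keeps the power-management side quiet):

    # Take a placeholder vnode out of scheduling consideration...
    pbsnodes -o fake-node-01
    # ...and bring it back if it is ever needed again.
    pbsnodes -r fake-node-01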
I agree that the select syntax is more powerful. However, we have a few “specialities” to consider.
A. We will enforce job-exclusive use of nodes, so we will never have several jobs (or users) on a single node.
B. Some jobs (in particular IO-intensive jobs) just want exclusive access to “a node” (or a node count) to do “their thing”. These jobs interfere with other jobs as they use loads of system resources, so I still want a way to state “give me two nodes” - and not just get scheduled to a single vnode because the resources seem to fit.
To resolve this I have now created a new host-level resource (“infiniband”, type long) and given each node one unit of it. Then I can ask for -l select=2:infiniband=1 and get exactly 2 nodes. The nodect limit (http://community.openpbs.org/t/problems-with-resources-max-nodes) is then still enforced and works.
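For other readers, setting this up looks roughly like the following (a sketch rather than a verbatim copy of our config; the node name is made up, the file locations are the stock v13.1 ones, and the server/scheduler need a restart or HUP after the file edits):

    # $PBS_HOME/server_priv/resourcedef: a consumable, host-level resource
    infiniband type=long flag=nh

    # $PBS_HOME/sched_priv/sched_config: let the scheduler schedule on it
    resources: "ncpus, mem, arch, host, vnode, infiniband"

    # Give every real node exactly one unit.
    qmgr -c "set node cn001 resources_available.infiniband = 1"

    # Request exactly two whole nodes (place=excl matches point A above).
    qsub -l select=2:infiniband=1 -l place=excl job.sh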
Thanks a lot guys!
/Bjarne