Hi Broam,
Thank you for the detailed answer, which I have tried out; it works as described.
For reference (other users reading), the behaviour is described in AdminGuide (v13.1) §4.8.33.4.ii.
Upon submission, the sched-log spits out (example):
…;pbs_sched;Job;.;Considering job to run
…;pbs_sched;Job;.;Can’t fit in the largest placement set, and can’t span placement sets
…;pbs_sched;Job;.;Job will never run with the resources currently configured in the complex
The job status gets a comment (seen by e.g. qstat) reading:
comment = Can Never Run: can’t fit in the largest placement set, and can’t span psets
The job is still stuck in the queue, though. I will look for a setting that would automagically kick such jobs out of the queue entirely; failing that, something like the rough sketch below might do. (This is a lower-priority issue, but since none of our real jobs are started interactively, I would like them to fail automatically rather than sit in the queue.)
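For the record, a periodic cleanup along these lines should work (an untested sketch, keying off the comment text shown above; run it from cron or similar):

    #!/bin/sh
    # Delete queued jobs whose scheduler comment marks them as never able to run.
    for job in $(qselect -s Q); do
        if qstat -f "$job" | grep -q "Can Never Run"; then
            qdel "$job"
        fi
    done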
Actually, this happens to be perfectly OK in our case. Firstly, the nodes are not identical (the compute nodes are 20-core Haswell systems to be used for e.g. HPC/MPI work, while the io nodes are older 12-core Nehalem systems). Secondly, we do not have any lusers on the system; all jobs are under the control of a small handful of developers setting up operational job systems. Some jobs really must run on the compute nodes, while others could run in any group. I could probably set the selection up via two different queues (rough sketch below), but both kinds of jobs will be issued by the same operational user.
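If we ever go the two-queue route, I imagine something like this would do it (just a sketch; the queue names and the boolean host-level resource "computenode" are made up for illustration, and that resource would of course have to be defined first and added to the scheduler's resources line):

    # Tag the two node classes with a hypothetical boolean host resource.
    qmgr -c "set node cn001 resources_available.computenode = true"
    qmgr -c "set node io001 resources_available.computenode = false"
    # Steer each queue onto the matching nodes via default_chunk.
    qmgr -c "create queue compute queue_type = execution"
    qmgr -c "set queue compute default_chunk.computenode = true"
    qmgr -c "create queue io queue_type = execution"
    qmgr -c "set queue io default_chunk.computenode = false"
    # (plus enabled/started = true on both queues as usual)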
This is a cute trick, and I will keep it in mind, plus write it down on our internal howto/tricks list for PBSpro. For the foreseeable future, the group that I want to be the default happens to be the smallest.
As we are aiming to power down nodes that are not in use, I expect that we will have to explicitly tell the pbs server that those fake nodes are truly “dead” and should not be woken (and that it should not complain about them failing to wake). I'll deal with that if the issue arises; my first guess is sketched below. +1 for raising this caveat.
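My first guess would be to simply mark the fake vnodes offline so the scheduler never considers them (node name made up, and I have not checked whether this also keeps the power-management side quiet):

    # Take a placeholder vnode out of scheduling consideration...
    pbsnodes -o fake-node-01
    # ...and bring it back if it is ever needed again.
    pbsnodes -r fake-node-01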
I agree that the select syntax is more powerful. However, we have a few “specialities” to consider.
A. We will enforce job-exclusive use of nodes, so we will never have several jobs (or users) on a single node.
B. Some jobs (in particular IO-intensive jobs) just want exclusive access to “a node” (or a node count) to do “their thing”. These jobs interfere with other jobs as they use loads of system resources, so I still want a way to state “give me two nodes” - and not just get scheduled to a single vnode because the resources seem to fit.
To resolve this I have now created a new host-level resource (“infiniband”, type long) and given each node one unit of it. Then I can ask for -l select=2:infiniband=1 and get exactly 2 nodes. The nodect limit (http://community.openpbs.org/t/problems-with-resources-max-nodes) is then still enforced and works.
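For other readers, setting this up looks roughly like the following (a sketch rather than a verbatim copy of our config; the node name is made up, the file locations are the stock v13.1 ones, and the server/scheduler need a restart or HUP after the file edits):

    # $PBS_HOME/server_priv/resourcedef: a consumable, host-level resource
    infiniband type=long flag=nh

    # $PBS_HOME/sched_priv/sched_config: let the scheduler schedule on it
    resources: "ncpus, mem, arch, host, vnode, infiniband"

    # Give every real node exactly one unit.
    qmgr -c "set node cn001 resources_available.infiniband = 1"

    # Request exactly two whole nodes (place=excl matches point A above).
    qsub -l select=2:infiniband=1 -l place=excl job.sh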
Thanks a lot guys!
/Bjarne