Is it possible with PBSpro v. 14.1.0 to set the # of PCPUs for an execution host to be less than the number of actual CPUs on that host? For example, I have a host with 128 CPUs, but I only want 120 to be under the control of the scheduler.
-SS-
Hi SS,
The scheduler does not look at pcpus when scheduling a job; it schedules on the basis of the ncpus value set on the node. You can modify ncpus for a node/vnode using the qmgr command.
So if I understand your problem correctly, I think setting the right value in ncpus would just work for you.
To set ncpus on the node please execute the following as a privileged user (manager) -
qmgr -c "s n node_name resources_available.ncpus=120"
Hope it Helps!
Regards,
Arun Grover
Arun,
That is exactly what I was looking for. Thank you.
-SS-
As an extra note, it may be more robust to set resources_available.ncpus in a v2 config file on the MoM node.
That way, if you delete the node and then recreate it, the server will pick up the resource from MoM; otherwise it’s all too easy to delete and recreate a node and forget that you had overridden some MoM-sent value using qmgr.
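For anyone who wants to try this, here is a rough sketch of what such a v2 config file could look like. It is not taken from a working site config: node_name and the file name ncpus_override are placeholders, and the pbs_mom step is left commented out because it needs a PBS installation and root on the MoM host.

```shell
# Sketch only: node_name and the name ncpus_override are placeholders.
vnode_def="${TMPDIR:-/tmp}/ncpus_override"

# A v2 config file has to begin with the $configversion 2 line,
# followed by "<vnode name>: <attribute> = <value>" entries.
cat > "$vnode_def" <<'EOF'
$configversion 2
node_name: resources_available.ncpus = 120
EOF

# Show what was written.
cat "$vnode_def"

# On the MoM host, as root, the file would then be handed to MoM
# (assumed workflow, check the admin guide for your version):
# pbs_mom -s insert ncpus_override "$vnode_def"
# and pbs_mom restarted or sent a HUP so it rereads its config.
```

Once MoM has reported the value, the qmgr override on the server is no longer the only place the 120 lives, which is the point of doing it this way.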
Will the scheduler still be aware of the setting?
I am asking because I use version 2 MOM config files to set sharing to job exclusive (sharing = force_exclhost), and the scheduler does not really seem to take this into account when it computes start time and backfilling. If I also add (to qsub) -lplace=...:exclhost all is well for our use case.
Really good point! Thanks.
/Bjarne
“Will the scheduler still be aware of the setting?”
Yes. A lot of sites are using exactly this (and in fact you used to have to do it to work around some bugs in the past).
I’m puzzled as to why the “sharing” attribute doesn’t seem to be working as it should – I’ve used it in the past extensively. Mind you, I usually like to do what you have done and use a queuejob hook to make jobs ask what they want – it’s a lot less confusing than overriding what the job has asked or not depending on what nodes they end up on using the “sharing” attribute.
It’s either been broken recently, or is only broken in simulations of the future for backfilling (which would be strange, because simulations essentially use the same code).
Do you actually see the sharing attribute as set to the correct value in pbsnodes -av?
It’s easy to use v2 config files with the incorrect syntax (sharing is an attribute and not a resource) or to forget that v2 config files need to start with “$configversion 2” to work properly.
In that case, what you think the scheduler sees is not what it actually sees.
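A quick way to check this is to cut the pbsnodes output down to the node names and the two attributes discussed in this thread. This is only a sketch: show_sharing is a made-up helper name, and it assumes the usual pbsnodes layout of an unindented node name followed by indented "attribute = value" lines.

```shell
# Hypothetical helper: keep node-name lines (no leading whitespace) and the
# indented sharing / resources_available.ncpus lines, drop everything else.
show_sharing() {
    grep -E '^[^[:space:]]|^[[:space:]]+(sharing|resources_available\.ncpus) = '
}

# Usage on a host with the PBS client commands installed:
# pbsnodes -av | show_sharing
```

If the sharing line is missing or still shows the default for a node, the v2 config file was probably not picked up by MoM.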
I have tried to go over my previous test cases, and I cannot reproduce it. I think that the actual backfilling was not really working at the time, which has now been resolved in other threads (setting strict_ordering etc.).
Qmgr -c 'set server est_start_time_freq ...' fails - #12 by arungrover
My bad for reporting it. Probably all is well.
Yes. That seems fine.
True. The v2 config is (for host dn101):
Once again, thanks.
/Bjarne
PS: On a side note, I would still like to have a way to make a single v2 config file be applicable for multiple/all hosts MOM sharing config