Hi Scott,
Thanks (apparently I was typing when your reply came in, so I did not see it). We agree on the solution.
No. Some of the jobs will have it set and some will not. The hook should not try to change an existing setting. In certain cases (when the high-priority nodes are “reserved”), the hook will add a low-priority nodetype=io, but it will not change a nodetype that is already there. However, I need to make another hook that can remove the added nodetype=io if the job has not yet run when the reservation is lifted. (In certain cases, the job will not be able to run on the io-nodes and will just remain queued, which in our case is OK for a while.)
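To make the "add, but never change" rule concrete, here is a rough sketch in plain Python. It is not the actual hook code: the function name add_nodetype is made up for illustration, and it assumes a simple select string where chunks are separated by "+" and resources by ":" (no quoted values containing those characters).

```python
def add_nodetype(select, nodetype="io"):
    """Append nodetype=<nodetype> to each chunk that has no nodetype request.

    Chunks that already request a nodetype are left untouched.
    Assumes a simple select spec without quoted ':' or '+' in values.
    """
    chunks = []
    for chunk in select.split("+"):
        parts = chunk.split(":")
        # Only add the resource when the chunk does not request one already.
        if not any(p.startswith("nodetype=") for p in parts):
            parts.append("nodetype=" + nodetype)
        chunks.append(":".join(parts))
    return "+".join(chunks)


print(add_nodetype("2:ncpus=2"))
```

Inside a real hook the result would still have to be written back to the job's select, which is not shown here.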
Hmmm. That is not what I see.
Here is my example case (with comments added along the way).
[bjb@bifrost1 ~]$ qsub -l walltime=00:02:00 -l select=2:ncpus=2 -l place=scatter:excl -- /bin/sleep 30
1386.bifrost1
[bjb@bifrost1 ~]$ qstat2lin 1386|grep -e comment -e select
Resource_List.select = 2:ncpus=2
schedselect = 2:ncpus=2
Submit_arguments = -l walltime=00:02:00 -l select=2:ncpus=2 -l place=scatter:excl -- /bin/sleep 30
qstat2lin is just my hack for qstat -f without the line breaks; see "Qstat -f <JOBID> line breaks - #2 by buchmann" and DCOO / ClusterTools / [faa9dc] /pbs/qstat2lin.
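For anyone without that script, the idea can be sketched in a few lines of Python. This is not the qstat2lin script itself; the function name unwrap_qstat_f is hypothetical, and it assumes that wrapped continuation lines in qstat -f output start with a tab character.

```python
def unwrap_qstat_f(text):
    """Join wrapped continuation lines of `qstat -f` output.

    Assumption: a continuation line starts with a tab, and the tab
    itself is not part of the attribute value.
    """
    lines = []
    for raw in text.splitlines():
        if raw.startswith("\t") and lines:
            # Glue the continuation onto the attribute line before it.
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return "\n".join(lines)
```

The text from qstat -f could be passed in after capturing it with subprocess, and the result filtered for select and comment just as with grep above.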
The following will let the scheduler try to run the job. The hook will catch it and update select, forcing the job onto the io-nodes:
[bjb@bifrost1 ~]$ qmgr -c 'set server scheduling = 1'
Note that both Resource_List.select and schedselect get updated:
[bjb@bifrost1 ~]$ qstat2lin 1386|grep -e comment -e select
Resource_List.select = 2:ncpus=2:nodetype=io
schedselect = 2:ncpus=2:nodetype=io
comment = Not Running: PBS Error: JOBSELECT2io - nodetype=io explicitly added to job select, as the walltime overlaps the reserve time set by the adminstrator
Submit_arguments = -l walltime=00:02:00 -l select=2:ncpus=2 -l place=scatter:excl -- /bin/sleep 30
Luckily, Submit_arguments is not updated, so in principle I can use that (match for nodetype=io) to see if the hooks have fiddled with the nodetype. Alternatively, I might try to store something in job.Variable_List. (I cannot rely on the comment field, as the server/scheduler may overwrite what I have put in.)
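The Submit_arguments check could look roughly like this in plain Python. It is only a sketch under assumptions: the function names hook_added_nodetype and strip_nodetype are made up, the attribute values are passed in as plain strings, and the select spec is simple (chunks split on "+", resources on ":").

```python
import re


def hook_added_nodetype(select, submit_arguments, marker="nodetype=io"):
    """True if the marker is in the current select but not in what the user submitted."""
    in_select = marker in re.split(r"[:+]", select)
    return in_select and marker not in submit_arguments


def strip_nodetype(select, marker="nodetype=io"):
    """Remove the marker resource from every chunk of a simple select spec."""
    chunks = []
    for chunk in select.split("+"):
        chunks.append(":".join(p for p in chunk.split(":") if p != marker))
    return "+".join(chunks)


args = "-l walltime=00:02:00 -l select=2:ncpus=2 -l place=scatter:excl -- /bin/sleep 30"
if hook_added_nodetype("2:ncpus=2:nodetype=io", args):
    print(strip_nodetype("2:ncpus=2:nodetype=io"))
```

A job that asked for nodetype=io itself is left alone, because the marker then also shows up in Submit_arguments.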
Forcing yet another scheduling iteration makes the scheduler realize that I have really defined only one io-node, so the job cannot run:
[bjb@bifrost1 ~]$ qmgr -c 'set server scheduling = 1'
[bjb@bifrost1 ~]$ /home/bjb/DcooWare/dcoo-clustertools-ssh2016-09-27/pbs/qstat2lin 1386|grep -e comment -e select
Resource_List.select = 2:ncpus=2:nodetype=io
schedselect = 2:ncpus=2:nodetype=io
comment = Can Never Run: can't fit in the largest placement set, and can't span psets
Submit_arguments = -l walltime=00:02:00 -l select=2:ncpus=2 -l place=scatter:excl -- /bin/sleep 30
My nice comment got overwritten, but really the behaviour of the system is exactly what I expect.
I have also tried using qrun, and it does not seem to make a difference: nodetype still gets set correctly. So, with my usage, I can’t confirm “that the job must already have a nodetype request in the select statement before the runjob hook tries to set it”.
Thank you so much for answering!
/Bjarne