I’m using version 19.1.1, have a faster switch and a slower switch, and am trying to follow the documentation:
(Note that copying and pasting from either document does not work, because the authors’ text editors changed the straight quotes (") to typographic ones, which are not special to a shell. So I used single quotes below.)
My goal is to tell certain (MPI) jobs to run on any of the nodes connected to the fast switch, and to have other jobs that don’t request anything in particular run on the slow-switch nodes first (and only once those compute nodes are full, on the fast-switch nodes).
I did the following:
# echo "switch type=string" >> /var/spool/pbs/server_priv/resourcedef
# systemctl restart pbs
# qmgr -c 'set server node_group_enable=true'
# qmgr -c 'set server node_group_key=switch'
# qmgr -c 'set node c00 resources_available.switch=1gBE'
...
# qmgr -c 'set node c20 resources_available.switch=10gBE'
...
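For reference, the per-node commands above could be generated rather than typed one by one. This is only a sketch: the helper name switch_cmds is made up for illustration, and only c00 and c20 are node names taken from the commands above, so the full node lists would have to be filled in by hand. It prints the qmgr commands instead of running them, so the output can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one qmgr command per node,
# tagging each node with the given value of the custom "switch" resource.
# Usage: switch_cmds VALUE NODE [NODE...]
switch_cmds() {
    value=$1
    shift
    for node in "$@"; do
        printf "qmgr -c 'set node %s resources_available.switch=%s'\n" "$node" "$value"
    done
}

# Only c00 and c20 come from the post; add the remaining node names yourself.
switch_cmds 1gBE c00
switch_cmds 10gBE c20
```

Piping the output to sh as root would apply the settings, assuming qmgr is on the PATH.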
I can see the correct lines in the output from pbsnodes -a:
resources_available.switch = 1gBE
But at this point the documentation (both versions) becomes less clear to me about how to use qsub to submit such a job. Everything I try gives an error, either node(s) specification error or one of the Resource_List.place errors:
# qsub -l nodes=1:switch=10gBE qsub.math.long.csh
qsub: node(s) specification error
# qsub -l place=switch=10gBE qsub.math.long.csh
qsub: Illegal attribute or resource value Resource_List.place
# qsub -l place=group=10gBE qsub.math.long.csh
qsub: Unknown resource Resource_List.place
# qsub -l select=1 -l place=group=10gBE qsub.math.long.csh
qsub: Unknown resource Resource_List.place
[~/test]# qsub -l select=1 -l place=group=foo qsub.math.long.csh
qsub: Unknown resource Resource_List.place
[~/test]# qsub -l select=1 -l place=group=switch qsub.math.long.csh
53180.*hostname*
Finally, a job was accepted! But this form gave me no way to request that the job run on a node on the fast switch, and it in fact ran on the slow switch.
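For anyone wanting to reproduce the check of where a job landed, here is a rough sketch. It assumes the exec_host value reported by qstat -f has the usual host/index form joined by + (for example c00/0*2+c20/1); the function name exec_hosts is made up for illustration, and the qstat and pbsnodes lines are left as comments because their exact output may differ between installations.

```shell
#!/bin/sh
# Hypothetical helper: reduce an exec_host string such as "c00/0*2+c20/1"
# to a sorted list of unique host names, one per line.
exec_hosts() {
    printf '%s\n' "$1" | tr '+' '\n' | sed 's,/.*,,' | sort -u
}

# Possible usage on a PBS host (assumes the exec_host line is not wrapped):
#   hosts=$(exec_hosts "$(qstat -f 53180 | awk -F' = ' '/exec_host/ {print $2}')")
#   for h in $hosts; do pbsnodes "$h" | grep 'resources_available.switch'; done
```

Each host name can then be compared against the resources_available.switch value shown by pbsnodes.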
Am I going about this completely wrong? Should I not be using placement sets at all to request nodes on the fast switch for certain jobs?
Any hints would be greatly appreciated.