Placement sets for fast vs. slow switches problems

I’m using version 19.1.1, have a faster switch and a slower switch, and am trying to follow the documentation:


(Note that copying and pasting from either document does not work, because the author’s text editors changed the quotes from " to the fancy ones that aren’t special to a shell. So I used single-quotes below.)

My goal is to tell certain (MPI) jobs to run on any of the nodes connected to the fast switch, and for other jobs that don’t request anything in particular to run on any of the nodes on the slow switch first (until those computes are full, then run on the fast-switch nodes).

I did the following:

# echo "switch type=string" >> /var/spool/pbs/server_priv/resourcedef 
# systemctl restart pbs
# qmgr -c 'set server node_group_enable=true'
# qmgr -c 'set server node_group_key=switch'
# qmgr -c 'set node c00 resources_available.switch=1gBE'
...
# qmgr -c 'set node c20 resources_available.switch=10gBE'
...

I can see the correct lines in the output from pbsnodes -a:

 resources_available.switch = 1gBE

But at this point the documentation (both versions) is less clear to me on how to use qsub to submit a job. Almost everything I try gives an error, either a node(s) specification error or a complaint about Resource_List.place:

# qsub -l nodes=1:switch=10gBE qsub.math.long.csh
qsub: node(s) specification error
# qsub -l place=switch=10gBE qsub.math.long.csh
qsub: Illegal attribute or resource value Resource_List.place
# qsub -l place=group=10gBE qsub.math.long.csh
qsub: Unknown resource Resource_List.place
# qsub -l select=1 -l place=group=10gBE qsub.math.long.csh
qsub: Unknown resource Resource_List.place
[~/test]# qsub -l select=1 -l place=group=foo qsub.math.long.csh
qsub: Unknown resource Resource_List.place
[~/test]# qsub -l select=1 -l place=group=switch qsub.math.long.csh
53180.*hostname*

Finally, a job submitted! But I didn’t get to request the job run on a node on the fast switch, and it in fact ran on the slow switch.

Am I going about this completely wrong? Should I not be using placement sets to request nodes on the fast switch for certain jobs?

Any hints would be greatly appreciated.

And I didn’t know ahead of time that starting a line with “#” would make it large! I went back and fixed the formatting.

Hey,
I don’t think placement sets are what you want. The use case for placement sets is when you want a job to run entirely on the fast switch or entirely on the slow switch, but you don’t care which.

What you want to do is create a custom resource, like the one you already have for the switch, and set it to fast on the nodes attached to the fast switch. You can set the rest to slow or leave them unset; it won’t matter. You then set the priority attribute on the nodes so that slow nodes have a higher priority than fast nodes.
The jobs which require fast nodes request -lselect=N:switch=fast:ncpus=…
The jobs that don’t require the fast nodes, just submit normally. Since the slow nodes are sorted to the front of the list, they will be used first before the fast nodes.
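
As a rough sketch of the setup described above: the helper below only prints the qmgr commands, so they can be reviewed before being piped to sh. The function name switch_cmds and the priority values 100 and 10 are hypothetical, and the node names are borrowed from the original post. It also assumes the scheduler sorts nodes by priority (a node_sort_key of "sort_priority HIGH" in sched_config); check your own sched_config before relying on that.

```shell
# Print (do not run) the qmgr commands that label nodes by switch speed
# and give them a node priority. All values here are illustrative.
switch_cmds() {
    # $1 = switch label, $2 = node priority, remaining args = node names
    label=$1
    prio=$2
    shift 2
    for node in "$@"; do
        echo "qmgr -c 'set node $node resources_available.switch=$label'"
        echo "qmgr -c 'set node $node priority=$prio'"
    done
}

# Slow-switch nodes get the higher priority so they are filled first.
switch_cmds slow 100 c00 c01
# Fast-switch nodes get a lower priority and the "fast" label.
switch_cmds fast 10 c20 c21
```

Once the output looks right, it could be applied with something like `switch_cmds fast 10 c20 c21 | sh`. A job that needs the fast switch would then request it in its select statement, for example `qsub -l select=1:ncpus=1:switch=fast qsub.math.long.csh` (the chunk and ncpus counts are placeholders).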

Bhroam

Hi Bhroam, thanks for the quick reply.

OK, I understand what placement sets are used for now, and agree that’s not what I want. I deleted my changes to /var/spool/pbs/server_priv/resourcedef, and edited /var/spool/pbs/sched_priv/sched_config so that the resources line looks like this:

resources: "ncpus, mem, arch, host, vnode, netwins, aoe, brand, switch"

Then I ran

# systemctl restart pbs
# qmgr -c 'create resource switch,brand type=string,flag=h'

So now I have non-consumable, host-level resources:

# qmgr -c 'p s' | head -15
#
# Create resources and set their properties.
#
#
# Create and define resource switch
#
create resource switch
set resource switch type = string
set resource switch flag = h
#
# Create and define resource brand
#
create resource brand
set resource brand type = string
set resource brand flag = h

(I define brand because I have a mix of AMD and Intel CPUs).

Now I need to set the value of the resource for each compute node. Following the example at 5.14.4.2.i Example of Configuring Static Host-level Resource in PBSAdminGuide19.2.3.pdf does not work:

# qmgr -c 'set node c21 switch=10gBE'
qmgr obj=c21 svr=default: Undefined attribute
qmgr: Error (15002) returned from server

I see that node c21 exists in the output of pbsnodes -a. What am I missing?

Hey @bartbrashers
The problem is that qmgr deals with attributes first and resources second. This means you need to set the resources on an attribute. The attribute you set on nodes is resources_available.

qmgr -c 'set node c21 resources_available.switch=10gBE'
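
To confirm the value landed on each node, one option is to filter the pbsnodes -a output down to a node-to-switch listing. This is only a sketch: the helper name switch_by_node is hypothetical, and it assumes the usual pbsnodes layout in which the node name starts at the left margin and its attributes are indented beneath it.

```shell
# Read `pbsnodes -a` output on stdin and print "<node> <switch value>"
# for every node that has resources_available.switch set.
switch_by_node() {
    awk '
        # An unindented, non-empty line starts a new node record.
        /^[^ \t]/ { node = $1 }
        # Attribute lines look like: "resources_available.switch = 10gBE"
        $1 == "resources_available.switch" { print node, $3 }
    '
}

# Typical use on the PBS server host:
#   pbsnodes -a | switch_by_node
```

Nodes missing from the listing have no switch value set, which is a quick way to spot any that were skipped.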