Node property equivalent

I'm using a cloud system with nodes in multiple regions. I can submit to any of them and run across all of the regions without problems. We do have some high-I/O jobs that we want to limit to a particular region because of disk access performance.
My thought was to add a location attribute to each node showing its region. Then we could specify the location in the qsub command; otherwise the job would use any available server.
From what I can tell, Torque could do this using node properties. I have been looking and I can't find the OpenPBS equivalent.

Any advice?

Please try this,

–create a custom string_array host-level resource
qmgr -c "create resource region type=string_array,flag=h"

–Add it to the resources: line in the $PBS_HOME/sched_priv/sched_config
resources: "ncpus, aoe,…,region"

–kill -HUP the scheduler (pbs_sched) or restart the PBS services on the server

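–set the region as a default chunk resource on the queue
qmgr -c "set queue cloudeuropeQ default_chunk.region=europe"

–tag each node in Europe with its region (nodes_in_europe stands for your own list of node names)
for i in nodes_in_europe; do qmgr -c "set node $i resources_available.region=europe"; done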

–To submit a job
qsub -l select=1:ncpus=1 -q cloudeuropeQ -- /bin/sleep 1000
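The node-tagging step can be scripted when there are many nodes. This is only a minimal sketch: the node names and the node-to-region list are made up for illustration, and the script prints the qmgr commands instead of running them unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical node-to-region list, one "<node> <region>" pair per line.
# Replace it with your own nodes.
NODE_REGIONS="node01 europe
node02 europe
node03 us"

# Print one qmgr command per node (dry run by default).
# Set RUN=1 in the environment to actually execute the commands.
tag_nodes() {
    echo "$NODE_REGIONS" | while read -r node region; do
        cmd="set node $node resources_available.region=$region"
        if [ "${RUN:-0}" = "1" ]; then
            qmgr -c "$cmd"
        else
            echo "qmgr -c \"$cmd\""
        fi
    done
}

tag_nodes
```

Checking the printed commands before setting RUN=1 makes it easy to spot a node assigned to the wrong region.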

Thanks!
If I leave out setting the queue default_chunk will the scheduler ignore that resource?

I want the queue to assign jobs across all regions unless I specifically request one.

Yes. Without a default chunk, it will behave like any other queue and will not be tied to the compute nodes of a specific region. To target a region, your job submission line should then be as below:

qsub -l select=1:ncpus=1:region=europe -q cloudeuropeQ -- /bin/sleep 1000
or
qsub -l select=1:ncpus=1:region=europe -- /bin/sleep 1000
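If the region is optional, the select statement can be built by a small wrapper. The sketch below is an illustration, not part of OpenPBS: build_select is a hypothetical helper, and treating the word "all" as "no region constraint" is an assumption. It only prints the qsub lines.

```shell
#!/bin/sh
# build_select NCPUS [REGION]
# Print a select spec; add the region only when one is requested.
# An empty region or the (assumed) keyword "all" means "run anywhere".
build_select() {
    ncpus=$1
    region=${2:-}
    spec="select=1:ncpus=$ncpus"
    if [ -n "$region" ] && [ "$region" != "all" ]; then
        spec="$spec:region=$region"
    fi
    echo "$spec"
}

# Dry run: show the submission lines rather than submitting.
echo "qsub -l $(build_select 1 europe) -- /bin/sleep 1000"
echo "qsub -l $(build_select 1) -- /bin/sleep 1000"
```

Leaving the region out of the select statement means the scheduler is free to place the job on any node, which matches the "any region unless I ask" behaviour wanted here.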


Thanks. I’ll give this a try.

Another option is to use placement sets. There are several uses of placement sets, but your use case was the original one (network topology).

It works kind of like how @adarsh said, but instead of having to request a particular region, the scheduler does it for you. The idea is that you don’t care which region you run in, but you want all the nodes assigned to your job in the same region.

To do this, set up the string array resource like he suggested.
Then set the following:
qmgr -c 's s node_group_key=region'
qmgr -c 's s node_group_enable=true'

Placement sets are sorted by size (ncpus/mem). You will see your jobs fill up one placement set before they move on to the next.

Placement sets can overlap. This means that if you have a smaller topology inside of a larger region, you can create all the smaller placement sets to overlap with the larger region sets.

Something like this:
P1: N1, N2
P2: N3, N4
R1: N1, N2, N3, N4
P3: N5, N6
P4: N7, N8
R2: N5, N6, N7, N8

The scheduler will run jobs on the smaller P placement sets first before trying to run them on the larger ones. As a note, this will require setting up two different placement set resources (like region). One will be for the P placement sets, and the other will be for the R placement sets.
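A two-level layout like the one above could be set up roughly as follows. This is a sketch under assumptions: "pod" is a hypothetical resource name for the P sets, the region resource is assumed to exist already, and node_group_key is assumed to accept a comma-separated list of resources. The script only prints the qmgr commands so they can be reviewed first.

```shell
#!/bin/sh
# Layout from the example: "<node> <P set> <R set>" per line.
LAYOUT="N1 P1 R1
N2 P1 R1
N3 P2 R1
N4 P2 R1
N5 P3 R2
N6 P3 R2
N7 P4 R2
N8 P4 R2"

# Print the qmgr commands (dry run). Pipe the output to sh to apply them.
emit_placement_config() {
    # "pod" is a hypothetical name for the smaller P sets;
    # the "region" resource is assumed to have been created already.
    echo "qmgr -c 'create resource pod type=string_array,flag=h'"
    echo "$LAYOUT" | while read -r node pod region; do
        echo "qmgr -c 'set node $node resources_available.pod=$pod'"
        echo "qmgr -c 'set node $node resources_available.region=$region'"
    done
    # Assumption: node_group_key takes a comma-separated list of resources.
    echo "qmgr -c 'set server node_group_key=\"pod,region\"'"
    echo "qmgr -c 'set server node_group_enable=true'"
}

emit_placement_config
```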

One thing to note is that there is a case where you can cross regions. This happens when a job cannot run in any defined placement set. Think about if a job requested 5 nodes in the above example. The largest set is 4 nodes, so in that case the scheduler will run the job wherever it can.
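The behaviour described above can be illustrated with a toy model. It is not the scheduler's actual code: it counts nodes only (the real sort uses ncpus/mem), ignores what is currently free, and pick_set is a hypothetical helper name.

```shell
#!/bin/sh
# Placement sets from the example as "<name> <node count>".
SETS="P1 2
P2 2
R1 4
P3 2
P4 2
R2 4"

# pick_set NODES
# Print the smallest set that can hold NODES nodes (first one wins on ties),
# or "any" when no single set is large enough.
pick_set() {
    echo "$SETS" | awk -v want="$1" '
        $2 >= want && (best == "" || $2 < size) { best = $1; size = $2 }
        END { print (best == "" ? "any" : best) }'
}

pick_set 2   # fits in a small set: prints P1
pick_set 3   # needs a region-sized set: prints R1
pick_set 5   # larger than every set: prints any
```

The last case is the one where a job can end up spanning regions.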

There is nothing stopping you from combining what @adarsh said and placement sets. You can set up placement sets for most people to use, but if someone really wants to run in Europe, they can request region=europe like @adarsh said.

Bhroam


Very interesting. I like this approach. Grouping the jobs this way will definitely make managing resources easier.
For the most part our jobs can run in any region, but we are using storage located in region 1. There is a slight delay in access from regions 2 and 3, but most of our jobs are low I/O / high CPU, so the delay is acceptable.
We have a set of very high i/o jobs that we really need to confine to region 1, though.

This is working great. Thanks!
Our users can specify region 1, 2, 3 or all in our launch menu and the queue is behaving accordingly.
