Maybe this is an easy question. Maybe I already have the answer and I just don’t like it, I am not sure.
What I want to do is say "This queue has access to this set of nodes and only this set of nodes"
The actual example in this case: I created a boolean resource called milan and set it to true on nodes that have AMD Milan processors. Now I want to create a queue that has those nodes, and only those nodes, available to it. If you ask for something those nodes don’t have, the job should not run. If you just do:
-q milan -l select=4
it should give you 4 of the Milan nodes.
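For reference, the node side was set up with something like this (node names here are placeholders, and I created the resource as a host-level boolean):

qmgr -c "create resource milan type=boolean, flag=h"
qmgr -c "set node cn001 resources_available.milan = True"
qmgr -c "set node cn002 resources_available.milan = True"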
From the 2021.1.2 big book, section AG 2.3.11 says:
For each queue, you can define the resources you want to have available at that queue. To set the value for an existing resource, use the qmgr command:
Qmgr: set queue <queue name> resources_available.<resource name> = <value>
Which sounds like exactly what I want, but apparently it isn’t, since I did that and the job ran on some random node that didn’t have a Milan processor in it. We finally got this to work by setting:
set queue milan default_chunk.milan = True
But that has the word default in it, which makes me think you could override it. I think that solves my “If you just say -l select=4” requirement, but I am not sure it covers “those nodes and only those nodes”. The reason I want this hard partition is that the queue may have a different policy on it, and I don’t want a clever user to submit a job there to get advantageous priority but then run on other resources.
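If I understand defaults correctly, something like this (hypothetical submissions) would slip past it:

# default_chunk.milan=True is picked up here:
qsub -q milan -l select=4 job.sh
# but an explicit value in the select would override the default:
qsub -q milan -l select=4:milan=False job.sh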
I recommend the AG section Procedure to Associate Vnodes with Queues. In the copy of the big book that I currently have (2021.1.1), it’s section 4.9.2.0.i. This section shows how to configure a queue to set a custom resource (like your milan resource) on jobs submitted to it and how to configure vnodes to only accept jobs that have that custom resource.
Thanks for the pointer. Overall, I really like the PBS documentation, but sometimes finding the right search terms to find what you are looking for can be a bit of a challenge, at least for me.
It looks like we were pretty close. In section 4.9.2.2, step 5 says “Force jobs to request the correct Qlist values” and sets the default_chunk values. Having the word default in the name doesn’t sound very “forceful” to me, but oh well. I will set it up that way and see if I can get a node outside the ones I want.
At our site we use the Qlist resource and associate a Qlist with each node. So, for instance, our v100 nodes have a Qlist setting of v100. Additionally, we use a routing queue structure where users select which gpu they would like to use (since we have multiple types in the cluster), and this then helps PBS put the job into the right queue targeting those specific resources.
create queue gpgpu
set queue gpgpu queue_type = Execution
set queue gpgpu from_route_only = False
set queue gpgpu resources_max.gpu_type = v100
set queue gpgpu resources_max.ngpus = 32
set queue gpgpu resources_min.gpu_type = v100
set queue gpgpu resources_min.ngpus = 1
set queue gpgpu default_chunk.Qlist = v100
set queue gpgpu max_run_res.ngpus = [u:PBS_GENERIC=32]
set queue gpgpu backfill_depth = 50
set queue gpgpu enabled = True
set queue gpgpu started = True
So gpu_type is used to associate the job with the right queue in the routing structure, while Qlist ties the queue to the nodes assigned to it.
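For completeness, the other half of that setup looks roughly like this (node and routing-queue names here are illustrative, not our exact config; Qlist is our site-defined custom resource, and the enabled/started settings on the routing queue are omitted):

qmgr -c "set node gpunode01 resources_available.Qlist = v100"
qmgr -c "create queue gpu_route queue_type = route"
qmgr -c "set queue gpu_route route_destinations = gpgpu"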
@weallcock you are correct about the ‘default’ nature of default_chunk. If a job requests the resource in the select statement, it will override the default. If you have a 1:1:1 correspondence from jobs to queues to nodes, then the best way to do it is to assign the ‘queue’ attribute on the node to the queue name. This means only jobs from that queue can run on those nodes, and the jobs in that queue can only run on those nodes. It makes a hard partition of nodes.
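A minimal sketch of that, assuming the queue is named milan and a node is named cn001:

qmgr -c "set node cn001 queue = milan"
# and to release the node back into the general pool:
qmgr -c "unset node cn001 queue"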
If you have a significant volume of jobs going through this queue, you might consider multi-sched. You make a hard partition again (in this case you use the partition attribute), but you also start up a scheduler to only handle that queue. Since you have multiple schedulers handling the different queues, the workload on each one will be lighter.
For section references, I am using the 2021.1.2 BigBook
I think you are talking about setting the queue vnode attribute in the V2 config file, correct? I will consider that, but as you mentioned it is limited to only one queue; it means updating config files on nodes and restarting the MoM, which is a config management challenge; and the vnode attribute table on page RG-325 says it is deprecated, so I don’t want to build around something that might go away in the near future.
I do have a clarification question about Section 4.9.2 (the process discussed above). That process doesn’t mention setting resources_available on the queue (it does on the node, but not on the queue). What is the purpose of being able to set resources_available on a queue? Is that for restricting / consuming software licenses or such at the queue level?
I am struggling a bit to make my brain think about this the right way. It seems so logical to me to say set queue resources_available=foo,bar,baz and then the queue is limited to only running on nodes that have those resources defined.
Apologies if I am being pig-headed about this. I do appreciate everyone’s time.
I am talking about the vnode ‘queue’ attribute. The V2 config file is just one of the places you can set the queue attribute; you can set it via qmgr as well. One of our customers has a setup where they move nodes in and out of queues based on demand. They do this by submitting node-moving jobs to PBS, which move nodes via qmgr.
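A hypothetical sketch of such a node-moving job (node and queue names are made up, and running qmgr from inside a job of course requires manager permissions on the server):

#!/bin/bash
#PBS -N move-nodes-to-milan
#PBS -l select=1:ncpus=1
# Reassign a couple of nodes to the milan queue via qmgr.
for node in cn001 cn002; do
    qmgr -c "set node ${node} queue = milan"
done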
This doesn’t allow jobs from multiple queues to run on these nodes; it’s a 1:1 mapping between queues and vnodes.
If you want multiple queues to share the nodes, you will have to go the partition/multisched route. You can set the partition attribute on multiple queues and nodes. You then create a scheduler via qmgr (qmgr -c 'create sched <sched_name>') with the same partition attribute. You then start the multisched with /opt/pbs/sbin/pbs_sched -I <sched_name> (that’s a capital I, as in id).
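Put together, a rough sketch might look like this (scheduler, partition, queue, and node names are illustrative, and a real multisched will also need its own sched_priv/sched_log directories configured):

qmgr -c "create sched milan_sched"
qmgr -c "set sched milan_sched partition = milan_part"
qmgr -c "set queue milan partition = milan_part"
qmgr -c "set node cn001 partition = milan_part"
qmgr -c "set sched milan_sched scheduling = true"
/opt/pbs/sbin/pbs_sched -I milan_sched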
resources_available on the queue doesn’t really have much of a place any more. A long time back, it served as a way to limit what could run. You could set resources_available.ncpus=5 and only 5 cpus would run in the queue. You are suggesting non-consumables; it’ll work the same way. The thing is that jobs reside in queues. If you request a value that isn’t in the resources_available, that job will never run (unless it moves queues). For resources_available on queues (and the server), you request resources via -lres=value. These are usually job-wide resources like licenses.

The reason I say these don’t have much of a place any more is that we have a much more flexible limit framework. You can set limits on named users, groups, projects, the generic user, or PBS_ALL for the entire queue. There is little difference between setting resources_available.ncpus=5 and max_run_res.ncpus=[o:PBS_ALL=5]. The thing is, if you went the limit way, you could do max_run_res.ncpus=[u:bob=3, o:PBS_ALL=5]. This would limit bob to only being able to run 3 cpus, and everyone to only being able to run 5. The limit framework only works with consumable resources.
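In qmgr form, assuming a queue named workq, those two approaches would look something like this:

# old-style cap via resources_available:
qmgr -c "set queue workq resources_available.ncpus = 5"
# limit framework: bob capped at 3 cpus, everyone overall at 5:
qmgr -c 'set queue workq max_run_res.ncpus = "[u:bob=3], [o:PBS_ALL=5]"'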
Now if you want to limit what jobs can get IN a queue, then you set resources_min and resources_max on resources. If the resource is consumable, the values are the min/max range allowed in the queue. If the value is a string, then both min and max have to be set to the same thing; that makes sure that only jobs with that resource value are accepted into the queue. You can combine this with resources_default to pick up that value when a job is submitted into the queue without requesting the resource. resources_min/max are usually used in combination with routing queues: you set resources_min/max on several queues to distinct ranges, you set route_destinations on a routing queue to those queues, and then when you submit a job into the routing queue, it’ll be routed based on the min/max criteria. Once again, these are resources submitted via -l (qsub -llic=5). If you want resources that are submitted through the select, then set the ‘q’ flag on the resource. This will cause the resource to be summed over all of the chunks and set on the job as Resource_List.<resource>.
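As a sketch of that routing pattern, using the lic example from above (queue names are made up; the lic resource definition and the enabled/started settings are omitted):

qmgr -c "create queue small_lic queue_type = execution"
qmgr -c "set queue small_lic resources_max.lic = 5"
qmgr -c "create queue big_lic queue_type = execution"
qmgr -c "set queue big_lic resources_min.lic = 6"
qmgr -c "create queue lic_route queue_type = route"
qmgr -c 'set queue lic_route route_destinations = "small_lic,big_lic"'
# qsub -q lic_route -llic=3  ... ends up in small_lic
# qsub -q lic_route -llic=10 ... ends up in big_lic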
You are right: if you set resources_available.foo=bar, only jobs requesting qsub -lfoo=bar will run in that queue. A job submitted with qsub -lfoo=baz will never run unless it is moved to a different queue. I am not sure this is what you want. If you do want this, I’d use resources_min/max to only allow jobs that look like what you want into the queue.