Is it possible to change the preemption order (or, for that matter, disable one of the methods) for a specific queue? Say, I have a dedicated “gpu” queue where the “suspend” method makes no sense, but is fine for “normal”, CPU-based jobs on other queues.
Thanks, but it doesn’t help. I don’t want to disable preemption entirely, just remove the “suspend” method for all jobs in a given queue (preemption through checkpointing and killing should remain). For other queues, suspending would remain a valid option.
It might be overkill, but you could use the “multisched” feature to run a separate scheduler just for your GPU queues. That second scheduler could be configured without “suspend” as a preemption option. See section 4.2 of the Admin Guide.
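Roughly something like this (untested sketch; the scheduler/partition/queue names gpu_sched, gpu_part, gpuq and the paths are placeholders, and the exact attributes vary a bit between versions):

```
# Second scheduler dedicated to the GPU partition
qmgr -c "create sched gpu_sched"
qmgr -c "set sched gpu_sched sched_priv = /var/spool/pbs/sched_priv_gpu"
qmgr -c "set sched gpu_sched sched_log = /var/spool/pbs/sched_logs_gpu"
qmgr -c "set sched gpu_sched partition = gpu_part"
qmgr -c "set sched gpu_sched scheduling = True"

# Put the GPU execution queue in that partition
qmgr -c "set queue gpuq partition = gpu_part"
```

Then, in gpu_sched’s own sched_config (under its sched_priv), set a preemption order without “S”, e.g. preempt_order: "CR", so that scheduler only checkpoints/requeues, while the default scheduler keeps suspend for the CPU queues. I believe newer versions expose preempt_order as a scheduler attribute settable via qmgr instead.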
Thanks for the suggestion, @dtalcott. It does sound like overkill, but it is probably the only way. Strange: all that seems to be missing is an “on_preemption” hook event, but apparently there is nothing like that…
Am I the only one with such a request? How do other people deal with preemption in mixed CPU/GPU environments?
Would it be legitimate to have a routing queue having two destinations on different partitions, managed by two different schedulers? I cannot find a definite answer in the docs.
You can set up a routing queue with multiple destinations on different partitions. However, the normal behavior (per Admin Guide 2.3.6) is: “Tries destinations in round-robin fashion, in the order listed.” So you would need your queues set up such that queue A rejects jobs you want to run on queue B, and vice versa.
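For reference, a routing queue with two destinations looks something like this (queue names route_q, cpuq, gpuq are placeholders):

```
qmgr -c "create queue route_q queue_type = route"
qmgr -c 'set queue route_q route_destinations = "cpuq,gpuq"'
qmgr -c "set queue route_q enabled = True"
qmgr -c "set queue route_q started = True"
```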
Instead of this, I would write a queuejob hook that examines the job at qsub time and moves it to the desired queue (Hooks Guide, section 2.4.1 and examples 9-6 and 9-8). A sketch follows below.
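Something along these lines (untested; the hook and queue names are placeholders, and if your jobs request GPUs only inside -l select chunks you may need to parse Resource_List["select"] instead):

```python
# route_by_ngpus.py - queuejob hook: send jobs to the GPU or CPU queue
# based on the requested ngpus. Install with something like:
#   qmgr -c "create hook route_by_ngpus"
#   qmgr -c "set hook route_by_ngpus event = queuejob"
#   qmgr -c "import hook route_by_ngpus application/x-python default route_by_ngpus.py"
import pbs

e = pbs.event()
job = e.job

# Treat a missing ngpus request as 0. Depending on how users submit
# (-l ngpus=N vs. -l select=...:ngpus=N), this value may not be populated
# at queuejob time; adjust to your site's submission conventions.
ngpus = 0
if job.Resource_List["ngpus"] is not None:
    ngpus = int(job.Resource_List["ngpus"])

# Placeholder queue names: pick the execution queue accordingly.
target = "gpuq" if ngpus > 0 else "cpuq"

job.queue = pbs.server().queue(target)
e.accept()
```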
+1 @dtalcott. The queued-jobs limits might also help for the execution queues behind the routing queue:

queued_jobs_threshold
The maximum number of jobs that can be queued. At the server level, this includes all jobs in the complex. Queueing a job includes the qsub and qmove commands and the equivalent APIs.

queued_jobs_threshold_res
The maximum amount of the specified resource that can be allocated to queued jobs. At the server level, this includes all jobs in the complex. Queueing a job includes the qsub and qmove commands and the equivalent APIs.
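For example (placeholder queue name and values, using the usual limit-attribute syntax, if I remember it correctly):

```
# Cap how many jobs may sit queued in the GPU queue, for all users combined
qmgr -c 'set queue gpuq queued_jobs_threshold = "[o:PBS_ALL=200]"'

# Or cap a specific resource across queued jobs, e.g. ngpus
qmgr -c 'set queue gpuq queued_jobs_threshold_res.ngpus = "[o:PBS_ALL=64]"'
```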
Thanks for the clarifications. The round-robin routing is fine: I have resources_max.ngpus = 0 in the non-GPU queues and resources_min.ngpus = 1 in the GPU one, so jobs are always routed correctly - at least for now, with a single partition.
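In qmgr form, that setup is essentially (my real queue names differ, these are placeholders):

```
# CPU-only queue rejects anything that asks for GPUs
qmgr -c "set queue cpuq resources_max.ngpus = 0"

# GPU queue rejects anything that does not ask for at least one GPU
qmgr -c "set queue gpuq resources_min.ngpus = 1"
```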