Is it possible to change the preemption order (or, for that matter, disable one of the methods) for a specific queue? Say, I have a dedicated “gpu” queue where the “suspend” method makes no sense, but is fine for “normal”, CPU-based jobs on other queues.
Thanks, but it doesn’t help. I don’t want to disable preemption entirely, just remove the “suspend” method for all jobs in a given queue (preemption through checkpointing and killing should remain). For other queues, suspending would remain a valid option.
It might be overkill, but you could use the “multisched” feature to run a separate scheduler just for your GPU queues. That second scheduler could be configured without “suspend” as a preemption option. See section 4.2 of the Admin Guide.
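Roughly something like this (untested sketch; the scheduler/partition/queue names gpu_sched, gpu_part, gpuq and the paths are placeholders, and the exact attributes vary a bit between versions):

```
# Second scheduler dedicated to the GPU partition
qmgr -c "create sched gpu_sched"
qmgr -c "set sched gpu_sched sched_priv = /var/spool/pbs/sched_priv_gpu"
qmgr -c "set sched gpu_sched sched_log = /var/spool/pbs/sched_logs_gpu"
qmgr -c "set sched gpu_sched partition = gpu_part"
qmgr -c "set sched gpu_sched scheduling = True"

# Put the GPU execution queue in that partition
qmgr -c "set queue gpuq partition = gpu_part"
```

Then, in gpu_sched’s own sched_config (under its sched_priv), set a preemption order without “S”, e.g. preempt_order: "CR", so that scheduler only checkpoints/requeues, while the default scheduler keeps suspend for the CPU queues. I believe newer versions expose preempt_order as a scheduler attribute settable via qmgr instead.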
Thanks for the suggestion, @dtalcott. It does sound like overkill, but it is probably the only way. Strange: all that seems to be missing is an “on_preemption” hook event, but apparently there is nothing like that…
Am I the only one with such a request? How do other people deal with preemption in mixed CPU/GPU environments?
Would it be legitimate to have a routing queue having two destinations on different partitions, managed by two different schedulers? I cannot find a definite answer in the docs.
You can set up a routing queue with multiple destinations on different partitions. However, the normal behavior (per Admin Guide 2.3.6) is: “Tries destinations in round-robin fashion, in the order listed.” So you would need your queues set up such that queue A rejects jobs you want to run on queue B, and vice versa.
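For reference, a routing queue with two destinations looks something like this (queue names route_q, cpuq, gpuq are placeholders):

```
qmgr -c "create queue route_q queue_type = route"
qmgr -c 'set queue route_q route_destinations = "cpuq,gpuq"'
qmgr -c "set queue route_q enabled = True"
qmgr -c "set queue route_q started = True"
```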
Instead of this, I would write a queuejob hook that examines the job at qsub time and moves it to the desired queue (Hooks Guide, section 2.4.1 and examples 9-6 and 9-8). A sketch follows below.
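Something along these lines (untested; the hook and queue names are placeholders, and if your jobs request GPUs only inside -l select chunks you may need to parse Resource_List["select"] instead):

```python
# route_by_ngpus.py - queuejob hook: send jobs to the GPU or CPU queue
# based on the requested ngpus. Install with something like:
#   qmgr -c "create hook route_by_ngpus"
#   qmgr -c "set hook route_by_ngpus event = queuejob"
#   qmgr -c "import hook route_by_ngpus application/x-python default route_by_ngpus.py"
import pbs

e = pbs.event()
job = e.job

# Treat a missing ngpus request as 0. Depending on how users submit
# (-l ngpus=N vs. -l select=...:ngpus=N), this value may not be populated
# at queuejob time; adjust to your site's submission conventions.
ngpus = 0
if job.Resource_List["ngpus"] is not None:
    ngpus = int(job.Resource_List["ngpus"])

# Placeholder queue names: pick the execution queue accordingly.
target = "gpuq" if ngpus > 0 else "cpuq"

job.queue = pbs.server().queue(target)
e.accept()
```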
+1 @dtalcott. The queued-jobs limits might also help for the execution queues behind the routing queue:

queued_jobs_threshold
The maximum number of jobs that can be queued. At the server level, this includes all jobs in the complex. Queueing a job includes the qsub and qmove commands and the equivalent APIs.

queued_jobs_threshold_res
The maximum amount of the specified resource that can be allocated to queued jobs. At the server level, this includes all jobs in the complex. Queueing a job includes the qsub and qmove commands and the equivalent APIs.
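For example (placeholder queue name and values, using the usual limit-attribute syntax, if I remember it correctly):

```
# Cap how many jobs may sit queued in the GPU queue, for all users combined
qmgr -c 'set queue gpuq queued_jobs_threshold = "[o:PBS_ALL=200]"'

# Or cap a specific resource across queued jobs, e.g. ngpus
qmgr -c 'set queue gpuq queued_jobs_threshold_res.ngpus = "[o:PBS_ALL=64]"'
```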
Thanks for the clarifications. The round-robin routing is fine: I have resources_max.ngpus = 0 in the non-GPU queues and resources_min.ngpus = 1 in the GPU one, so jobs are always routed correctly - at least for now, with a single partition.
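In qmgr form, that setup is essentially (my real queue names differ, these are placeholders):

```
# CPU-only queue rejects anything that asks for GPUs
qmgr -c "set queue cpuq resources_max.ngpus = 0"

# GPU queue rejects anything that does not ask for at least one GPU
qmgr -c "set queue gpuq resources_min.ngpus = 1"
```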