Allowing queues to use nodes based on time

I have two queues, a small queue and a long queue.

The long queue has nodes with more cores and memory, while the small queue has nodes with fewer cores and less memory.

I want to do the following:

  • Restrict usage of small nodes to only small jobs during work hours
  • Allow usage of small nodes by long jobs during non work hours
  • Restrict number of small jobs per user on small queue during work hours
  • Restrict number of long jobs per user on small queue during non work hours

Primetime may work for this, but I want jobs to be able to run 24/7. Users may have scripts that run overnight and spawn new jobs, so using primetime/non-primetime to prevent jobs from running would not be ideal.

An easy method would be to schedule qmgr commands at the start and end of work hours using cron.

At the start of the work day cron would run:
qmgr -c "set queue smallq resources_max.mem = 32gb"
qmgr -c "set queue smallq resources_max.ncpus = 4"

At the end of the work day cron would run:
qmgr -c "set queue smallq resources_max.mem = 256gb"
qmgr -c "set queue smallq resources_max.ncpus = 32"
qmgr -c "set queue smallq resources_max.walltime = 12:00:00"

Existing small jobs will continue to run. Big jobs can then start. But after 12 hours they will end and small jobs can start shuffling in.
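
As a rough sketch, the two sets of qmgr commands above could be kept in one small script that cron calls with a profile name. The script name `smallq_limits.py` and the profile names `day` and `night` are made up for illustration; the limit values are the ones from the commands above.

```python
#!/usr/bin/env python3
"""Hypothetical helper: switch smallq limits between a day and a night profile."""
import subprocess
import sys

# Limit values copied from the qmgr commands in this post.
PROFILES = {
    "day": {"mem": "32gb", "ncpus": "4"},
    "night": {"mem": "256gb", "ncpus": "32", "walltime": "12:00:00"},
}


def build_commands(profile, queue="smallq"):
    """Return one qmgr argument list per limit in the chosen profile."""
    return [
        ["qmgr", "-c", f"set queue {queue} resources_max.{name} = {value}"]
        for name, value in PROFILES[profile].items()
    ]


def apply_profile(profile):
    """Run the qmgr commands; needs PBS manager privileges on the server."""
    for cmd in build_commands(profile):
        subprocess.run(cmd, check=True)


# Does nothing unless exactly one profile argument is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    apply_profile(sys.argv[1])
```

Cron would then call it with `day` at the start of work hours and `night` at the end. Note that the walltime limit set at night stays in place during the day unless it is unset separately.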

Mike

To help us understand, could you please explain what you mean by small jobs (with an example qsub request), and the same for long jobs? This will help. Thank you.

Sure. For example, some teams run jobs with 8+ cores and 50+ GB of RAM like so:

qsub -q workq -V -l select=2:ncpus=8:mem=80000M

These will run for several days. I haven’t informed my users about walltime yet (although I am planning to with these changes).

Smaller jobs would be things like github runners:

qsub -q workq -l select=1:ncpus=1:mem=2000M

Very interesting, didn’t think of using qmgr in combination with cron like that. Thanks!

Use a custom qlist resource to tie the small nodes to smallq and the long nodes to longq.
Ref: Node-level resource not taking effect - #2 by adarsh

You can write a queuejob hook that will:

  1. accept small jobs submitted to smallq during work hours and reject long jobs submitted to smallq (jobs requesting more than 1 ncpu and more than 2GB memory per job), with a message telling the user to submit to the long queue because it is working hours.
  2. rely on queue limits to restrict the number of jobs per user on the small or long queue. The queuejob hook takes care of the working hours, since it rejects long / short jobs during working / non-working hours respectively. Refer: How to assign maximum cores to particular queue? - #2 by adarsh
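
The decision logic of such a hook might look roughly like the sketch below. It is untested against a real server and only covers the work-hours rejection from step 1. The work-hours window (08:00 to 18:00, Monday to Friday) and all function names are assumptions, and treating a job as small only when it stays within both the 1 ncpu and 2GB limits is one reading of the rule. The logic is kept in plain Python; the lines that touch the `pbs` hook module appear only as comments.

```python
import re
from datetime import datetime

# Assumed values for illustration only.
WORK_START_HOUR = 8            # hypothetical start of work hours
WORK_END_HOUR = 18             # hypothetical end of work hours
SMALL_MAX_NCPUS = 1            # small-job threshold from step 1
SMALL_MAX_MEM = 2 * 1024**3    # 2GB in bytes, from step 1

_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text):
    """Convert a size such as '2000M' or '2gb' to bytes (word units not handled)."""
    match = re.fullmatch(r"(\d+)([kmgt]?)b?", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def job_totals(select):
    """Sum ncpus and mem over all chunks of a select string."""
    ncpus = mem = 0
    for chunk in select.split("+"):
        parts = chunk.split(":")
        count = 1
        if parts[0].isdigit():          # leading chunk count, e.g. "2:ncpus=8"
            count = int(parts[0])
            parts = parts[1:]
        res = dict(p.split("=", 1) for p in parts)
        ncpus += count * int(res.get("ncpus", 1))   # assume 1 ncpu if unspecified
        mem += count * parse_size(res.get("mem", "0"))
    return ncpus, mem


def is_work_hours(now):
    """True on weekdays inside the assumed work-hours window."""
    return now.weekday() < 5 and WORK_START_HOUR <= now.hour < WORK_END_HOUR


def check_submission(queue, select, now):
    """Return None to accept the job, or a rejection message."""
    if queue != "smallq":
        return None
    ncpus, mem = job_totals(select)
    small = ncpus <= SMALL_MAX_NCPUS and mem <= SMALL_MAX_MEM
    if is_work_hours(now) and not small:
        return "smallq only takes small jobs during work hours; please submit to longq"
    return None


# In a real queuejob hook the wiring would be along these lines (untested):
#   import pbs
#   e = pbs.event()
#   queue = e.job.queue.name if e.job.queue else ""
#   msg = check_submission(queue, str(e.job.Resource_List["select"]), datetime.now())
#   if msg:
#       e.reject(msg)
```

With the two example requests from earlier in the thread, `select=1:ncpus=1:mem=2000M` would be accepted on smallq at any time, while `select=2:ncpus=8:mem=80000M` would be rejected on smallq during the assumed work hours and accepted outside them.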