I have two queues, a small queue and a long queue.
The long queue has nodes with more cores and memory, while the small queue has nodes with fewer cores and less memory.
I want to do the following:
Restrict the small nodes to small jobs only during work hours
Allow long jobs to use the small nodes outside work hours
Limit the number of small jobs per user in the small queue during work hours
Limit the number of long jobs per user in the small queue outside work hours
Primetime may work for this, but I want jobs to be able to run at all times (24/7). Users may have scripts that run overnight and spawn new jobs, so using primetime/non-primetime to prevent jobs from running would not be ideal.
An easy method would be to schedule qmgr commands at the start and end of work hours using cron.
At the start of the work day cron would run:
qmgr -c "set queue smallq resources_max.mem = 32gb"
qmgr -c "set queue smallq resources_max.ncpus = 4"
At the end of the work day cron would run:
qmgr -c "set queue smallq resources_max.mem = 256gb"
qmgr -c "set queue smallq resources_max.ncpus = 32"
qmgr -c "set queue smallq resources_max.walltime = 12:00:00"
Existing small jobs will continue to run, and big jobs can then start. Because of the 12-hour walltime limit, the big jobs end after at most 12 hours, and small jobs can start shuffling back in.
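The cron approach above could be wrapped in one small script so that each crontab entry makes a single call. This is a minimal sketch, assuming a hypothetical script name (smallq_hours.py), Python 3 on the PBS server host, and example switch times of 08:00 and 18:00 on weekdays; the attribute values are the ones from the qmgr commands in the post.

```python
#!/usr/bin/env python3
"""Switch smallq between work-hours and after-hours limits (hypothetical helper).

Example crontab entries (the times are assumptions):
    0 8  * * 1-5  /usr/local/sbin/smallq_hours.py day
    0 18 * * 1-5  /usr/local/sbin/smallq_hours.py night
"""
import subprocess
import sys

# Attribute values copied from the qmgr commands in the post.
SETTINGS = {
    "day": {
        "resources_max.mem": "32gb",
        "resources_max.ncpus": "4",
    },
    "night": {
        "resources_max.mem": "256gb",
        "resources_max.ncpus": "32",
        "resources_max.walltime": "12:00:00",
    },
}


def qmgr_commands(period, queue="smallq"):
    """Build one qmgr argument list per attribute for 'day' or 'night'."""
    return [
        ["qmgr", "-c", f"set queue {queue} {attr} = {value}"]
        for attr, value in SETTINGS[period].items()
    ]


def main(args, run=subprocess.run):
    """Apply the settings for the period named in args; run is injectable for testing."""
    if len(args) != 1 or args[0] not in SETTINGS:
        print("usage: smallq_hours.py day|night", file=sys.stderr)
        return
    for cmd in qmgr_commands(args[0]):
        # check=True raises on the first failing qmgr call, so the error
        # appears in cron's output instead of being silently ignored.
        run(cmd, check=True)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Because the "day" settings only touch mem and ncpus, resources_max.walltime stays at 12:00:00 during the day, exactly as with the original commands.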
To make sure we understand, could you please explain what you mean by small jobs (with a qsub request example), and the same for long jobs? This will help. Thank you.
One option is a queuejob hook that accepts small jobs to smallq during work hours and rejects long jobs submitted to the small queue (a job requesting more than 1 ncpu and more than 2 GB of memory), with a message telling the user to submit to the long queue because it is working hours.
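The hook described above could look roughly like the following sketch of a PBS Professional queuejob hook. It assumes Monday to Friday 08:00-18:00 as work hours and that users request job-wide resources (for example qsub -q smallq -l ncpus=1 -l mem=2gb); a chunk-level -l select=... request would need extra handling. It treats a job as long if it exceeds either the 1 ncpu or the 2 GB threshold. The constant and helper names are hypothetical, and the use of job.queue.name and str() on a memory size are assumptions about the hook API worth checking against the Hooks Guide. The decision logic sits in plain functions so it can be tested outside PBS; the pbs import only succeeds inside the server's hook environment.

```python
import datetime

try:
    import pbs  # only importable inside the PBS server's hook environment
except ImportError:
    pbs = None

SMALL_QUEUE = "smallq"          # queue name from the thread
MAX_SMALL_NCPUS = 1             # small-job thresholds from the reply
MAX_SMALL_MEM = 2 * 1024 ** 3   # 2 GB in bytes
WORK_START, WORK_END = 8, 18    # hypothetical work hours, Monday to Friday

# Longest suffixes first so "gb" is matched before the bare "b".
_UNITS = {"tb": 1024 ** 4, "gb": 1024 ** 3, "mb": 1024 ** 2, "kb": 1024, "b": 1}


def mem_to_bytes(size):
    """Convert a PBS size such as '2gb' or '512mb' to bytes (word units not handled)."""
    text = str(size).strip().lower()
    for suffix, factor in _UNITS.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)  # a bare number is taken as bytes


def is_work_hours(now):
    """True from WORK_START to WORK_END, Monday to Friday."""
    return now.weekday() < 5 and WORK_START <= now.hour < WORK_END


def check_job(queue, ncpus, mem_bytes, now):
    """Return None to accept the job, or a rejection message."""
    if queue != SMALL_QUEUE or not is_work_hours(now):
        return None
    if ncpus > MAX_SMALL_NCPUS or mem_bytes > MAX_SMALL_MEM:
        return ("Long jobs are not accepted in smallq during working hours; "
                "please submit this job to the long queue.")
    return None


if pbs is not None:
    e = pbs.event()
    j = e.job
    # The queue may be unset if the user did not pass -q.
    queue = j.queue.name if j.queue is not None else None
    ncpus = j.Resource_List["ncpus"] or 1  # assume 1 cpu if unset
    mem = j.Resource_List["mem"]
    mem_bytes = mem_to_bytes(mem) if mem is not None else 0
    msg = check_job(queue, int(ncpus), mem_bytes, datetime.datetime.now())
    if msg is None:
        e.accept()
    else:
        e.reject(msg)
```

Saved as smallq_hours.py, it would typically be registered with:
qmgr -c "create hook smallq_hours"
qmgr -c "set hook smallq_hours event = queuejob"
qmgr -c "import hook smallq_hours application/x-python default smallq_hours.py"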
Use queue limits to restrict the number of jobs per user in the small queue or the long queue. The queuejob hook enforces the working hours, since it rejects long jobs during working hours and short jobs during non-working hours. Refer: How to assign maximum cores to particular queue? - #2 by adarsh
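For the per-user queue limits mentioned above, PBS Professional's queue-level max_run attribute accepts a per-user generic limit of the form [u:PBS_GENERIC=N], which caps the running jobs of each user individually. A minimal sketch that sets it through qmgr, assuming the long queue is named longq and using made-up limit values; run is injectable so the commands can be checked without a PBS server.

```python
import subprocess

# Hypothetical per-user caps on running jobs; adjust to the site.
PER_USER_MAX_RUN = {"smallq": 4, "longq": 2}


def limit_command(queue, max_run):
    """Build a qmgr call that caps running jobs per user in one queue."""
    # The inner double quotes are part of the qmgr directive itself.
    return ["qmgr", "-c",
            f'set queue {queue} max_run = "[u:PBS_GENERIC={max_run}]"']


def apply_limits(limits, run=subprocess.run):
    """Apply one max_run limit per queue, stopping at the first qmgr failure."""
    for queue, max_run in limits.items():
        run(limit_command(queue, max_run), check=True)
```

On the server host this would be run as apply_limits(PER_USER_MAX_RUN). The max_queued attribute takes the same syntax if the cap should count queued rather than running jobs, and if the work-hour and after-hour caps on smallq should differ, the same call could be added to the cron jobs that switch the queue at the start and end of the day.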