Dedicated queue question

Hi All,

I want to prevent jobs from running on a system so that some maintenance can take place. The queues on the system have 24-hour walltime limits.

From reading the docs, I think I can use either a reservation or dedicated time. I think dedicated time is the way to go.

So if I want the system down for a full day, and the queue walltime limit is 24 hours, do I need a two-day dedicated time in order to drain the jobs?

So, for example, it's now 9:19 on the 7th of December. The following entry would have the running jobs drained by 10:00 tomorrow, and no jobs would be able to run, except in a dedicated queue, until 00:00 on the 9th:

12/7/2021 10:00 12/9/2021 00:00

Thanks!

No jobs can start if they would conflict with a dedicated time. Already running jobs ignore dedicated times.

So, in your example, you would create the dedicated time entry to match exactly when you want your dedicated jobs to start and run. As long as you create the entry at least 24 hours ahead of time, the non-dedicated jobs will have drained by the time dedicated time begins.
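To make the "24 hours ahead" point concrete: a job is held if its requested walltime would carry it into the dedicated window, so the window only needs to cover the maintenance itself, not an extra day. The sketch below is a simplified interval-overlap test, not PBS's actual scheduler code; it assumes every job has a walltime request, and the function names and the example window (all of 9 December 2021) are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical dedicated window: the full day of 9 December 2021.
DED_START = datetime(2021, 12, 9, 0, 0)
DED_END = datetime(2021, 12, 10, 0, 0)


def would_cross_dedicated_time(start, walltime, ded_start, ded_end):
    """True if a job started at `start` with requested `walltime` would
    overlap the dedicated window [ded_start, ded_end)."""
    job_end = start + walltime
    return start < ded_end and job_end > ded_start


def latest_safe_start(walltime, ded_start):
    """Latest start time at which a job of `walltime` still finishes
    before the dedicated window opens."""
    return ded_start - walltime


if __name__ == "__main__":
    limit = timedelta(hours=24)  # the queue walltime limit from the question
    # 09:19 on 7 Dec: a 24 h job ends 09:19 on 8 Dec, before the window opens.
    print(would_cross_dedicated_time(datetime(2021, 12, 7, 9, 19), limit, DED_START, DED_END))  # False
    # 06:00 on 8 Dec: a 24 h job could run until 06:00 on 9 Dec, so it is held.
    print(would_cross_dedicated_time(datetime(2021, 12, 8, 6, 0), limit, DED_START, DED_END))  # True
    print(latest_safe_start(limit, DED_START))  # 2021-12-08 00:00:00
```

With a 24-hour limit, any entry added at least a day before its start time leaves nothing non-dedicated running when the window opens.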

When the dedicated time arrives, the PBS server will start the ded queues automatically. When the ded time ends, PBS stops the queues. (I don’t remember whether the end time is a hard stop for ded jobs, or if ded jobs that have already started can continue to run.)
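Which queues count as "ded" queues is, as far as I know, set by the `dedicated_prefix` option in the scheduler's sched_config, with `ded` as the usual default; check your own sched_config to confirm. The hypothetical helpers below only illustrate that naming rule: they read the option from sched_config text and test whether a queue name starts with it. They do not start or stop queues.

```python
def read_dedicated_prefix(sched_config_text, default="ded"):
    """Return the dedicated_prefix value from sched_config text, or
    `default` if the option is absent (assumed default: "ded")."""
    for raw in sched_config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.startswith("dedicated_prefix:"):
            fields = line.split(":", 1)[1].split()
            if fields:
                return fields[0]
    return default


def is_dedicated_queue(queue_name, prefix):
    """A queue is treated as dedicated if its name starts with the prefix."""
    return queue_name.startswith(prefix)


if __name__ == "__main__":
    prefix = read_dedicated_prefix("dedicated_prefix: ded\n")
    print(is_dedicated_queue("dedtest", prefix))  # True
    print(is_dedicated_queue("workq", prefix))    # False
```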

Thanks @dtalcott

I have added the following to my dedicated time file after the example:

For example

04/15/1998 12:00 04/15/1998 15:30

4/15/1998 16:00 4/15/1998 16:40

08/12/2021 00:01 10/12/2021 00:01

But jobs have started within the window. Is there something else I need to do in order to prevent them from running?

All I have done is add the entry to the dedicated_time file and HUP’d the scheduler.

Jobs that were already running before you enabled the dedicated time will keep running during the dedicated time period; they are not killed. However, any job submitted after you enable the dedicated time (kill -HUP pid_of_pbs_sched) that would cross the dedicated time boundary is kept in the queued state.

Example:

Job ID                         Username        Queue           Jobname         SessID   NDS  TSK   Memory Time  S Time
------------------------------ --------------- --------------- --------------- -------- ---- ----- ------ ----- - -----
4341.pbsserver                pbsdata         workq           STDIN              43173    1     1    --    --  R 00:01
   pbsserver/0
   Job run at Wed Dec 08 at 11:20 on (pbsserver:ncpus=1)
4342.pbsserver                pbsdata         workq           STDIN                --     1     1    --    --  Q   -- 
    -- 
   Not Running: Job would cross dedicated time boundary
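To check from a script which queued jobs are being held back by the dedicated window, one option is to look for that scheduler comment in the job attributes. The sketch below assumes input shaped like `qstat -f -F json` output (a top-level "Jobs" object keyed by job ID, with `job_state` and `comment` attributes); verify that shape against your PBS version. The function name is hypothetical, and the sample data reuses the two jobs shown above.

```python
import json

DED_COMMENT = "Job would cross dedicated time boundary"


def jobs_held_by_dedicated_time(qstat_json):
    """Return sorted IDs of queued jobs whose scheduler comment mentions
    the dedicated time boundary."""
    data = json.loads(qstat_json)
    held = []
    for job_id, attrs in data.get("Jobs", {}).items():
        if attrs.get("job_state") == "Q" and DED_COMMENT in attrs.get("comment", ""):
            held.append(job_id)
    return sorted(held)


if __name__ == "__main__":
    sample = json.dumps({"Jobs": {
        "4341.pbsserver": {"job_state": "R",
                           "comment": "Job run at Wed Dec 08 at 11:20 on (pbsserver:ncpus=1)"},
        "4342.pbsserver": {"job_state": "Q",
                           "comment": "Not Running: Job would cross dedicated time boundary"},
    }})
    print(jobs_held_by_dedicated_time(sample))  # ['4342.pbsserver']
```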

Thanks @adarsh. I have realised my mistake: the date format is mm/dd/yyyy, not dd/mm/yyyy as I had written.

It’s working now.

Cheers!
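Because the mm/dd/yyyy vs dd/mm/yyyy mix-up produces a valid but wrong window (here, August to October instead of 8 to 10 December), a small pre-flight check before HUPing the scheduler can catch it. This is a hypothetical helper, not part of PBS: it parses entries as `MM/DD/YYYY HH:MM MM/DD/YYYY HH:MM`, assumes `#` starts a comment as in the file's header, rejects windows whose end is not after the start, and lists windows that have already ended.

```python
from datetime import datetime

# Each dedicated_time entry: FROM and TO, both as MM/DD/YYYY HH:MM.
ENTRY_FORMAT = "%m/%d/%Y %H:%M"


def parse_dedicated_time(text):
    """Parse dedicated_time text into a list of (start, end) datetimes.
    Assumes '#' starts a comment; raises ValueError on malformed entries."""
    windows = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 fields, got {len(fields)}")
        start = datetime.strptime(f"{fields[0]} {fields[1]}", ENTRY_FORMAT)
        end = datetime.strptime(f"{fields[2]} {fields[3]}", ENTRY_FORMAT)
        if end <= start:
            raise ValueError(f"line {lineno}: end {end} is not after start {start}")
        windows.append((start, end))
    return windows


def already_ended(windows, now):
    """Windows whose end time is not in the future: likely a date-format slip."""
    return [(s, e) for s, e in windows if e <= now]


if __name__ == "__main__":
    now = datetime(2021, 12, 8, 9, 0)  # around the time of this thread
    # Intended as 8 to 10 December, but read as 12 August to 12 October.
    wrong = parse_dedicated_time("08/12/2021 00:01 10/12/2021 00:01\n")
    print(already_ended(wrong, now))  # flags the window
    right = parse_dedicated_time("12/08/2021 00:01 12/10/2021 00:01\n")
    print(already_ended(right, now))  # []
```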

I have tried to set up a queue to use this dedicated time window and run some tests, but I get the following error when submitting a job:
qsub: Unauthorized request

Looking at the server logs, I see:

12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;gw_job_setup as queuejob for ba-corourke@nid00004 for new job: started
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;set _job.sandbox = private
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;set _job.Account_Name = 'GW02'
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;set _job.project = GW02
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;set job.Resource_List[group_proportion] = -1.0
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;set _job.umask = 22L
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;gw_job_setup as queuejob for ba-corourke@nid00004 for new job: completed after 0.0012s
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;job_default_coretype as queuejob for ba-corourke@nid00004 for new job: started
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;cluster is isambard
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;original select = 1
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;updated select = 1:coretype=arm
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;original select = 1:coretype=arm
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;updated select = 1:ncpus=64:coretype=arm:mpiprocs=64
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;job_default_coretype as queuejob for ba-corourke@nid00004 for new job: completed after 0.0009s
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;extra_walltime_limits as queuejob for ba-corourke@nid00004 for new job: started
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;soft limits not defined for ded-test queue
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;no soft limit defined for job
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;extra_walltime_limits as queuejob for ba-corourke@nid00004 for new job: completed after 0.0031s
12/10/2021 11:32:31;0006;Server@sdb;Hook;Server@sdb;set _job.project = GW02
12/10/2021 11:32:31;0080;Server@sdb;Req;req_reject;Reject reply code=15007, aux=0, type=1, from ba-corourke@nid00004

Does anyone know how to fix this problem?

I have enabled acl_groups and acl_users and added myself to both.

Could you please share the configuration of that dedicated queue, along with your sched_config and dedicated_time files?

Section 4.9.10, “Dedicated Time”, on p. AG-127 of the PBS Professional Administrator’s Guide may help.

I figured out the problem: the queue name I gave had a hyphen in it, which seemed to cause the issue. It works now without the hyphen.