Job array with PBS Pro

Hi,

PBS Pro seems to treat a job array as a single job when the max_queued and queued_jobs_threshold attributes are applied. Is there any way to treat each subjob of a job array as one job, so that max_queued or queued_jobs_threshold can limit the number of subjobs queuing? The open-source scheduler Maui has this wonderful function.
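
For reference, I mean limits set along these lines (a sketch; workq is just an example queue name, not one of ours):

# workq is a hypothetical queue, used only for illustration
Qmgr: set queue workq max_queued = "[u:PBS_GENERIC=10]"
Qmgr: set queue workq queued_jobs_threshold = "[u:PBS_GENERIC=10]"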

thanks,

Sue

@sxy Could you please state the PBS Pro version you are running?

V14.0.2

Thanks.

Sue

The newer versions of PBS do treat subjobs as normal jobs for limits:

Qmgr: s s max_queued="[o:PBS_ALL=5]"
[ravi@pbspro ~]$ qsub -J 1-50 -- /bin/sleep 100
qsub: Maximum number of jobs already in complex
[ravi@pbspro ~]$ qsub -J 1-4 -- /bin/sleep 100
53[].pbspro
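
The same mechanism also takes per-user limits, e.g. (a sketch; PBS_GENERIC sets the default limit for any user without an individual one):

Qmgr: s s max_queued="[u:PBS_GENERIC=5]"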

Qmgr: s s max_queued="[o:PBS_ALL=5]"
[ravi@pbspro ~]$ qsub -J 1-50 -- /bin/sleep 100
qsub: Maximum number of jobs already in complex

Could you run this command to show the jobs' status, please?

[ravi@pbspro ~]$ qstat -1nt

thanks,

Sue

Output:

[ravi@pbspro ~]$ qsub -J 1-50 -- /bin/sleep 100
qsub: Maximum number of jobs already in complex
[ravi@pbspro ~]$ qstat -1nt
[ravi@pbspro ~]$ qstat -f
[ravi@pbspro ~]$ 

We have a routing queue set up as follows:

Queue defaultQ
queue_type = Route
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
route_destinations = physics
enabled = True
started = True
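
For reference, this queue was created with qmgr roughly like this (a sketch reconstructed from the config above):

Qmgr: create queue defaultQ queue_type = route
Qmgr: set queue defaultQ route_destinations = physics
Qmgr: set queue defaultQ enabled = true
Qmgr: set queue defaultQ started = true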

The execution queue, physics, is set as follows:

Queue physics
queue_type = Execution
Priority = 10
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
max_queued = [u:PBS_GENERIC=10], [u:sxy=10], [u:test=10]
default_chunk.Qlist = physics
resources_assigned.ncpus = 0
resources_assigned.nodect = 0
enabled = True
started = True
queued_jobs_threshold = [u:PBS_GENERIC=10], [u:sxy=10], [u:test=10]

We have 72 cores on the physics queue. With PBS Pro 14.0.1, if I run this:

$ qsub -J 1-200 -- /bin/sleep 100
$ qsub -J 1-200 -- /bin/sleep 100
$ qstat -1n


                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
365[].headnode  sxy      physics  STDIN         --    1   1    --    --  B   --   --
366[].headnode  sxy      defaultQ STDIN         --    1   1    --    --  Q   --   --

For job 365[], 72 subjobs are running and the rest are queued in the physics queue.
I would expect 10 subjobs to be queued in the physics queue while the others wait in defaultQ.
Also, job 366[] waits in defaultQ until all subjobs of 365[] complete, even when there are free cores; then all of its subjobs enter the physics queue at once.

could you test this scenario with version you run please?

Nowadays, running job arrays is very popular on HPC systems. If one user submits a job array with a large number of subjobs, perhaps thousands, it causes "queue stuffing". We could use fairshare to limit the jobs running for each user, but that would add overhead; a rough sketch of that workaround is below.
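
For reference, fairshare is configured in the scheduler's sched_config file, along these lines (a sketch; the values shown are examples, not recommendations):

# in PBS_HOME/sched_priv/sched_config
fair_share: true ALL
fairshare_entity: euser
fairshare_usage_res: cput
fairshare_decay_time: 24:00:00

# per-user shares go in PBS_HOME/sched_priv/resource_group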

Thanks,

Sue

In your case, if you run

$ qsub -- /bin/sleep 100

perhaps you would also get this?

qsub: Maximum number of jobs already in complex

Sue

OK, I tried to create a similar setup:

[ravi@pbspro ~]$ qstat -Qf
Queue: physics
    queue_type = Execution
    Priority = 10
    total_jobs = 0
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
    max_queued = [u:PBS_GENERIC=10]
    max_queued = [u:ravi=10]
    enabled = True
    started = True

Queue: defaultQ
    queue_type = Route
    total_jobs = 0
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
    route_destinations = physics
    enabled = True
    started = True

Submitted 2 jobs as user 'ravi':
[ravi@pbspro ~]$ qsub -J 1-200 -q defaultQ -- /bin/sleep 1000
60[].pbspro
[ravi@pbspro ~]$ qsub -J 1-200 -q defaultQ -- /bin/sleep 1000
61[].pbspro

They were both queued in defaultQ:
[ravi@pbspro ~]$ qstat -1n

pbspro: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
60[].pbspro     ravi     defaultQ STDIN         --    1   1    --    --  Q   --   -- 
61[].pbspro     ravi     defaultQ STDIN         --    1   1    --    --  Q   --   -- 

So I guess PBS doesn't allow a job array to be queued into the execution queue unless the whole array fits within the limit, so they both stay in the routing queue. If I submit a smaller array from the same user, it does get routed to physics and starts running:

[ravi@pbspro ~]$ qsub -J 1-5 -q defaultQ -- /bin/sleep 1000
62[].pbspro
[ravi@pbspro ~]$ qstat -1n

pbspro: 
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
60[].pbspro     ravi     defaultQ STDIN         --    1   1    --    --  Q   --   -- 
61[].pbspro     ravi     defaultQ STDIN         --    1   1    --    --  Q   --   -- 
62[].pbspro     ravi     physics  STDIN         --    1   1    --    --  B   --   --

Will jobs 60[] and 61[] above never run? I would expect the 5 subjobs routed to the physics queue to run. We strongly suggest that a future version of PBS Pro include such a function for subjobs of a job array, as more and more users are running job arrays on HPC systems.
This function has clear advantages, such as speeding up the scheduling cycle in execution queues and preventing a single user from stuffing execution queues.
The open-source scheduler Maui has had this function for many years, and I am surprised that PBS Pro has not adopted it yet.

thanks,

Sue

Well, they cross the max_queued limit, so yes, they won't ever get queued to run. Would it be unreasonable to ask your users to submit smaller job arrays? As long as users submit job arrays within the limits, your use case will be achieved.
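
For example, one large array could be split into several smaller ones that each fit within the limit (a sketch; the index ranges and sleep script are only illustrations):

$ qsub -J 1-10 -- /bin/sleep 100
$ qsub -J 11-20 -- /bin/sleep 100
$ qsub -J 21-30 -- /bin/sleep 100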

But thanks for mentioning this; we can analyze further whether it makes sense to enhance job arrays in PBS so that part of an array can be queued into the execution queue when max_queued is set.

The fundamental idea is that a subjob of a job array should be treated the same as a normal job, so that any PBS Pro function that applies to normal jobs also applies to subjobs. Could you please pass my comments under this topic to your manager?

thanks,

Sue

I think the new version of PBS does treat subjobs pretty much the same way as normal jobs. Can you please be more specific about what differences you see between a subjob and a normal job in PBS? The limits thing, as I explained earlier, has been fixed in the latest versions of PBS. Is there any other difference that you are concerned about?

Did you use the latest version in your examples in our conversation?

Sue

I used the master branch in the examples above.

@sxy

If you are using 14.x then you might face this issue, and it might not work the way you like.
If you download 19.x or the latest from master, as @agrawalravi90 suggested, it will work the way you like.

The behaviour you have seen in 14.x is a bug, and it is fixed in version 19.x and on the master branch.

Hi,

The newer versions of PBS do treat subjobs as normal jobs for limits:

Qmgr: s s max_queued="[o:PBS_ALL=5]"
[ravi@pbspro ~]$ qsub -J 1-50 -- /bin/sleep 100
qsub: Maximum number of jobs already in complex

In your case, no jobs ran, but I would expect the first 5 subjobs to start running if subjobs are treated in the same way as normal jobs.

thanks,

Sue

Hi,
Please check the admin guide at https://www.altair.com/pdfs/pbsworks/PBS19.2.3_BigBook.pdf, Table 4-1: Server Attributes Involved in Scheduling:

max_queued: The maximum number of jobs allowed to be queued or running in the partition(s) managed by a scheduler. Can be specified for users, groups, or all.
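
For illustration, the user/group/overall forms look roughly like this (a sketch; the names and numbers are examples, not from this thread):

Qmgr: set server max_queued = "[u:ravi=10], [g:staff=50], [o:PBS_ALL=100]"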

Thank you

A job array isn’t a collection of jobs. It’s one “job” that can spawn many jobs from it. This means that a subjob doesn’t exist until it starts running. We can’t detach N subjobs from the parent to move into the exec queue. They don’t exist yet. The whole job array has to move.

With this limitation, fixing it as you suggest is quite difficult.

I am not fully up to date on job arrays though. This might be easier than I think. Hopefully our job array expert (@Shrini-h) can weigh in.

Bhroam

Sorry for the delay in replying

I think @sxy has a valid expectation; unfortunately, the current architecture doesn't allow it.

I also agree with @Bhroam that it's not an easy enhancement, but it will be an interesting one to solve. There could be many ways to architect it: 1. decouple the parent job from the queue (make it sort of global) and let subjobs move freely between queues; 2. let a routing-queue job array spawn smaller dependent job arrays in the execution queue; etc. A user-level approximation of option 2 is sketched below.
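
For instance, chaining smaller arrays with a dependency can approximate option 2 from the user side today (a sketch; afterok makes the second array wait until the whole first array has finished successfully):

$ first=$(qsub -J 1-10 -- /bin/sleep 100)
$ qsub -J 1-10 -W depend=afterok:$first -- /bin/sleep 100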

Thank You