Let's say a user submits 5 jobs back-to-back and my PBS scheduling cycle is 30 sec.
Now all 5 jobs are in the Q state. How can I restrict the scheduler so that only the top 2 queued jobs from that user, in priority order, are evaluated for moving to the R state (assuming resources are available for all 5 queued jobs in the next scheduling cycle)?
Have you looked at the max_run parameter? You could set a limit of 2 jobs running per user with
qmgr -c 'set server max_run="[u:PBS_GENERIC=2]"'
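If I remember the syntax right, you can also list per-user overrides in the same attribute, and the same attribute exists per queue (user1 and workq below are just placeholders):
qmgr -c 'set server max_run="[u:PBS_GENERIC=2], [u:user1=4]"'
qmgr -c 'set queue workq max_run="[u:PBS_GENERIC=2]"'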
If that is not what you want, what is the higher-level goal?
Hi dtalcott,
Thanks for the quick response.
However, our requirement is a bit different.
We have configured server_dyn_res: "VCPU_AVAIL !/opt/pbs/sbin/free_vcpus.sh" to limit the number of cloud instances. The scheduler evaluates the VCPU_AVAIL dynamic resource in every scheduling cycle (i.e., every 30 sec), but instead of examining the jobs one by one, it sends all the jobs that pass the VCPU evaluation to the R state at once, which causes the VCPU quota limit to be exceeded. The following test case shows how to reproduce the issue.
Here VCPU = 2 x NCPU (with hyperthreading on).
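For context, the scheduler-side wiring lives in PBS_HOME/sched_priv/sched_config and looks roughly like this (the resources line shown is the stock default plus VCPU_AVAIL; match it to your own file, and note that sched_config changes need a scheduler HUP/restart to take effect):
server_dyn_res: "VCPU_AVAIL !/opt/pbs/sbin/free_vcpus.sh"
resources: "ncpus, mem, arch, host, vnode, aoe, eoe, VCPU_AVAIL"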
Step 1: Edited /opt/pbs/sbin/free_vcpus.sh so that it reports the available free VCPUs with a maximum cap of 74:
$ /opt/pbs/sbin/free_vcpus.sh
74
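For illustration, the script is essentially of this shape (simplified sketch; the real query of in-use VCPUs against the cloud API is stubbed with a hypothetical CLI):
#!/bin/bash
# server_dyn_res script: print the number of currently free VCPUs on one line
CAP=74                                                     # test cap from Step 1
USED=$(my_cloud_cli vcpus-in-use 2>/dev/null || echo 0)    # hypothetical CLI
FREE=$(( CAP - USED ))
(( FREE < 0 )) && FREE=0
echo "$FREE"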
Step 2: Submit a 48-core (i.e., 96 VCPU) job. It should wait in the Q state, as expected.
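The submission for this step was along these lines (the select statement and script name here are illustrative):
$ qsub -q parallel -l select=1:ncpus=48 -l VCPU_AVAIL=96 -l walltime=05:00:00 cfx_run.sh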
$ qstat -aws|grep -A1 parallel
3079.ip-10-77-162-79 testuser1 parallel CFX -- 48 48 -- 05:00 Q --
Can Never Run: Insufficient amount of server resource: VCPU_AVAIL (R: 96 A: 76 T: 76)
Step 3: Then submit 2 consecutive 24-core (i.e., 48 VCPU) jobs. I am expecting 1 to go to the R state and the other to wait, but both go to the R state and initiate instance creation, thereby over-consuming the VCPU limit.
$ qstat -aws|grep -A1 parallel
3079.ip-10-77-162-79 testuser1 parallel CFX -- 48 48 -- 05:00 Q --
Can Never Run: Insufficient amount of server resource: VCPU_AVAIL (R: 96 A: 76 T: 76)
3080.ip-10-77-162-79 testuser1 parallel CFX -- 24 24 -- 15:00 R --
3081.ip-10-77-162-79 testuser1 parallel CFX -- 24 24 -- 15:00 R --
2nd requirement: Although I have set backfill_depth = 2 for the queue "parallel", I am expecting the scheduler to first reserve resources for the top 2 jobs, in order, before sending lower-priority jobs to the R state, which is not happening in this case (i.e., allow lower-priority jobs to run only if they do not affect the execution of the top 2 jobs). Please note the walltime set is 5 hours for the first job and 15 hours for the second and third.
$ qmgr -c "print queue parallel "|grep depth
set queue parallel backfill_depth = 2
Let me know if we can tune some of the parameters to meet this requirement.
+1 to what @dtalcott has suggested; there are other limits which can be applied.
The below might help:
- Try using a static server-level resource instead of server_dyn_res (to avoid the race condition; a runjob hook can also be handy), though you may still see jobs starting at the same time. See 5.14.3.2 "Static Server-level Resources" in https://help.altair.com/2024.1.0/PBS%20Professional/PBS2024.1.pdf. A minimal qmgr sketch follows this list.
- The provisioning hook, if it helps your situation. See 16.1 "Introduction" in https://help.altair.com/2024.1.0/PBS%20Professional/PBS2024.1.pdf.
- You can use job_sort_formula_threshold to ignore the jobs that are below the threshold.
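As mentioned in the first point, a minimal qmgr sketch of the static approach (flag=q makes the custom resource consumable at the server/queue level; if VCPU_AVAIL already exists for the dynamic setup, adjust its flag instead of recreating it):
qmgr -c 'create resource VCPU_AVAIL type=long, flag=q'
qmgr -c 'set server resources_available.VCPU_AVAIL = 74'
The resource still has to be on the resources: line in sched_config, and jobs keep requesting it with -l VCPU_AVAIL=... as before. PBS then tracks resources_assigned itself, so two 48-VCPU jobs cannot both start against 74 available.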
What about the 2nd requirement? backfill_depth = 2 is not honored at all.
backfill_depth sets how many top jobs the scheduler calendars; other jobs can then be backfilled around these calendared jobs, provided they do not affect the calendared jobs' start times. The scheduler does not reserve resources for the backfilled jobs themselves, because the situation can vary: in the next scheduling cycle, depending on the other jobs in the queue and on freshly submitted jobs, different jobs may become the candidates for backfilling. The landscape may change for the backfillable jobs, but the calendared jobs remain intact.
backfill_depth is used to let smaller jobs run in the gaps between large jobs without affecting the start times of the larger jobs; without calendaring, the smaller jobs would keep starting first and push the large jobs further and further into the future, and the large jobs would never get a chance to run.
As per my understanding, for backfill to work properly (i.e., to let smaller jobs run between large jobs without affecting the start times of the larger jobs), the scheduler should be able to estimate the approximate start time of each job, which can be viewed using the "qstat -T" command.
In my case the "Est Start Time" column shows blank, and the reason could be that the scheduler is not aware of when the server_dyn_res resource "VCPU_AVAIL !/opt/pbs/sbin/free_vcpus.sh" will become available.
So it is better to disable backfilling with backfill_depth = 0 and let the jobs run based on priority, without backfilling.
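For reference, that would be:
$ qmgr -c 'set queue parallel backfill_depth = 0'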
Please correct me if I am wrong…
I think the problem with backfill is that it looks at only consumable or node-level resources; that is, things that can be reserved.
As adarsh suggested, you might try using a static server-level consumable resource whose resources_available value is adjusted by a periodic hook to match the number of VCPUs provided by the current cloud resources.
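Something like the following, run periodically (from cron or wrapped in a periodic hook; the capacity query is a stand-in for whatever your cloud API provides), would keep the static value in sync:
#!/bin/bash
# Sync the static consumable resource with the current cloud capacity.
# Set the TOTAL capacity, not the free amount: for a consumable resource,
# PBS itself subtracts what running jobs have requested.
CAP=$(my_cloud_cli total-vcpus 2>/dev/null || echo 74)     # hypothetical CLI
/opt/pbs/bin/qmgr -c "set server resources_available.VCPU_AVAIL = ${CAP}"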