Thanks @adarsh, but unfortunately I cannot disable fairshare.
I have the following jobs currently running:
Job ID           Username  Queue  Jobname  SessID  NDS  TSK  Req'd Mem  Req'd Time  S  Elap Time
905215.pbs01.p*  user01    queue  bash01   636251  1    26   100gb      24:00       R  19:36
906310.pbs01.p*  user02    queue  run01    946016  1    32   --         24:00       R  11:11
906312.pbs01.p*  user02    queue  run02    11452*  1    48   --         24:00       R  06:05
These are the scores assigned by the Formula Evaluation:
01/29/2025 10:30:29;0100;pbs_sched;Job;903643.pbs01.cluster.lan;Formula Evaluation = 110.2
01/29/2025 10:30:29;0100;pbs_sched;Job;903644.pbs01.cluster.lan;Formula Evaluation = 110.2
01/29/2025 10:30:29;0100;pbs_sched;Job;903645.pbs01.cluster.lan;Formula Evaluation = 110.2
01/29/2025 10:30:29;0100;pbs_sched;Job;903646.pbs01.cluster.lan;Formula Evaluation = 111.1266
01/29/2025 10:30:29;0100;pbs_sched;Job;903647.pbs01.cluster.lan;Formula Evaluation = 111.1266
01/29/2025 10:30:29;0100;pbs_sched;Job;903648.pbs01.cluster.lan;Formula Evaluation = 111
01/29/2025 10:30:29;0100;pbs_sched;Job;905215.pbs01.cluster.lan;Formula Evaluation = 60.0417
01/29/2025 10:30:29;0100;pbs_sched;Job;905216.pbs01.cluster.lan;Formula Evaluation = 60.1683
01/29/2025 10:30:29;0100;pbs_sched;Job;905815.pbs01.cluster.lan;Formula Evaluation = 60.1683
01/29/2025 10:30:29;0100;pbs_sched;Job;905817.pbs01.cluster.lan;Formula Evaluation = 60.1683
01/29/2025 10:30:29;0100;pbs_sched;Job;905869.pbs01.cluster.lan;Formula Evaluation = 60.1683
01/29/2025 10:30:29;0100;pbs_sched;Job;905870.pbs01.cluster.lan;Formula Evaluation = 60.1683
01/29/2025 10:30:29;0100;pbs_sched;Job;906310.pbs01.cluster.lan;Formula Evaluation = 60.0539
01/29/2025 10:30:29;0100;pbs_sched;Job;906312.pbs01.cluster.lan;Formula Evaluation = 60.0931
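For reference, these scores come from the server's job_sort_formula. I have omitted our exact formula here, but it can be printed with:
qmgr -c "print server" | grep job_sort_formula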
Based on these scores, I expected job 903646 to be the next one scheduled for execution.
My assumption was that the scheduler would wait for job 906312 to complete, by which point jobs 905215 and 906310 would also have finished, and would then select one of the highest-scoring jobs, 903646 or 903647, to start. I expected one of them to begin at approximately 04:30 on 01/30/2025.
However, this did not happen. Instead, the scheduler started newly submitted jobs in the meantime:
Job ID           Username  Queue  Jobname  SessID  NDS  TSK  Req'd Mem  Req'd Time  S  Elap Time
905215.pbs01.p*  user01    queue  bash01   636251  1    26   100gb      24:00       R  21:00
906312.pbs01.p*  user02    queue  run02    11452*  1    48   --         24:00       R  07:30
906467.pbs01.p*  user03    queue  test     14287*  1    32   --         24:00       R  00:02
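If it helps, these are the scheduling settings I would check on my side for an explanation (the path assumes the default PBS_HOME of /var/spool/pbs; I can post the actual values if needed):
# ordering-related options in the scheduler config
grep -E 'strict_ordering|by_queue|round_robin|help_starving_jobs' /var/spool/pbs/sched_priv/sched_config
# backfill depth is a server attribute
qmgr -c "print server" | grep backfill_depth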
Here is the job info for 903646:
Job Id: 903646.pbs01.cluster.lan
Job_Name = {redacted}
Job_Owner = user03@fe01
job_state = Q
queue = queue
server = pbs01.cluster.lan
Checkpoint = u
ctime = Thu Jan 23 17:05:37 2025
Hold_Types = n
Join_Path = oe
Keep_Files = n
Mail_Points = a
mtime = Tue Jan 28 18:17:51 2025
Priority = 50
qtime = Thu Jan 23 17:05:37 2025
Rerunable = False
Resource_List.ncpus = 128
Resource_List.ngpus = 1
Resource_List.nodect = 1
Resource_List.place = free
Resource_List.preempt_targets = none
Resource_List.select = 1:ncpus=128
Resource_List.walltime = 01:00:00
substate = 10
comment = Not Running: Insufficient amount of resource: queue_list
etime = Thu Jan 23 17:05:37 2025
eligible_time = 19:41:17
Submit_arguments = {redacted}.sh
project = _pbs_project_default
Submit_Host = fe01
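If it would help, I can also post the scheduler's trace for this job (run on the server host):
tracejob 903646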
Also, note that the user who submitted jobs 903646 and 903647 has no entry in PBS fairshare (pbsfs), since their jobs have been waiting for a long time (and thus accumulating no usage):
[root@pbs01]# pbsfs -g user03
Fairshare Entity user03 does not exist.
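If useful, I can also share the full fairshare tree and the fairshare-related lines from sched_config (again assuming the default PBS_HOME of /var/spool/pbs):
pbsfs
grep -Ei 'fair_share|fairshare|unknown_shares' /var/spool/pbs/sched_priv/sched_config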
Could you help me understand why the scheduler made this decision?
Could it be that the scheduler's plan does not work as expected when a job finishes before its declared walltime?
Is there a way to ensure that higher-priority jobs are scheduled as expected?
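For example, would enabling strict ordering and limiting backfill along these lines be the right direction? This is only a sketch; I have not changed anything yet.
# in $PBS_HOME/sched_priv/sched_config, then send SIGHUP to pbs_sched so it re-reads the file
strict_ordering: True ALL
# optionally control how many of the top jobs the scheduler will backfill around
qmgr -c "set server backfill_depth = 1"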
Thanks in advance for your time.