Job_sort_formula: wrong terms

Hi
I need to create a “job_sort_formula” that prioritizes jobs based on the following policy:

Jobs that have been in the queue for a long time should have higher priority (while also keeping the job's own priority in mind). Specifically, I want to calculate the priority as:

  • (now() - qtime) → This term indicates how long the job has been in the queue in seconds.

However, I would like to scale this time to hours:

  • (now() - qtime) / 3600

Then, I want to multiply it by the job’s “priority”:

  • priority * ((now() - qtime) / 3600)
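To make the intent concrete with made-up numbers: under this formula, a job with priority 50 that has been queued for two days would score 50 * (172800 / 3600) = 50 * 48 = 2400, while a job with the same priority submitted one hour ago would score only 50 * 1 = 50.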

I believe the issue might be with the (now() - qtime) term.

Currently, I am using the following formula:

set server job_sort_formula = "((1.0 / (walltime/3600)) + job_priority) * eligible_time * fairshare_perc"

To incorporate something similar to (now() - qtime), I tried adding eligible_time + ineligible_time, but when I do so, I encounter the error:

Formula contains invalid keyword

Could someone clarify which attributes are valid for use in the “job_sort_formula”? Specifically, how can I properly reference the time a job has spent in the queue (now() - qtime or similar) to achieve the desired prioritization?

Thanks in advance for your help!

The valid attributes are ncpus, cput, mem, walltime, custom numeric resources, queue_priority, job_priority, eligible_time, fairshare_perc, fairshare_tree_usage, and fairshare_factor.

ineligible_time is not a valid keyword for job_sort_formula.
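If the goal is to reward time spent waiting, eligible_time (which is in seconds) is the closest valid stand-in for (now() - qtime). A minimal sketch of the kind of formula that should be accepted, assuming you want job priority multiplied by hours of eligible waiting time:

  qmgr -c 'set server job_sort_formula = "job_priority * (eligible_time / 3600.0)"'

Note that eligible_time only accrues when eligible_time_enable is set to True on the server.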

Please refer to this section: 6.15.3.11 Method to Create or Set job_sort_formula Object and Table 6-43: Methods Available in Job Events
Document: https://help.altair.com/2024.1.0/PBS%20Professional/PBS2024.1.pdf

Thanks for the reply.

It seems that job_sort_formula is not the solution to my problem.

Here’s the issue: I have a queue with a single node, which has 128 CPUs. Most users request only a few CPUs, but one user has submitted a job that requests 128 CPUs and has a small walltime. The problem is that this job has been waiting for weeks, while smaller jobs are being executed. The scheduler has fairshare enabled.

Do you have any suggestions on how to address this issue?

You can try this:

Note: requesting an accurate walltime is the key.

  1. Enable strict ordering (disable starving jobs, disable the fairshare policy, and unset the job sort formula); see the sketch after this list.
  2. Set backfill_depth to 5.
  3. To cover jobs that do not request walltime, set a default (in the example below, walltime is set to 10 hours):
    qmgr -c 's s resources_default.walltime=10:00:00'

This would calendar your top job and keep it from being pushed down the timeline by short jobs.
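As a rough sketch of how those three steps translate into configuration (option names and values are examples; adjust for your site), step 1 goes in PBS_HOME/sched_priv/sched_config and steps 2-3 are qmgr commands:

  # sched_config: turn on strict ordering, turn off starving jobs and fairshare
  strict_ordering: true ALL
  help_starving_jobs: false ALL
  fair_share: false ALL

  # make the scheduler re-read sched_config
  kill -HUP $(pgrep -x pbs_sched)

  # server attributes
  qmgr -c 'unset server job_sort_formula'
  qmgr -c 'set server backfill_depth = 5'
  qmgr -c 's s resources_default.walltime=10:00:00'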

Thanks @adarsh, unfortunately I cannot disable fairshare.

I have the following jobs currently running:

905215.pbs01.p* user01   queue  bash01     636251   1   26  100gb  24:00  R  19:36  
906310.pbs01.p* user02   queue  run01      946016   1   32  --    24:00  R  11:11  
906312.pbs01.p* user02   queue  run02      11452*   1   48  --    24:00  R  06:05  

These are the scores assigned by the Formula Evaluation:

01/29/2025 10:30:29;0100;pbs_sched;Job;903643.pbs01.cluster.lan;Formula Evaluation = 110.2  
01/29/2025 10:30:29;0100;pbs_sched;Job;903644.pbs01.cluster.lan;Formula Evaluation = 110.2  
01/29/2025 10:30:29;0100;pbs_sched;Job;903645.pbs01.cluster.lan;Formula Evaluation = 110.2  
01/29/2025 10:30:29;0100;pbs_sched;Job;903646.pbs01.cluster.lan;Formula Evaluation = 111.1266  
01/29/2025 10:30:29;0100;pbs_sched;Job;903647.pbs01.cluster.lan;Formula Evaluation = 111.1266  
01/29/2025 10:30:29;0100;pbs_sched;Job;903648.pbs01.cluster.lan;Formula Evaluation = 111  
01/29/2025 10:30:29;0100;pbs_sched;Job;905215.pbs01.cluster.lan;Formula Evaluation = 60.0417  
01/29/2025 10:30:29;0100;pbs_sched;Job;905216.pbs01.cluster.lan;Formula Evaluation = 60.1683  
01/29/2025 10:30:29;0100;pbs_sched;Job;905815.pbs01.cluster.lan;Formula Evaluation = 60.1683  
01/29/2025 10:30:29;0100;pbs_sched;Job;905817.pbs01.cluster.lan;Formula Evaluation = 60.1683  
01/29/2025 10:30:29;0100;pbs_sched;Job;905869.pbs01.cluster.lan;Formula Evaluation = 60.1683  
01/29/2025 10:30:29;0100;pbs_sched;Job;905870.pbs01.cluster.lan;Formula Evaluation = 60.1683  
01/29/2025 10:30:29;0100;pbs_sched;Job;906310.pbs01.cluster.lan;Formula Evaluation = 60.0539  
01/29/2025 10:30:29;0100;pbs_sched;Job;906312.pbs01.cluster.lan;Formula Evaluation = 60.0931  

Based on these scores, I expected job 903646 to be the next one scheduled for execution.

My assumption was that the scheduler would wait for job 906312 to complete, at which point jobs 905215 and 906310 would have also finished. Then, the scheduler should have selected one of the highest-priority jobs, either 903646 or 903647, to start. I expected one of these jobs to begin at approximately 04:30 on 01/30/2025.

However, this did not happen. Instead, the scheduler started newly submitted jobs in the meantime:

905215.pbs01.p* user01   queue  bash01     636251   1   26  100gb  24:00  R  21:00  
906312.pbs01.p* user02   queue  run02      11452*   1   48  --    24:00  R  07:30  
906467.pbs01.p* user03   queue  test       14287*   1   32  --    24:00  R  00:02  

I attach the job info:

Job Id: 903646.pbs01.cluster.lan
    Job_Name = {red}
    Job_Owner = user03@fe01
    job_state = Q
    queue = queue
    server = pbs01.cluster.lan
    Checkpoint = u
    ctime = Thu Jan 23 17:05:37 2025
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = a
    mtime = Tue Jan 28 18:17:51 2025
    Priority = 50
    qtime = Thu Jan 23 17:05:37 2025
    Rerunable = False
    Resource_List.ncpus = 128
    Resource_List.ngpus = 1
    Resource_List.nodect = 1
    Resource_List.place = free
    Resource_List.preempt_targets = none
    Resource_List.select = 1:ncpus=128
    Resource_List.walltime = 01:00:00
    substate = 10
    comment = Not Running: Insufficient amount of resource: queue_list
    etime = Thu Jan 23 17:05:37 2025
    eligible_time = 19:41:17
    Submit_arguments = {redacted}.sh
    project = _pbs_project_default
    Submit_Host = fe01

Also, consider that the user who submitted jobs 903646 and 903647 has no score in PBS fairshare (pbsfs), since those jobs have been waiting for a long time.

[root@pbs01]# pbsfs -g user03
Fairshare Entity user03 does not exist.

Could you help me understand why the scheduler made this decision?
Could it be that the scheduler’s plan stops working as expected when a job finishes before its declared walltime?
Is there a way to ensure that higher-priority jobs are scheduled as expected?

Thanks in advance for your time.

Would it be possible to share the sched_config file and qstat -Bf output?

Server: pbs01.cluster.lan
    server_state = Active
    server_host = pbs01
    scheduling = True
    total_jobs = 50309
    state_count = Transit:0 Queued:46 Held:30 Waiting:0 Running:95 Exiting:0 Begun:0
    default_queue = gpu
    log_events = 2047
    mailer = /usr/sbin/sendmail
    mail_from = adm
    query_other_jobs = True
    resources_default.ncpus = 1
    default_chunk.ncpus = 1
    resources_assigned.mem = 150gb
    resources_assigned.mpiprocs = 794
    resources_assigned.ncpus = 1868
    resources_assigned.nodect = 111
    scheduler_iteration = 60
    flatuid = True
    resv_enable = True
    node_fail_requeue = 310
    max_array_size = 10000
    node_group_enable = True
    node_group_key = ibswitch
    default_qsub_arguments = -r n
    pbs_license_min = 0
    pbs_license_max = 2147483647
    pbs_license_linger_time = 31536000
    license_count = Avail_Global:1000000 Avail_Local:1000000 Used:0 High_Use:0
    pbs_version = 22.05.11
    job_sort_formula = job_priority + (3 * queue_priority) + (1.0 / (walltime/3600.0)) + (min((eligible_time*1.0)/3600.0,128)/128.0)
    eligible_time_enable = True
    job_history_enable = True
    job_history_duration = 2160:00:00
    max_concurrent_provision = 5
    backfill_depth = 5
    python_restart_min_interval = 00:00:30
    max_job_sequence_id = 9999999

sched_config (from sched_priv):

round_robin: False      all
by_queue: True          prime
by_queue: True          non_prime
strict_ordering: false  ALL
backfill_prime: false   ALL
prime_exempt_anytime_queues:    false
primetime_prefix: p_
nonprimetime_prefix: np_
node_sort_key: "sort_priority HIGH"     ALL
provision_policy: "aggressive_provision"
sort_queues:    true    ALL
resources: "ncpus, mem, arch, host, vnode, aoe, eoe, queue_list, ngpus"
smp_cluster_dist: pack
fair_share: true ALL
unknown_shares: 10
fairshare_usage_res: "pow(cput,.75) + (ngpus * walltime)"
fairshare_entity: euser
fairshare_decay_time: 12:00:00
fairshare_decay_factor: 0.75
preemptive_sched: true  ALL
preempt_queue_prio:     150
preempt_prio: "starving_jobs, express_queue, normal_jobs, queue_softlimits"
preempt_order: "SCR"
preempt_sort: min_time_since_start
dedicated_prefix: ded

queue configuration:

create queue queue_name
set queue queue_name queue_type = Execution
set queue queue_name Priority = 20
set queue queue_name resources_max.ncpus = 128
set queue queue_name resources_max.ngpus = 8
set queue queue_name resources_max.walltime = 24:00:00
set queue queue_name resources_default.ncpus = 1
set queue queue_name resources_default.ngpus = 1
set queue queue_name resources_default.preempt_targets = none
set queue queue_name resources_default.walltime = 24:00:00
set queue queue_name default_chunk.queue_list = queue
set queue queue_name max_run = [u:PBS_GENERIC=2]
set queue queue_name max_run_soft = [u:PBS_GENERIC=1]
set queue queue_name backfill_depth = 3
set queue queue_name enabled = True
set queue queue_name started = True
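For context, a rough back-of-the-envelope check of how 903646's logged score comes out of the job_sort_formula above (queue Priority = 20, job Priority = 50, walltime = 1 hour; the eligible_time of roughly 16.2 hours is inferred from the score itself, since it had grown to 19:41:17 by the time of the qstat dump):

  job_priority                           = 50
  3 * queue_priority                     = 3 * 20 = 60
  1.0 / (walltime / 3600.0)              = 1.0 / 1.0 = 1
  min(eligible_time / 3600.0, 128) / 128 ≈ 16.2 / 128 ≈ 0.1266

  total ≈ 50 + 60 + 1 + 0.1266 ≈ 111.13

which matches the Formula Evaluation = 111.1266 in the scheduler log.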

Thank you @matzmz. Please set strict_ordering: true ALL and kill -HUP the scheduler. This should take care of the calendaring of the top job. You already have a walltime defined on the queue.
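A minimal sketch of just that change (assuming the default PBS_HOME of /var/spool/pbs; adjust the path for your installation):

  # edit /var/spool/pbs/sched_priv/sched_config
  strict_ordering: true ALL

  # then signal the scheduler so it re-reads sched_config
  kill -HUP $(pgrep -x pbs_sched)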

@adarsh First of all, thank you for the analysis!

I still have one doubt, though.

If I enable strict_ordering: true ALL, will fairshare still function correctly?
I enabled it to try and democratize cluster usage.

Thanks!

This would not affect fairshare. Your fairshare is accounted for in the job sort formula.
Refer to this document: https://help.altair.com/2024.1.0/PBS%20Professional/PBS2024.1.pdf
section: 4.9.48.2 How Strict Ordering Works


@adarsh Thank you so much for your help! After a long time, I finally achieved the behavior I was looking for. I really appreciate it! :blush:


Nice one! Thank you.
