Execution order when multiple users submit jobs to the same fairshare group

Hi Guys,
Assume fairshare_entity is the account string and fairshare_usage_res is cput.
When multiple users submit jobs to the same fairshare group, what is the execution order within that group?

Is it possible to have the scheduler split the CPUT equally between all users within the same fairshare group, when there’s contention of course?

Thanks,
Roy

Fairshare usage is computed from historical usage (e.g. cput). Jobs belonging to the fairshare entity with the least usage run first.
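To make "least usage runs first" concrete, here is a minimal Python sketch. It is a simplification, not the scheduler's actual code: it assumes a flat tree and compares plain usage-to-shares ratios, while the real scheduler walks the fairshare tree and decays usage over time. The group names and usage numbers are hypothetical.

```python
def next_entity(usage, shares):
    """Pick the entity with the lowest usage-to-shares ratio (simplified model)."""
    return min(usage, key=lambda name: usage[name] / shares[name])

# Hypothetical accumulated cput per fairshare entity, and its configured shares.
usage = {"group_a": 300.0, "group_b": 700.0}
shares = {"group_a": 20, "group_b": 70}

chosen = next_entity(usage, shares)
print(chosen)  # group_b: 700/70 = 10 is lower than 300/20 = 15
```

Note that the comparison is per entity. Two users who submit under the same entity share one usage figure, so this step alone does not order them against each other.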

Please check this attribute in $PBS_HOME/sched_priv/sched_config:

#
# unknown_shares
#       The number of shares for the "unknown" group
#
#       NO PRIME OPTION
#
#       NOTE: To turn on fairshare and give everyone equal shares,
#             Uncomment this line (and turn on fair_share above)

#unknown_shares: 10

Please let us know.

Thanks adarsh, but I don’t think I was clear enough.

Fairshare entity is account_string and not euser.

Assume the following fairshare groups under root:

software 20
hardware 70

And the unknown_shares config is set to 10 shares.
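As a quick check of what these numbers mean, the following Python sketch converts the shares into fractions of the total at this level of the tree. It assumes the "unknown" group sits next to software and hardware directly under root; the function name is only illustrative.

```python
# Shares from the example above: two groups under root plus the "unknown" group.
shares = {"software": 20, "hardware": 70, "unknown": 10}

def share_percentages(shares):
    """Return each group's fraction of the total shares at one tree level."""
    total = sum(shares.values())
    return {group: count / total for group, count in shares.items()}

percentages = share_percentages(shares)
for group, fraction in percentages.items():
    print(f"{group}: {fraction:.0%}")
```

With these values, software is entitled to 20% of the usage, hardware to 70%, and unknown to 10%.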

Let’s assume that the queue is full of running jobs, with CPUT fairly distributed between hardware and software, and that there are no queued jobs at time t=0.

If user_a submits 1,000 long-running jobs at t=0 for the software fairshare group

And 30 minutes later user_b submits 5 jobs to the same software fairshare group.

Will user_b need to wait until all of user_a’s jobs complete before user_b’s jobs start running?

That’s correct.

You can set queue and server limits to cap the number of jobs a user can run at a time. Please check section 5.15.1.6, "Ways To Limit Resource Usage at Server and Queues", in this guide: https://www.altair.com/pdfs/pbsworks/PBSAdminGuide2020.1.pdf

Thanks adarsh, but using server or queue limits does not solve the problem, right?
Even if you limit userA to a maximum of 100 running jobs, when userB submits their jobs, userB still has to wait for userA’s 100 running jobs to complete. Choosing a limit that is too large has no effect, and choosing one that is too small cripples our queue utilization.

Are there any secondary execution order criteria that can help us? For example, would setting job_sort_key to cput help?
Or maybe a solution that involves hooks?

What we do today is create placeholder sub-groups for each fairshare group, e.g.
software-user1
software-user2
software-user3
software-user4
software-user5

Then, when we submit jobs, we choose the account string value by applying a hash function to the username, so that jobs are distributed "evenly" between the different sub-groups. There has to be a more holistic solution to this problem. Has anyone in this forum faced the same issue?
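For anyone wanting to try the same workaround, here is a minimal Python sketch of the hashing step. The function name, the bucket count of 5, and the choice of SHA-256 are assumptions for illustration, not our exact implementation. A stable hash from hashlib is used because Python's built-in hash() is randomized per process for strings, which would send the same user to a different sub-group on every run.

```python
import hashlib

def sub_group_for(user, group, buckets=5):
    """Map a username to one of `buckets` placeholder sub-groups of `group`."""
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and fold it into the bucket range.
    index = int.from_bytes(digest[:8], "big") % buckets
    return f"{group}-user{index + 1}"

account = sub_group_for("user_a", "software")
print(account)  # one of software-user1 .. software-user5
```

The returned value is what gets passed as the job's account string at submission time, for example with qsub -A. Two users can still collide in the same sub-group, so the split is only approximately even.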

Hey,
There are a couple of options you can use.

What you really want is Account_Name:euser as the entity, but unfortunately that isn’t possible. What is possible is egroup:euser. This lets you create fairshare groups to contain your entities, and the entities have a finer granularity than just the account string, but it is based on the primary Linux group.

I do think job_sort_key gets involved if fairshare is equal. This means you can set a job_sort_key to differentiate between users of the same account string.

Lastly you can put fairshare in the job_sort_formula and use other parts of the formula to sort when fairshare is equal.

Bhroam


Thanks Bhroam, a few follow-up questions:

  1. With the egroup:euser approach, I’m guessing that I can simply create the Linux groups, make everyone a member, and then explicitly specify the relevant group when the job is launched?

  2. Can you please provide an example of a job_sort_key that gives priority to users with lower CPUT within the fairshare group? Also, can you please point to where the documentation confirms your assumption that job_sort_key still applies when fairshare is in use?

  3. Can you please provide an example of such a formula?

Thanks,
Roy

Please check section 4.9.21.7, "Using Fairshare in the Formula", in this guide: https://www.altair.com/pdfs/pbsworks/PBSAdminGuide2020.1.pdf

From the same guide, section 4.9.19.6.iii, "Computing Relative Usage (fairshare_factor)":
An entity’s relative usage allows direct comparison between entities. Relative usage is fairshare_factor, and is a value between 0 and 1. A value of 0.5 means that an entity is using exactly its target usage. A higher value indicates less resource usage by the entity, meaning that the entity is more deserving. Calculated this way:
2^-(fairshare_tree_usage / entity’s fairshare_perc)
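A small Python sketch of this calculation may help. It only evaluates the expression quoted above; the input numbers are made up for illustration and both arguments are assumed to be fractions between 0 and 1.

```python
def fairshare_factor(tree_usage, fairshare_perc):
    """Evaluate 2^-(fairshare_tree_usage / fairshare_perc) as quoted from the guide."""
    return 2 ** -(tree_usage / fairshare_perc)

# Hypothetical entity entitled to 20% of the usage.
on_target = fairshare_factor(0.2, 0.2)  # usage equals target
light_user = fairshare_factor(0.1, 0.2)  # used half of its target
heavy_user = fairshare_factor(0.4, 0.2)  # used twice its target

print(on_target, light_user, heavy_user)
```

An entity exactly on target gets 0.5, the light user gets a value above 0.5 (about 0.71), and the heavy user gets a value below 0.5 (0.25), which matches the description that a higher value means a more deserving entity.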

Example of a job sort formula:
Qmgr: set server job_sort_formula = adminboost*queue_priority + (eligible_time/3600) + (100*fairshare_factor)