Hi Guys,
Assume fairshare_entity is Account_Name (the account string) and fairshare_usage_res is cput.
When multiple users submit to the same fairshare group, what is the execution order within that group?
Is it possible to have the scheduler split the CPU time equally between all users within the same fairshare group, when there’s contention of course?
Fairshare usage is computed from historical usage (e.g. cput). The user with the least fairshare usage is run first.
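As a rough illustration of "least usage runs first", here is a minimal Python sketch. It assumes every entity has equal shares and ignores the fairshare tree, usage decay, and everything else the real scheduler does; the user names and numbers are made up.

```python
# Simplified illustration only: names and usage figures are hypothetical.
usage = {"userA": 1200.0, "userB": 300.0, "userC": 750.0}  # accumulated cput, seconds
shares = {"userA": 10, "userB": 10, "userC": 10}           # equal shares for everyone

def run_order(usage, shares):
    """Return entities sorted so the one with the least usage per share comes first."""
    return sorted(usage, key=lambda entity: usage[entity] / shares[entity])

order = run_order(usage, shares)
print(order)  # ['userB', 'userC', 'userA']
```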
Please check this attribute in the $PBS_HOME/sched_priv/sched_config
#
# unknown_shares
# The number of shares for the "unknown" group
#
# NO PRIME OPTION
#
# NOTE: To turn on fairshare and give everyone equal shares,
# Uncomment this line (and turn on fair_share above)
#unknown_shares: 10
You can set queue limits and server limits to cap the maximum number of jobs a user can run at a time. Please see section 5.15.1.6, “Ways To Limit Resource Usage at Server and Queues”, in this guide: https://www.altair.com/pdfs/pbsworks/PBSAdminGuide2020.1.pdf
Thanks adarsh, but using server or queue limits does not solve the problem, right?
Even if you limit userA to a maximum of 100 running jobs, when userB submits their jobs, userB still has to wait for all 100 of userA’s jobs to complete. A limit that is too large has no effect, and one that is too small cripples our queue utilization.
Are there any secondary execution order criteria that can help us? e.g. would setting job_sort_key to cput help?
Or maybe a solution that involves hooks?
What we do today is create placeholder sub-groups for each fairshare group, e.g.
software-user1
software-user2
software-user3
software-user4
software-user5
Then, when we submit jobs, we choose the account string value based on a hash function over the username, to “evenly” distribute the jobs between the sub-groups. There has to be a more holistic solution to this problem. Has anyone in this forum faced the same issue?
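For reference, a minimal sketch of that kind of username-to-sub-group mapping. The choice of SHA-256, the function name account_for, and the default group name are assumptions for illustration, not the exact code we run.

```python
import hashlib

SUBGROUPS = 5  # number of placeholder sub-groups: software-user1 .. software-user5

def account_for(username: str, group: str = "software", buckets: int = SUBGROUPS) -> str:
    """Map a username to one of the placeholder sub-groups, deterministically."""
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then fold into 1..buckets.
    bucket = int.from_bytes(digest[:8], "big") % buckets + 1
    return f"{group}-user{bucket}"

print(account_for("alice"))
```

The result is then passed as the account string at submission time (qsub -A). Two heavy users can still hash to the same sub-group, which is why the distribution is only “even” in quotes.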
What you really want is Account_Name:euser as the entity, but that unfortunately isn’t possible. What is possible is egroup:euser. This allows you to create fairshare groups to contain your entities, and the entities have a finer granularity than just the account string, but it is based on the primary Linux group.
I do think job_sort_key gets involved if fairshare is equal. This means you can set a job_sort_key to differentiate between users of the same account string.
Lastly you can put fairshare in the job_sort_formula and use other parts of the formula to sort when fairshare is equal.
With the egroup:euser approach, I’m guessing that I can simply create the Linux groups, make everyone a member, and then explicitly specify the relevant group when the job is launched?
Can you please provide an example of a job_sort_key that gives priority to users with lower cput within the fairshare group? Also, can you point to where the documentation confirms your assumption that job_sort_key still applies when fairshare is used?
Can you please provide an example of such a formula?
Please check section 4.9.21.7, “Using Fairshare in the Formula”, in this guide: https://www.altair.com/pdfs/pbsworks/PBSAdminGuide2020.1.pdf
From this guide:
4.9.19.6.iii Computing Relative Usage (fairshare_factor)
An entity’s relative usage allows direct comparison between entities. Relative usage is fairshare_factor, and is a value between 0 and 1. A value of 0.5 means that an entity is using exactly its target usage. A higher value indicates less resource usage by the entity, meaning that the entity is more deserving. Calculated this way:
2^-(fairshare_tree_usage / entity’s fairshare_perc)
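To make the formula above concrete, here is a small Python sketch that evaluates it for a few inputs. The function name and the sample numbers are made up for illustration; only the expression itself comes from the guide.

```python
def fairshare_factor(fairshare_tree_usage: float, fairshare_perc: float) -> float:
    """Relative usage: 2 ** -(usage / target percentage), a value between 0 and 1."""
    return 2.0 ** -(fairshare_tree_usage / fairshare_perc)

# Hypothetical entity entitled to 25% of the machine.
on_target = fairshare_factor(0.25, 0.25)    # usage equals target -> 0.5
under_used = fairshare_factor(0.125, 0.25)  # half of target -> above 0.5, more deserving
over_used = fairshare_factor(0.5, 0.25)     # double the target -> 0.25, less deserving

print(on_target, under_used, over_used)
```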
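Example of a job sort formula (fairshare_factor is added rather than subtracted, because a higher value means a more deserving entity): qmgr: set server job_sort_formula = adminboost * queue_priority+(eligible_time/3600)+(100*fairshare_factor)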
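To see how the fairshare term moves a job in the ordering, the formula can be evaluated by hand. This is only a sketch under assumptions: adminboost is treated as a custom numeric job resource, jobs with a higher result are taken to sort first, and the weight on fairshare_factor is passed in as a parameter so both signs can be compared. The job values are invented.

```python
def sort_score(adminboost, queue_priority, eligible_time, fairshare_factor, fairshare_weight):
    """Evaluate adminboost * queue_priority + eligible_time/3600 + weight * fairshare_factor."""
    return (adminboost * queue_priority
            + (eligible_time / 3600)
            + fairshare_weight * fairshare_factor)

# Two hypothetical jobs that differ only in their owner's fairshare_factor.
light_user_job = dict(adminboost=1, queue_priority=50, eligible_time=3600, fairshare_factor=0.75)
heavy_user_job = dict(adminboost=1, queue_priority=50, eligible_time=3600, fairshare_factor=0.25)

for weight in (100, -100):
    light = sort_score(**light_user_job, fairshare_weight=weight)
    heavy = sort_score(**heavy_user_job, fairshare_weight=weight)
    first = "light user" if light > heavy else "heavy user"
    print(f"weight={weight:+d}: light={light}, heavy={heavy}, sorts first: {first}")
```

With a positive weight the job of the user with less usage gets the higher score; with a negative weight the order flips.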