Fairshare by queue

Dear All,

PBS is configured with by_queue and round_robin disabled, so users' jobs are queued FIFO. We want a fairshare policy on the queue so that resources are shared, but the fairshare configuration is not working for us. Any suggestions?
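For reference, that corresponds to lines like the following in sched_config (a sketch; the exact prime-time suffixes in our file may differ):

round_robin: False all
by_queue: False all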

Steps:
We enabled the following in the configuration file /opt/pbs/etc/pbs_sched_config:

fairshare: true all
unknown_shares: 10
fairshare_usage_res: ncpus*walltime
fairshare_entity: euser
fairshare_decay_time: 06:00:00
fairshare_decay_factor: 0.7

We restarted the PBS services (pbs stop and start) after changing pbs_sched_config.
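Roughly, the commands we used were (the init script path may differ on other installs):

/etc/init.d/pbs stop
/etc/init.d/pbs start

Alternatively, sending a HUP to the scheduler should also make it reread its config:

kill -HUP $(pgrep pbs_sched)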

However, we don't see any effect of fairshare.

more details:
[root@hpc etc]# pbsfs
Creating usage database for fairshare.
Fairshare usage units are in: cput
TREEROOT : Grp: -1 cgrp: 0 Shares: -1 Usage: 1 Perc: 100.000%
unknown : Grp: 0 cgrp: 1 Shares: 0 Usage: 1 Perc: 0.000%
[root@hpc etc]# rpm -qa | grep pbs
pbspro-server-19.1.1-0.x86_64
[root@hpc etc]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)

It looks like you modified the wrong file. You need to change the sched_config file in PBS_HOME. The file in PBS_EXEC is provided as a reference copy of the default sched_config file. Edit /var/spool/pbs/sched_priv/sched_config and make the same changes you made to the file in PBS_EXEC.
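In other words, assuming the default PBS_HOME of /var/spool/pbs, something like:

cp /var/spool/pbs/sched_priv/sched_config /var/spool/pbs/sched_priv/sched_config.bkp
vi /var/spool/pbs/sched_priv/sched_config
kill -HUP $(pgrep pbs_sched)    # make the scheduler reread the file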

Bhroam

Now we did change the correct file, "sched_config", and the changes below seem to be effective. However, how do we test whether the fairshare policy is working fine?

Fairshare usage units are in: ncpus*walltime
TREEROOT : Grp: -1 cgrp: 0 Shares: -1 Usage: 1 Perc: 100.000%
unknown : Grp: 0 cgrp: 1 Shares: 10 Usage: 1 Perc: 100.000%

Thanks,
Anil

Please submit X jobs as User1, let them run and finish.
Set scheduling to false: qmgr -c "set server scheduling = false"
Submit another X jobs as User1.
Submit another Y jobs as User2.
Set scheduling back to true: qmgr -c "set server scheduling = true"
Check the output of the pbsfs command and qstat -answ1; a sketch of this sequence is below.
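As a concrete sketch (the job script, user names and counts are placeholders):

# after User1's first batch of jobs has run and finished:
qmgr -c "set server scheduling = false"
su - user1 -c "for i in 1 2 3 4 5; do qsub job.sh; done"
su - user2 -c "for i in 1 2 3; do qsub job.sh; done"
qmgr -c "set server scheduling = true"
pbsfs
qstat -answ1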

You have two choices now. You can populate your resource_group file with all your users (and possibly subdivide them into fairshare groups), or you can turn fairshare_enforce_no_shares to false. By default, the scheduler will not run any jobs whose entity is not in the resource_group file.
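The second option is a single line in PBS_HOME/sched_priv/sched_config (sketch):

fairshare_enforce_no_shares: false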

Bhroam

As per option 1, we created the resource_group file.
Pasting a few lines from the command outputs.
What we found is that users' jobs in the queue are not sorted by cput utilization; it is still FIFO only. How do we fix it?

resource_group:
ambreesh.khurana 1 root 10
anil 1 root 10

pbsfs
anil : Grp: 0 cgrp: 1 Shares: 10 Usage: 1 Perc: 2.128%
ambreesh.khurana: Grp: 0 cgrp: 1 Shares: 10 Usage: 1 Perc: 2.128%
unknown : Grp: 0 cgrp: 1 Shares: 10 Usage: 1 Perc: 2.128%

qstat:

Job id Name User Time Use S Queue


39826 sigbin ambreesh.khurana 709:57:1 R small
39827 sigbin ambreesh.khurana 678:39:4 R small
39828 sigbin ambreesh.khurana 622:23:1 R small
39829 sigbin ambreesh.khurana 678:43:5 R small
39830 sigbin ambreesh.khurana 571:26:3 R small
39831 sigbin ambreesh.khurana 603:58:3 R long
39832 sigbin ambreesh.khurana 579:41:1 R long
39833 sigbin ambreesh.khurana 563:55:1 R long

We configured a test queue and tried to test fairshare:

pbsfs updated the usage file and now we are able to see the usage, but we are still not sure why the queue preference is the same and it still works as FIFO.

anilkumar : Grp: 0 cgrp: 63 Shares: 10 Usage: 1071 Perc: 2.564%
anil : Grp: 0 cgrp: 62 Shares: 10 Usage: 1 Perc: 2.564%
ambreesh.khurana: Grp: 0 cgrp: 61 Shares: 10 Usage: 435041 Perc: 2.564%

Queue: test
queue_type = Execution
Priority = 100
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
max_queued_res.ncpus = [u:PBS_GENERIC=64]
resources_max.nodect = 2
resources_max.walltime = 01:00:00
resources_default.walltime = 01:00:00
resources_assigned.ncpus = 0
resources_assigned.nodect = 0
max_run = [o:PBS_ALL=10]
max_run_res.ncpus = [u:PBS_GENERIC=64]
enabled = True
started = True

We performed the testing as per the suggested method. We found that the queue is still working in FIFO order only.

User1 submitted many jobs to the queue; a few of User1's jobs were queued and the rest were running. After some time User2 submitted a few jobs, and all of User2's jobs stayed queued.

As User1's jobs completed, the next jobs to run were only further User1 jobs, one after another. Only after all of User1's jobs were finished did User2's jobs start running.

The expected order was: let the running User1 jobs complete, but the next jobs to run should have come from User2, since his jobs were waiting.

Any suggestions?

Do you have a job sort formula? That overrides fairshare.
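A quick way to check (nothing printed means no formula is set):

qmgr -c "list server" | grep -i job_sort_formula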

Bhroam

We don't have a job sort formula.

Now we can see each user's usage value getting updated at scheduler intervals. Is job sorting now based on usage rather than on percentage?

For example:

Who will get the first chance to run their job?

User1: usage:10000 perc:50%

User2: usage:1 perc:50%

What we observed is that User1's jobs get started first if he submits first. Ideally it should be User2 who gets the chance?

When fairshare is enabled, PBS Pro considers each entity's usage of the system; the next entity to get a job run is the one that has used the cluster resources least in comparison to the others.
user1 = usage is 75
user2 = usage is 40
user3 = usage is 30

user3 will get a chance first, then user2, then user1.
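Applied to the earlier example (both users at 50%, so this roughly reduces to comparing the usage/perc value that pbsfs -g reports):

User1: 10000 / 0.50 = 20000
User2: 1 / 0.50 = 2

so User2's jobs should be considered first, regardless of who submitted first.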

Thanks a lot for the clarification.

Dear All Here,

Thank you for all your inputs regarding Fairshare,

I'm also trying to configure fairshare, but usage does not seem to be accounted on a fairshare basis.

Below setting in config files,

#sched_config

fairshare: true all
unknown_shares: 10
fairshare_usage_res: ncpus*walltime
fairshare_entity: euser
fairshare_decay_time: 06:00:00
fairshare_decay_factor: 0.7

#resource_group

rajiv 10 root 10
clusterhri 20 root 20

#pbsfs output

Fairshare usage units are in: ncpus*walltime
TREEROOT : Grp: -1 cgrp: 0 Shares: -1 Usage: 1 Perc: 100.000%
clusterhri: Grp: 0 cgrp: 20 Shares: 20 Usage: 0 Perc: 50.000%
rajiv : Grp: 0 cgrp: 10 Shares: 10 Usage: 0 Perc: 25.000%
unknown : Grp: 0 cgrp: 1 Shares: 10 Usage: 1 Perc: 25.000%

pbsfs -g rajiv

fairshare entity: rajiv
Resgroup : 0
cresgroup : 10
Shares : 10
Percentage : 25.000000%
fairshare_tree_usage : 0.000000
usage : 0 (ncpus*walltime)
usage/perc : 0
Path from root:
TREEROOT : 0 1 / 1.000 = 1
rajiv : 10 0 / 0.250 = 0

Here, there is no change in fairshare_tree_usage : 0.000000 or usage : 0 (ncpus*walltime) after jobs complete; fairshare usage accumulation is not happening.

Could you please guide me on what's wrong here?

Thanking you,
Amol Thute

First question: did you HUP or restart the scheduler after setting up fairshare? The config file is not reread until you do.

The other thing is you shouldn't be seeing usages of 0 unless you specifically set them that way. The scheduler sets the usage to 1 by default. Did you run pbsfs -s on the entities to set them to 0?

Bhroam

Yes, I restarted the scheduler after setting up fairshare.

Can you confirm where to specify the usage as 1 or 0? I have not specifically set it.
Did you do a pbsfs -s on the entities to 0? - No

Hi Bhroam,

Any clue here on how I can fix this issue?
Let me know if any further output from my configuration is required.

Awaiting your inputs…

Thank you,
Amol Thute

Hey @amolthute
Sorry for the delayed response. I was on vacation.

For the usage being 0, I'm just surprised. The default value of any entity is 1. You can see this with the unknown group: its usage is 1. The usage of the two users you have is 0. Usually the only way that can happen is if someone specifically sets it with pbsfs. Can you check whether /var/spool/pbs/sched_priv/usage.bak is there? That would mean someone has run pbsfs -s before. Maybe another admin ran it?

Your setup does look like it is properly set up. When you start the scheduler, does the log report any errors when parsing the config file?

Just a random question (this has come up before): are you modifying /var/spool/pbs/sched_priv/sched_config and not /opt/pbs/etc/pbs_sched_config? The latter is just a reference copy to the default sched_config.

Are you in a multi-sched environment? If so, the value of sched_priv might not be properly set, so the sched_config file you are modifying is not being used.
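If you are on a version with multi-sched support (as 19.x is), a quick check is to list the schedulers and their sched_priv paths:

qmgr -c "list sched"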

Bhroam

Appreciate your support, Bhroam… Thanks!

Is /var/spool/pbs/sched_priv/usage.bak there? - Yes, it's there.

[root@c7pc00 sched_priv]# pwd
/var/spool/pbs/sched_priv
[root@c7pc00 sched_priv]#
[root@c7pc00 sched_priv]# ll
total 56
-rw-r--r-- 1 root root 1665 Jan 11 20:20 dedicated_time
-rw-r--r-- 1 root root 2537 Jan 11 20:20 holidays
-rw-r--r-- 1 root root 1829 Feb 11 11:13 resource_group
-rw-r--r-- 1 root root 158 Feb 11 16:24 sched_config
-rw-r--r-- 1 root root 16521 Feb 11 15:46 sched_config.bkp
-rw-r--r-- 1 root root 46 Feb 13 13:00 sched_formula
-rw-r--r-- 1 root root 6 Feb 13 13:00 sched.lock
-rw-r--r-- 1 root root 441 Jan 30 16:50 sched_out
-rw-r--r-- 1 root root 160 Feb 21 06:30 usage
-rw-r--r-- 1 root root 160 Feb 11 11:33 usage.bak
[root@c7pc00 sched_priv]#

I just saw this in the man page of pbsfs, so shall I set usage_value to 1 here and check?
#pbsfs -s entity usage_value
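For example, for the entities in this thread (the values here are only for illustration):

pbsfs -s rajiv 1
pbsfs -s clusterhri 1
pbsfs -g rajiv    # verify the new usage value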

Are you modifying /var/spool/pbs/sched_priv/sched_config? - Yes, I modified that file, not /opt/pbs/etc/pbs_sched_config.

Are you in a multi-sched environment? - No

Thanks,
Amol Thute

Now we're grasping at straws here. Everything looks like it is set up properly.

The only two things which can perturb the fairshare order are preemption and starving jobs. Are you using preemption? Are any of your out-of-order jobs starving? Do you have a job_sort_formula?
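To rule out preemption and starving jobs, you could check the relevant sched_config settings (path assumes the default PBS_HOME):

grep -E "^(preemptive_sched|help_starving_jobs)" /var/spool/pbs/sched_priv/sched_config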

Let's check whether it is in fact set up properly and you are simply expecting different behavior. If that is the case, we can modify the configuration to fit your expectations.

Do the following:

  1. turn scheduling off: qmgr -c 's s scheduling=f'
  2. submit your jobs
  3. turn scheduling back on: qmgr -c 's s scheduling=t'
    Look at the log for the cycle which was just run. The fairshare order is the order of the lines containing 'Considering job to run'; a grep sketch follows just below. Is that order what you expect?
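One way to pull that order out of the scheduler log (a sketch; the path and file name assume the default PBS_HOME and today's log):

grep "Considering job to run" /var/spool/pbs/sched_logs/$(date +%Y%m%d)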

The order of the scheduler's sort is the following:

  1. Is a job in a reservation?
  2. The preemption priority of a job (i.e. express queue jobs vs normal jobs)
  3. Has a job been preempted?
  4. Is a job starving?
  5. Is there a job_sort_formula?
  6. Fairshare

If a job falls under 1-5, it will be out of fairshare order.

Bhroam
