Is it possible to configure fairshare with pre-emption, so that if a user is running a large number of jobs on the system those user jobs would be pre-empted (newest running job first) and put back onto the queue?
Thanks in advance for your advice.
I believe what you want to look at are soft limits. Those let you say when a user has too many jobs running, or is using too much of a given resource if you want finer granularity. To use them, you need to add the soft-limit level to preempt_prio and set the max_run_soft attribute on the queue or server. Look in the guides for soft limits.
By default, the most recently started jobs are preempted first.
As a note, fairshare and preemption are two different functions of the scheduler. They can be turned on independently of each other. Fairshare affects the priority order in which jobs are considered. Preemption says: if a job can’t run now, let’s see whether we can preempt lower-priority jobs so that it can.
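A quick way to see which of the two features is switched on is to look at the scheduler configuration file. This is only a sketch: it assumes a default install where /etc/pbs.conf defines PBS_HOME and the scheduler reads sched_priv/sched_config, which may differ on your site.

```shell
# Assumption: /etc/pbs.conf exists and sets PBS_HOME (typical default install).
. /etc/pbs.conf

# Show the fairshare and preemption switches; each is set independently.
grep -E '^[[:space:]]*(fair_share|preemptive_sched):' "$PBS_HOME/sched_priv/sched_config"
```

Each matching line ends in true or false (followed by a prime-time qualifier such as ALL), so one can be enabled without the other.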
Bhroam
Thanks for the reply. If I understand correctly, the soft limits would stop the scheduler from actually submitting jobs. What we’d like to achieve is that a user can use the resource whenever nobody else is using it. If someone else then starts submitting work, we would want only the jobs of the user(s) currently holding the most resource to be pre-empted, freeing resource for the user who is currently using very little.
Do you think something like this possible, or just a pipe dream?
I think you are looking at hard limits. A hard limit won’t be exceeded. A soft limit can be exceeded until a higher-priority job comes along and needs those resources.
To do this, you need to set preempt_prio via qmgr to include server_softlimits. It should be added to the end of the line. You then set max_run_soft or max_run_res_soft.<resource> on the server.
qmgr -c 'set server max_run_soft = [u:PBS_GENERIC=5]'
This means that a user running more than 5 jobs has exceeded the limit. Their jobs can be preempted back down to 5 at any time.
qmgr -c 'set server max_run_res_soft.ncpus = [u:PBS_GENERIC=10]'
This means that a user whose running jobs use more than 10 CPUs is over the limit.
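The preempt_prio step mentioned above could look roughly like this. It is a sketch, not a drop-in command: the first two levels are assumed to be what your scheduler already has, so list the current value first and append server_softlimits to whatever is there.

```shell
# Inspect the scheduler's current settings, including preempt_prio.
qmgr -c "list sched"

# Append server_softlimits to the end of the preemption levels.
# "express_queue, normal_jobs" is an assumed existing value; keep your own.
qmgr -c "set sched preempt_prio = 'express_queue, normal_jobs, server_softlimits'"

# Confirm the soft limit that is set on the server.
qmgr -c "list server max_run_soft"
```

Jobs belonging to a user who is over a soft limit then fall into the lowest level and are the first candidates for preemption.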
The admin guide will go over this in more detail. Look for soft limits.
Bhroam
Ah yes, thanks for this. There may occasionally be cases where we want a user to actually take all the resource, if they have an urgent, agreed need. What would be the best way to deal with this? Would a very high priority queue be able to override the soft limits?
Thanks again
If a queue has priority >= 150 (default), it is considered an express queue. This will cause anything in this queue to preempt any normal or soft limit job. You can even have multiple layers of express queues, where higher priority express queues preempt from lower priority ones.
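As an illustration, an express queue for agreed urgent work might be set up along these lines. The queue name urgent, the priority 200 and the user alice are all made-up values; any priority at or above the express threshold should behave the same way.

```shell
# "urgent" is a hypothetical queue name; 200 is simply a value above 150.
qmgr -c "create queue urgent queue_type=execution"
qmgr -c "set queue urgent priority = 200"

# Optional: restrict the queue to users with an agreed need ("alice" is a placeholder).
qmgr -c "set queue urgent acl_user_enable = true"
qmgr -c "set queue urgent acl_users = alice"

# Open the queue for submission and scheduling.
qmgr -c "set queue urgent enabled = true"
qmgr -c "set queue urgent started = true"
```

The user would then submit with qsub -q urgent, and those jobs could preempt normal and over-soft-limit jobs.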
Bhroam
Chapter 4 of the PBS Professional Administrator’s Guide covers scheduling. See section 4.9.19, “Using Fairshare”, p. 140, and section 4.9.33, “Using Preemption”, p. 182.
Chapter 5 covers resource usage, including resource limits; see section 5.15, “Managing Resource Usage”, p. 290.