I want PBS to distribute compute nodes among all array jobs of one user in a balanced manner. Is this possible?
For example, I have 12 identical compute nodes. A user submits array job Job1 with 10k subjobs, and each subjob uses one compute node, so 12 subjobs run at once. Then the same user submits another, similar array job, Job2. I want Job2 to start as soon as possible rather than wait for all subjobs in Job1 to finish. That is, as soon as a subjob in Job1 finishes, PBS should start a subjob from Job2, and eventually PBS should give 6 compute nodes to Job1 and 6 compute nodes to Job2 (50% fairshare).
Similarly, when the same user submits a third array job, Job3, PBS should eventually give 4 compute nodes to each array job (33% fairshare), and so on. For simplicity, let's assume that there are no other users in that queue.
Please note that users request resources (ncpus, mem, walltime, gpus, etc.) via qsub when they submit their jobs, and the PBS scheduler makes sure these jobs are scheduled onto compute nodes that can satisfy the resource request. The fairshare policy is based on the historical usage of the cluster by a user (or entity): if a user has historically used the cluster a lot, that user's jobs are less likely to get resources than those of other users with less historical usage. Fairshare does not partition the cluster to maintain a fixed resource allocation among a user's jobs. The scheduler is also not aware of users' future job submissions, so it cannot decide now to set resources aside for them.
Please check the documentation on:
user limits, project limits, and group limits, which might help with limiting
queuejob hooks, which might be helpful for rewriting the resource request after finding out the current state of the cluster (or an external method that records fairshare usage according to your requirement and helps rewrite the select statement)
It’s possible to limit the number of subjobs of an array running at the same time.
-J <range> [%<max subjobs>]
Makes this job an array job. Sets job's array attribute to True.
Use the range argument to specify the indices of the subjobs of the array. range is specified in the form X-Y[:Z] where X is the first index, Y is the upper bound on the
indices, and Z is the stepping factor. For example, 2-7:2 will produce indices of 2, 4, and 6. If Z is not specified, it is taken to be 1. Indices must be greater than or
equal to zero.
Use the optional %max subjobs argument to set a limit on the number of subjobs that can be running at one time. This sets the value of the max_run_subjobs job attribute to the
specified maximum.
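To make the excerpt above concrete, here is a minimal Python sketch of how the range expands and how a per-array cap could be derived for the 12-node example. The helper names `expand_range` and `array_option` are hypothetical, the even split (total nodes divided by the number of arrays) is only an assumption about what the user wants, and the sketch assumes the `%` limit is appended directly to the range when passed to `qsub -J`.

```python
def expand_range(spec):
    """Expand a range of the form 'X-Y[:Z]' into the list of subjob indices."""
    bounds, _, step = spec.partition(":")
    first, last = (int(part) for part in bounds.split("-"))
    stride = int(step) if step else 1  # Z defaults to 1 when omitted
    if first < 0 or last < first or stride < 1:
        raise ValueError(f"invalid array range: {spec!r}")
    # Y is an upper bound, so it is included only if the stride lands on it
    return list(range(first, last + 1, stride))


def array_option(first, last, total_nodes, n_arrays):
    """Build a -J argument whose cap is an even share of the nodes.

    Hypothetical helper: the even split is an assumption, not PBS behaviour.
    """
    cap = max(1, total_nodes // n_arrays)
    return f"{first}-{last}%{cap}"


print(expand_range("2-7:2"))         # [2, 4, 6], as in the excerpt
print(array_option(0, 9999, 12, 2))  # 0-9999%6
print(array_option(0, 9999, 12, 3))  # 0-9999%4
```

Note that a cap set at submission time is static: an array submitted with `%6` runs at most 6 subjobs even when it is the only job in the queue, so this gives a fixed ceiling per array rather than a share that rebalances as arrays come and go.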
Otherwise, you’ll have to make the priorities of the jobs flip after a number of cycles. That is possible with a formula and with hooks, but it all depends on exactly what you’re after (and why the user doesn’t want all the subjobs of the first job array to run first).
The only way to use fairshare for this is to change the fairshare entity to something other than the user (e.g. based on Account_Name, possibly set in a queuejob hook) so that the two arrays belong to different fairshare entities.