I am submitting a number of jobs from a Python for loop, roughly:

for x in X:
    for a in A:
        for m in M:
            sample = "-".join([m, a, x])
            # ... build a job script for `sample` and submit it with qsub
The problem is that in our local cluster implementation we don't have a fair scheduling policy: jobs are served on a first-come, first-served basis, and if at any time my usage goes beyond 20% of the nodes, the sysadmin can kill my jobs. I understand that this is not an optimal qsub setup, but it is of course beyond my control.
I currently work around this by limiting my jobs:

numjobs = int(os.popen("qselect | wc -l").read())
if numjobs < 4:
    # submit the next job

This is not very elegant, but it serves the purpose.
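A slightly more robust version of that throttle might look like the sketch below. The `qselect` call and the limit of 4 come from the post; the polling helper, the `subprocess` usage, and the sleep interval are my own additions, not part of the original script:

```python
import subprocess
import time

def count_my_jobs():
    """Count queued/running jobs via qselect (assumes PBS client tools are on PATH)."""
    out = subprocess.run(["qselect"], capture_output=True, text=True, check=True).stdout
    return len(out.splitlines())

def wait_for_slot(limit, count_jobs=count_my_jobs, poll_seconds=30):
    """Block until fewer than `limit` jobs are queued/running, then return."""
    while count_jobs() >= limit:
        time.sleep(poll_seconds)

# Usage: call wait_for_slot(4) just before each qsub call inside the loop,
# instead of checking the count once per iteration.
```

Injecting `count_jobs` as a parameter keeps the polling logic testable without a live PBS server.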
My question is: is there any way to tell qsub how many jobs I can have at any given time, so that I don't need to check via qselect on every loop iteration?
@rudrab You can set a user limit for max_queued or max_run on the server.
Thanks a lot for your reply.
The PBS Pro admin guide says:

max_run: The maximum number of jobs that can be running.

max_queued: The maximum number of jobs that can be queued and running. At the server level, this includes all jobs in the complex. Queueing a job includes the qsub and qmove commands and the equivalent APIs.
I am not sure what happens when my jobs hit max_run: as shown in my original post, I have more jobs to submit from the for loop. Will qsub wait patiently to submit the next job, or will it exit?
The answer is no: qsub does not know, or keep, a blueprint of the current status of the cluster; it is only a job-submission client. The blueprint of the system is maintained by the server, and the scheduler uses it to decide where to run each job based on the scheduling policy, limits, sorting, etc.
It seems you want an automated countermeasure (gaming the system) that submits jobs while staying within the policy set by your administrator of killing jobs when a user's share goes above 20%. That is almost like writing a scheduler for the scheduler.
You can submit as many jobs as you like via qsub. The number of jobs running at any one time is capped by the value of max_run: if it is set to 3, only 3 jobs can run, no matter whether you have thousands of jobs in the queue (even if resources are available for them, only 3, or X, jobs will run because of the max_run limit).
qsub submits the job(s); they are accepted by the server and assigned job IDs, but based on the limits they stay queued until they become eligible to run under the limits set by the administrator.
qsub is a job-submission client: it submits the job and does not wait for anything.
You can write a wrapper script (as you have now) that takes all the inputs from the user, creates a job script, checks whether you are below the 20% threshold, and submits the job if you are; otherwise it waits in a loop and re-checks eligibility.
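Such a wrapper could be sketched as follows. The 20% threshold comes from the thread; the `qselect -u ... -s R` query, the one-node-per-job assumption, and the helper names are mine, so adjust them to how your site actually accounts node usage:

```python
import subprocess
import time

def my_node_fraction(total_nodes, username):
    """Rough estimate of the fraction of nodes held by this user's running jobs.
    Assumes one node per job; multi-node jobs would need pbsnodes-based accounting."""
    out = subprocess.run(["qselect", "-u", username, "-s", "R"],
                         capture_output=True, text=True, check=True).stdout
    return len(out.splitlines()) / total_nodes

def submit_when_eligible(submit, usage, threshold=0.20, poll_seconds=60):
    """Wait until usage() drops below threshold, then call submit() once."""
    while usage() >= threshold:
        time.sleep(poll_seconds)
    submit()

# Usage: for each job in the loop, call
#   submit_when_eligible(lambda: os.system(qsub_cmd),
#                        usage=lambda: my_node_fraction(TOTAL_NODES, USER))
```

Separating the usage check from the submission call lets you swap in whatever accounting your sysadmin actually applies.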
I hope this helps