Limit on maximum number of subjobs that can run at a time

Hi,

I’m opening this thread to discuss a design change proposed here.
The change introduces an option that lets users specify the maximum number of array subjobs that can run concurrently at any given time.

Please review it and provide comments.

Thanks,
Arun

If I submit an array job with a limit, how could I alter the job to remove the limit?
Will it be:
qalter -Wmax_run_subjobs=0 12[].server1

That would set the limit to 0, so no subjobs would be considered to run. To remove the limit so that all subjobs are considered, you can set max_run_subjobs to the number of subjobs in the array.

Hey @arungrover,
I just have a few comments:

  1. I’d explicitly say the %num comes at the end. You wouldn’t want someone to get confused and think they could do -J1-4%2,10-20%4
  2. Consider not bothering to add the error message if both %num and -Wmax_run_subjobs are given. Just say which one takes precedence.
  3. I don’t think you can modify this in a runjob hook, since the runjob hook will be run on the subjob
  4. Instead of setting a default of all the subjobs, just say that if this attribute is not set, the behavior is backwards compatible with no limit. That way, if someone wants to remove the limit with qalter, they don’t have to count up the total number of subjobs. They can just unset the attribute.
  5. Just to mention a possible hiccup: I know that a qsub hook happens early in the job’s life. The max_run_subjobs attribute might not be set by that time. If you set it in a qsub hook, it might clash (especially if you keep it as an error).
  6. There is nothing you can really do about this, but if the max_run_subjobs attribute changes after the job is submitted, the Submit_Arguments will still have -J %num in it where %num is wrong. This isn’t anything new though.
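
To illustrate comment 1, here is a minimal Python sketch of a parser that accepts a single range with at most one trailing %num and rejects the comma-separated form. The function name and the regular expression are hypothetical and only model the proposed syntax; this is not the actual qsub parser.

```python
import re

# Hypothetical pattern for illustration: start-end[:step] followed by
# at most one trailing %limit. Comma-separated ranges are not matched.
_J_PATTERN = re.compile(r"(\d+)-(\d+)(?::(\d+))?(?:%(\d+))?")


def parse_array_range(spec):
    """Return (start, end, step, max_run_subjobs).

    max_run_subjobs is None when no %num suffix is given, which models
    "attribute not set, no limit".
    """
    match = _J_PATTERN.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"invalid array range: {spec!r}")
    start, end, step, limit = match.groups()
    step_value = int(step) if step is not None else 1
    limit_value = int(limit) if limit is not None else None
    return int(start), int(end), step_value, limit_value
```

With this sketch, a spec such as 1-4%2 parses with a limit of 2, while 1-4%2,10-20%4 raises ValueError because the %num is not at the very end of a single range.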

Thanks for reviewing the document Bhroam.

Is it possible to specify the -J option in a comma-separated way, as you described? I tried it, but it didn’t work for me. My thinking was to make users do the right thing and not specify the attribute value in two different ways, which is why I didn’t define a precedence. There really is no reason for users to specify it in two different ways.

You are right, I’ll make this change.

We cannot really unset a job attribute, right? We can alter it, but I’ve not seen any interface to unset it. Am I missing something?

I think queuejob hooks run before the job is enqueued, but the hook still has the job object with the user’s data in it. Otherwise, how would admins run validity checks on the job and accept, reject, or change it?

You are right, this problem will be there no matter what I do.

No, it’s not supported. A comma-separated list of ranges is only supported in ‘remaining_indices’, not in ‘submitted_indices’ (aka -J in qsub). Also, qalter doesn’t support -J.

Thanks @arungrover! To add a very important use case to what you have listed in the design from the admin’s perspective:

A user may have a job array in which all subjobs interface with a single instance of a shared data file, and as more and more subjobs run simultaneously, job performance degrades sharply. Further, different applications, or different runs of the same application, may have different impacts on the shared resource, so we have had requests for the ability to limit this number per job array at the user’s request.

Your current design covers this, I just want to make sure it is explicitly stated as a use case.

Thanks @scc, I’ll mention this as a use case in the document. I also have a question about suspended subjobs. I think the PBS scheduler should count suspended subjobs as running too, but I’d like your opinion on it from a use case perspective.

@bhroam and I had an offline chat and Bhroam said that job attributes can be unset using qalter -W<attr_name>=""
With this in mind, I think Bhroam’s comment is valid, and I’ll change the document to say that if “% or max_run_subjobs” is not provided, the behavior will remain the same as it is today.

I believe suspended subjobs should NOT count as running for this purpose. This fits in well with the current stated use cases and with how queue/server level limits like max_run work as well.

Does that mean suspended subjobs will not resume unless the array is below its max_run_subjobs limit?

Yes, that seems correct to me.
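
The counting rule agreed on above can be sketched in Python as follows. This is only a model under stated assumptions: the function name is hypothetical, None stands for an unset attribute (no limit), and subjob states are given as the single letters "R" (running), "S" (suspended) and "Q" (queued). It is not scheduler code.

```python
def runnable_slots(max_run_subjobs, subjob_states):
    """Return how many more subjobs may start or resume right now.

    max_run_subjobs: None when the attribute is unset (no limit), else an int.
    subjob_states: iterable of state letters such as "R", "S", "Q".
    Suspended subjobs do not count as running, but they do need a free
    slot under the limit before they can resume.
    """
    states = list(subjob_states)
    waiting = sum(1 for state in states if state in ("Q", "S"))
    if max_run_subjobs is None:
        # Unset attribute: behave as today, with no per-array limit.
        return waiting
    running = sum(1 for state in states if state == "R")
    return max(0, min(waiting, max_run_subjobs - running))
```

For example, with a limit of 2 and states R, R, S, the function returns 0: the suspended subjob stays suspended until one of the running subjobs finishes.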

Thanks @scc
There seems to be a complication with preemption. I was not planning to attempt preemption for an array that couldn’t run because of this limit, because the only jobs it could possibly consider preempting are its own subjobs, which are also running at the same preemption priority level.

But if a user runs qrun on a subjob, then I’m not sure what to do. We usually ignore all limits and go ahead with preemption in those cases. From a use case perspective, should we allow that?

What about eligible time? The general rule of thumb is that if you are getting in your own way, you don’t accrue eligible time. That is usually because of a system-wide limit. Here you are indeed getting in your own way, but I look at this feature as a voluntary limit on how many of your subjobs can run at one time. It doesn’t seem right for you to lose your place in line because you are doing this.

Bhroam

For normal preemption, we do not ignore limits. For qrun we do. For normal preemption, you won’t be able to preempt anything to run another subjob. All your subjobs will have the same preemption priority, so you can’t find anything to preempt. For a qrun preemption you have higher priority. Of course we ignore max_run limits with qrun, so we probably should do the same here. If an admin is telling us to run this job, we should bypass the limit and run it.

Bhroam
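
The qrun behavior described above could be modeled like this. It is a minimal Python sketch with hypothetical names, assuming None means the attribute is unset; it only captures the decision rule from this discussion, not how the scheduler implements it.

```python
def may_start_subjob(max_run_subjobs, running_count, via_qrun=False):
    """Decide whether one more subjob of the array may start.

    via_qrun models an admin explicitly running the subjob with qrun,
    which bypasses the limit in the same way max_run limits are bypassed.
    """
    if via_qrun:
        return True
    if max_run_subjobs is None:
        # Attribute unset: no per-array limit applies.
        return True
    return running_count < max_run_subjobs
```

In a normal scheduling cycle an array at its limit gets no new subjobs, while the same call with via_qrun=True lets the subjob through.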

Agree on the limits/qrun discussion, thanks @bhroam.

I think the array should accrue eligible time if subjobs are not running due to this new limit. In the most common use case I believe these limits will be self applied, and I don’t believe the user should be punished for applying them by not accruing eligible time.

Thanks @bhroam and @scc I have modified the document. Please have a look.

Thanks for making the changes @arungrover.
I just have one comment. Make note of the discussion about qrun overriding the limit.

Bhroam

Thanks for the updates @arungrover. The current qrun bullet says:

  • If a user issues ‘qrun’ on an array job, PBS scheduler will try to run the job even if the array is hitting its max_run_subjobs limit.

Shouldn’t that say “admin” rather than user, and “subjob” rather than “job”?

Thanks for catching that @scc. I’ve updated the document.
I’ve also changed the message the scheduler logs when it hits the limit, and added a new error that is printed when this new attribute is used for non-array jobs.