PP-773: Array jobs do not send emails when specified for -m abe

Hi,

This is a fix with a new interface. Please share your thoughts on this design document. As suggested in PP-773, the fix adds an option to enable sending e-mails for subjobs of an array job. The new option is ‘j’, as in subJob :) The ‘s’ and ‘a’ are already taken…

Vasek

Looks good. Do you think it will make it into the next release?

I’ll do my best. The patch for the proposed design doc is almost ready. Of course, I am ready to modify it based on the feedback in this topic.

The EDD looks good in general. I have a few quick comments.

  • Please update the EDD with the error message shown when the -m j option is used for non-array jobs.
  • Also, “The ‘j’ option can be combined with ‘abe’” should be a little stronger. I assume that submitting only ‘-m j’ throws an error, or is there a default that goes with ‘j’?
  • The EDD is silent about option ‘a’ (abort of the parent job array). Are we not supporting that?

Just an easy one at the moment: In interface 1 it should be qsub, not qstat.

What if I supply “qsub -m jabe” to a non-array job? Is it an error, or will it simply be ignored?

Will Mail_Points include the “j” in the qstat -f display?

Related to interface 3, is the same true for the “a” and “b” options when “j” is not supplied?

@anamika, @scc, @mkaro Thank you for your comments. They are addressed in the updated document, which should answer all your questions.

@anamika No new abort e-mail is added; I did not find an opportunity for this.

I have no further comments. Thank you @vchlum.

Thanks @vchlum. The EDD looks good. One last question: since ‘-m ja’ support is not added, will it throw an error or just ignore the option?

@anamika The option is ignored. No error is shown. If an abort event for the parent job is added in the future, this patch will not block it.

Great… could you please mention this in the EDD as well? That would be all from my side.

@anamika Sorry for not being exact. To be clear…
The abort e-mail for the parent array job is not supported yet. The mail option (qsub -J range -m a) is allowed and no error is shown.
But… abort e-mails for subjobs of an array job are supported (qsub -J range -m ja). Unlike for regular jobs, they are not sent by default: the ‘-m ja’ mail option must be supplied to receive subjob abort e-mails. This way the user controls all subjob e-mails.
The EDD has been updated.
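
The subjob rule clarified above can be sketched as a small Python model. This is only an illustration of the behaviour described in this thread, not code from the patch: the function name `subjob_mail_sent` is hypothetical, and the sketch assumes that begin and end e-mails for subjobs follow the same opt-in rule as abort e-mails, that is, ‘j’ plus the event letter.

```python
# Hypothetical model of the subjob mail rule discussed in this thread.
# It is not taken from the PBS Pro source or from the PP-773 patch.
MAIL_EVENTS = {"a", "b", "e"}  # abort, begin, end


def subjob_mail_sent(mail_points: str, event: str) -> bool:
    """Return True if a subjob e-mail would be sent for `event`.

    mail_points is the value given to `qsub -m`, e.g. "abe" or "ja".
    event is one of "a" (abort), "b" (begin), "e" (end).
    """
    if event not in MAIL_EVENTS:
        raise ValueError(f"unknown mail event: {event!r}")
    # Subjob mail is opt-in: 'j' must be present together with the event letter.
    return "j" in mail_points and event in mail_points
```

For example, `subjob_mail_sent("ja", "a")` is true, while `subjob_mail_sent("a", "a")` is false because ‘j’ was not requested.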

Thanks for clarifying @vchlum. I sign off.
I found a small typo in " New mail option ‘j’ can by displayed"

Thanks @anamika, typo fixed.

It might be worth coordinating PP-773 and PP-479 as @subhasisb just wrote (in PP-479):

I personally like the idea of treating array subjobs as normal jobs; to that end, I wonder if it would be better to change behavior globally (but allow backward compatibility), e.g.,

    qmgr> set server enable_subjob_mail = true

(or “disable_subjob_mail” if we want the default to be “on”).

I like the idea of using ‘enable_subjob_mail’. It is clearer than the proposed solution. …but the advantage of the current EDD is that the user can decide based on the size of the job array. The user can request e-mails for a small array job and suppress them for a large one. Not sure which is better.
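
To make the array-size argument concrete, here is a rough Python sketch that estimates how many subjob e-mails a submission could generate. The helper `max_subjob_mails` is hypothetical, and the estimate assumes each requested event (abort, begin, end) fires at most once per subjob and that no subjob mail is sent without ‘j’.

```python
def max_subjob_mails(subjob_count: int, mail_points: str) -> int:
    """Upper bound on subjob e-mails for one array job (illustrative only).

    Assumes each requested event fires at most once per subjob and that
    subjob mail is only sent when 'j' is part of mail_points.
    """
    if subjob_count < 0:
        raise ValueError("subjob_count must be non-negative")
    if "j" not in mail_points:
        return 0  # without 'j', subjobs send no mail in this model
    requested = set(mail_points) & {"a", "b", "e"}
    return subjob_count * len(requested)
```

Under these assumptions, a 10-subjob array submitted with `-m jabe` yields at most 30 e-mails, whereas a 10000-subjob array with `-m jbe` could yield up to 20000, which is why a per-array choice may be attractive.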

Sounds like a reasonable justification for keeping a per-array-job option. I’m OK with it. I do wonder whether it’s necessary…

One path forward would be to start by implementing enable_subjob_mail. Then, after we get some real-world feedback and use, if there is a call for more control, add the per-array-job setting at that later time.

In any case, I’m OK either way and leave the decision up to you and the maintainers.

Thx!

Hi @vchlum

Please take note of the ongoing work regarding subjobs here: https://pbspro.atlassian.net/wiki/spaces/PD/pages/161873965/External+interface+design+for+making+subjobs+on+par+with+regular+jobs+and+subjobs+surviving+server+restarts.

and code changes in progress here https://github.com/Shrini-h/pbspro/tree/PP-479

Also, I suggest you share with me any patch you’re working on for PP-773 so we can coordinate.

Thanks
Shrini

Hi @Shrini-h

I checked our patches and I do not see any conflict. Please see the changes here: https://github.com/CESNET/pbspro/tree/PP-773 and let me know.

Vasek