New sched attribute to control runjob wait + making pbs_asynrunjob truly async + deprecating 'throughput_mode'

Hey Ravi,
Thanks for making the changes. I have a few suggestions:

  1. Add the ‘o’ to ‘job’ in the option names. There is no real reason to shorten it by one character.
  2. Remove the ‘much’ from much slower. It is only much slower in certain cases like running 50k jobs in one cycle. If you run a few jobs in a cycle, it doesn’t affect things much.
  3. The reason I suggested having one section that talks about what happens when the scheduler doesn’t wait for a hook is that the behavior is common to all the options. You don’t have to say the same thing over again for each option. You can also say ‘the resources are considered used for that cycle. They will appear free again in the next cycle’. You can also say that the fewer hooks we wait for, the faster it is.
  4. I’d reorder the options from slowest to fastest. This will require rephrasing some of your sections, because you talk about how one option is slower than the one above it.
  5. For execjob_hook, say you wait for both the runjob_hook and the execjob_begin hook. It isn’t all of the execjob hooks. For instance, if you reject the job in the execjob_launch hook, the scheduler won’t know about the reject until the next cycle.
  6. You still talk about throughput_mode in the internal section.
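
To make points 3 to 5 concrete, here is a minimal Python sketch of how the options relate to the hooks the scheduler waits for. It is only an illustration, not PBS code: the option names ‘none’ and ‘runjob_hook’ are assumptions (only ‘execjob_hook’ is named above), and the dictionary and helper functions are hypothetical.

```python
# Hypothetical model of the wait options discussed above.
# Only "execjob_hook" is named in this thread; "none" and "runjob_hook"
# are assumed names used purely for illustration.
WAITED_HOOKS = {
    "none": frozenset(),
    "runjob_hook": frozenset({"runjob"}),
    "execjob_hook": frozenset({"runjob", "execjob_begin"}),
}


def options_slowest_to_fastest():
    """Order options by how many hooks the scheduler waits for (most first)."""
    return sorted(
        WAITED_HOOKS, key=lambda opt: len(WAITED_HOOKS[opt]), reverse=True
    )


def reject_seen_in_same_cycle(option, rejecting_hook):
    """True if the scheduler waits for the hook that rejected the job.

    If it does not wait, the job's resources count as used for the rest of
    the cycle and the reject only becomes visible in the next cycle.
    """
    return rejecting_hook in WAITED_HOOKS[option]


if __name__ == "__main__":
    print(options_slowest_to_fastest())
    # execjob_launch is not waited for, even with execjob_hook.
    print(reject_seen_in_same_cycle("execjob_hook", "execjob_launch"))
```

The ordering falls out of the number of hooks waited for, which is why a single shared section on "what happens when we don't wait" can cover every option.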

Bhroam

Sorry about my now-deleted post; I had made an error in measuring the numbers. I tried running 1k jobs instead of 50k, and truly async was still around 15-16 times faster than throughput_mode=high, even though the numbers were much smaller (2 seconds vs 0.122 seconds instead of 180 seconds vs 12 seconds). I still think that we should emphasize that the difference can be pretty big and leave the wording as it is. Let me know if you still disagree.

Hey @agrawalravi90,
My point wasn’t that it is not faster. My point was that if my cycle is 2s or .122s, it doesn’t matter much. 2s is not a very long time. If I am running 50k jobs, 3m vs 12s is a big difference. I just thought it was a little misleading. It doesn’t really matter.
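
Both views can be checked against the timings quoted in this thread. The short Python sketch below computes the relative speedup and the absolute time saved for each run; the `compare` helper and the `runs` labels are hypothetical, and the timings are simply the ones quoted above (throughput_mode=high vs truly async).

```python
def compare(slow_s, fast_s):
    """Return (speedup ratio, seconds saved) for two cycle times in seconds."""
    return slow_s / fast_s, slow_s - fast_s


# Timings quoted in the thread: (throughput_mode=high, truly async).
runs = {
    "1k jobs": (2.0, 0.122),
    "50k jobs": (180.0, 12.0),
}

if __name__ == "__main__":
    for label, (slow_s, fast_s) in runs.items():
        ratio, saved = compare(slow_s, fast_s)
        print(f"{label}: {ratio:.1f}x faster, {saved:.3f} s saved")
```

The ratio is roughly the same in both runs, while the absolute saving is under two seconds for 1k jobs and close to three minutes for 50k jobs.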

My only comment for the current document is that I wouldn’t call the section at the top caveats. Caveats go at the bottom after you have explained things. They are there to explain what is different from the general case. I still think that information should go at the top, but not in the form of caveats.

Bhroam

Thanks Bhroam, I do see your point, but I also think that it’s important to emphasize the performance boost; otherwise the trade-off might not seem worth it. So I’ve instead mentioned that it can be up to 15 times faster. Is that better? Worse?

PR: https://github.com/PBSPro/pbspro/pull/1653

Sorry for the late design feedback, but what about renaming “runjob_wait” to “job_run_wait”? It is confusing to see the “runjob” sub-string in both the attribute name and in one of the accepted values (not to mention that “runjob” means something specific to PBS, “job_run” does not). One might look at the current design and expect there to exist an “execjob_wait” attribute as well.

Thanks @scc, I’ve renamed the attribute to job_run_wait.