New sched attribute to control runjob wait + making pbs_asynrunjob truly async + deprecating 'throughput_mode'

bhroam · April 8, 2020, 6:53pm

Hey Ravi,
Thanks for making the changes. I have a few suggestions:

Add the o into job for the options. There is no real reason to shorten it by one character
Remove the ‘much’ from much slower. It is only much slower in certain cases like running 50k jobs in one cycle. If you run a few jobs in a cycle, it doesn’t affect things much.
The reason I suggested having one section that talks about what happens when the scheduler doesn’t wait for a hook is that the behavior is common to the options. You don’t have to say the same thing over again for each option. You can also say ‘the resources are considered used for that cycle. They will appear free again in the next cycle’. You can also say the fewer hooks we wait for, the faster it is.
I’d reorder the options from slowest to fastest. This will require rephrasing some of your sections, because you talk about how an option is slower than the one above.
For execjob_hook, say you wait for both the runjob_hook and the execjob_begin hook. It isn’t all of the execjob hooks. For instance, if you reject the job in the execjob_launch hook, the scheduler won’t know about the reject until the next cycle.
You still talk about throughput_mode in the internal section.
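
To make the hook-wait point concrete, here is a minimal Python sketch of the behavior described in the suggestions above. It is only a toy model, not PBS code: the option names `runjob_hook` and `none` are assumptions for illustration (only `execjob_hook` is named in this post), and the helper name is hypothetical.

```python
# Toy model of which hooks the scheduler waits for under each option.
# "execjob_hook" comes from the discussion; "runjob_hook" and "none"
# are assumed option names used only for illustration.
WAITS_FOR = {
    "execjob_hook": ("runjob", "execjob_begin"),
    "runjob_hook": ("runjob",),
    "none": (),
}


def scheduler_learns_of_reject(option, rejecting_hook):
    """Say when the scheduler finds out that a hook rejected the job.

    If the scheduler waited for that hook, it knows in the same cycle.
    Otherwise the resources count as used for this cycle and the
    scheduler only sees the reject in the next cycle.
    """
    if rejecting_hook in WAITS_FOR[option]:
        return "same cycle"
    return "next cycle"


# The fewer hooks we wait for, the faster the cycle, so ordering by
# number of hooks waited on gives slowest to fastest.
OPTIONS_SLOWEST_FIRST = sorted(
    WAITS_FOR, key=lambda opt: len(WAITS_FOR[opt]), reverse=True
)

if __name__ == "__main__":
    print(OPTIONS_SLOWEST_FIRST)
    print(scheduler_learns_of_reject("execjob_hook", "execjob_launch"))
```

Under this model a reject in the execjob_launch hook is always reported as "next cycle", even with `execjob_hook`, which is the case called out above.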

Bhroam

agrawalravi90 · April 9, 2020, 12:25pm

Sorry about my now deleted post, I had made an error in measuring the numbers. I tried running 1k jobs instead of 50k, and truly async was still around 15-16 times faster than throughput_mode=high, even though the numbers were much smaller (2 seconds vs 0.122 seconds instead of 180 seconds vs 12 seconds). I still think that we should emphasize that the difference can be pretty big and leave the wording as it is. Let me know if you still disagree.
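
As a quick check of the arithmetic, the sketch below computes both the relative speedup and the absolute time saved from the four timings quoted in this post. The function and variable names are hypothetical, and the timings are the rounded figures given above, so the ratios are approximate.

```python
def compare(slow_seconds, fast_seconds):
    """Return (speedup ratio, seconds saved) for one pair of cycle times."""
    return slow_seconds / fast_seconds, slow_seconds - fast_seconds


# (throughput_mode=high, truly async) cycle times in seconds, as quoted.
measurements = {
    "1k jobs": (2.0, 0.122),
    "50k jobs": (180.0, 12.0),
}

results = {label: compare(*pair) for label, pair in measurements.items()}

if __name__ == "__main__":
    for label, (ratio, saved) in results.items():
        print(f"{label}: {ratio:.1f}x faster, {saved:.3f} s saved per cycle")
```

The ratio is similar in both cases (roughly 16x and 15x), while the absolute saving differs a lot: under 2 seconds for 1k jobs versus 168 seconds for 50k jobs.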

bhroam · April 9, 2020, 12:37pm

Hey @agrawalravi90,
My point wasn’t that it is not faster. My point was that if my cycle is 2s or .122s, it doesn’t matter much. 2s is not a very long time. If I am running 50k jobs, 3m vs 12s is a big difference. I just thought it was a little misleading. It doesn’t really matter.

My only comment for the current document is that I wouldn’t call the section at the top caveats. Caveats go at the bottom after you have explained things. It’s to explain what is different from the general case. I still think that information should go at the top, but not in the form of caveats.

Bhroam

agrawalravi90 · April 9, 2020, 1:10pm

Thanks Bhroam, I do see your point, but I also think that it’s important to emphasize the performance boost, otherwise the trade-off might not seem worth it. So I’ve instead mentioned that it can be up to 15 times faster. Is that better or worse?

agrawalravi90 · April 14, 2020, 1:22am

PR: https://github.com/PBSPro/pbspro/pull/1653

scc · April 14, 2020, 12:59pm

Sorry for the late design feedback, but what about renaming “runjob_wait” to “job_run_wait”? It is confusing to see the “runjob” sub-string in both the attribute name and in one of the accepted values (not to mention that “runjob” means something specific to PBS, “job_run” does not). One might look at the current design and expect there to exist an “execjob_wait” attribute as well.

agrawalravi90 · April 14, 2020, 8:54pm

Thanks @scc, I’ve renamed the attribute to job_run_wait.
