Throttling job attribute updates from scheduler to server

scc · April 15, 2020, 6:52pm

Sounds great @agrawalravi90, thanks! 2 comments:

The notrun_update_freq name is awkward due to missing underscore between the first 2 words. How about not_run_update_freq, or not_run_comment_update_freq (possibly too long, yes, but more descriptive).
[minor] We should not use “wanna” in a design doc. Please change from:

For sites which see a large volume of jobs, admins might wanna increase this.

to

For sites which see a large volume of jobs, admins might want to increase this to tune performance.

agrawalravi90 · April 15, 2020, 7:52pm

Thanks @scc, I’ve renamed it to “not_run_update_freq” … I still don’t like the name though … can we just call it “attr_update_freq”? or will that be confusing?

agrawalravi90 · April 15, 2020, 7:53pm

It can’t have “comment” in the name because it’s not just comment, it’s also a couple of other attributes

bhroam · April 15, 2020, 10:42pm

I like attr_update_freq. It is a little more confusing, but this entire RFE is on the technical side. We wouldn’t want someone setting it unless they read the guides and knew what they were doing.

Bhroam

agrawalravi90 · April 15, 2020, 10:48pm

Good point, I’ll change it to that, thanks.

agrawalravi90 · April 16, 2020, 4:53am

@bhroam and @scc, @subhasisb asked me if we could do the throttling by time instead of sched cycles, so I thought of exploring that a bit:
The scheduler would send updates once N seconds have passed since it last sent them. The advantage is that admins might have more control over when attribute updates happen if we measure the interval in time rather than sched cycles. A couple of ways to implement this:

At the end of each cycle, scheduler checks whether it’s been N seconds since last time it sent updates to decide. This is very simple, but has the disadvantage that some updates might get delayed by more than N seconds as it depends on how long the scheduling cycle took, and how often the sched cycles happen.
Interrupt scheduler after every N seconds via SIGALRM, scheduler sends out all pending updates regardless of whether a sched cycle is happening or not. The upside is that updates will go exactly after every N seconds, but this will require scheduler to cache the updates across cycles.

What do you guys think? Is either of these better or worse than the num cycles approach? Is there any other way we could do the time-based throttling?
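To make the first option concrete, here is a minimal sketch of the end-of-cycle check. It is not scheduler code: the function name `send_times`, the sample cycle end times, and the choice to send on the very first cycle are all assumptions made for illustration.

```python
def send_times(cycle_end_times, n_secs):
    """Return the cycle end times (in seconds) at which updates would be sent.

    Hypothetical model: at the end of each cycle, send only if at least
    n_secs have passed since the last send. Sending on the first cycle
    is an assumption, not something the design specifies.
    """
    sent = []
    last_sent = None
    for end in cycle_end_times:
        if last_sent is None or end - last_sent >= n_secs:
            sent.append(end)
            last_sent = end
    return sent


# Made-up cycle end times, with N = 60 seconds.
ends = [0, 50, 170, 200, 235]
print(send_times(ends, 60))
```

With these made-up numbers the second send happens 170 seconds after the first, well past N = 60, because the cycle that was in progress when the interval expired took a long time to finish. That is the delay described in the first option.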

bhroam · April 16, 2020, 6:52pm

If we take a time-based approach, it should still be based on cycles. If a cycle starts after N seconds, update for that cycle. Using SIGALRM is right out. We used to use it for the scheduler’s max cycle time, and got into trouble. It is possible to break an IFL call in half and then, before that broken IFL call completes, do more IFL calls. This causes issues. We moved to a time-based approach. We check after we look at each job if we’ve run past our cycle length, and then end.
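The "check after each job" pattern mentioned above might look roughly like the sketch below. It is only an illustration of a cooperative time check as opposed to a signal handler; `run_bounded_cycle`, `consider_job`, and `max_cycle_secs` are hypothetical names, not the scheduler's own.

```python
import time


def run_bounded_cycle(jobs, max_cycle_secs, consider_job, clock=time.monotonic):
    """Look at jobs one at a time and stop once the cycle has run too long.

    The time check happens only between jobs, so the work done for a job
    (for example a call to the server) is never cut off halfway, unlike
    with a SIGALRM handler.
    """
    start = clock()
    handled = []
    for job in jobs:
        consider_job(job)  # runs to completion before any time check
        handled.append(job)
        if clock() - start >= max_cycle_secs:
            break  # past the cycle length: end the cycle here
    return handled
```

The trade-off is that the cycle can overrun its limit by however long the last job took, but no call is ever interrupted partway through.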

I don’t think keeping track of the updates through the cycle is useful because we might just dump them at the end if we didn’t reach our time. We’d also run into issues with our max cycle length. If we reach our time limit right as we reach our max cycle length, we either don’t update, or we start updating and run past our max cycle length.

So what is the downside to doing it per-cycle? Is it that we worry that cycles will happen far apart? If that is the case, the admin already has a time based control on that. Right now if no event triggers a cycle, one happens every 10m. The admin can shorten that. If they are worried that updates won’t be sent fast enough, they can set the throttle to something lower, or the max time between cycles shorter, or both.

I also think this being time-based gives the admin a larger chance to shoot themselves in the foot. In your time experiments, you found the difference between cycle times was 16s vs 2m. If they set this too short, they’ll just be interrupting every cycle in the middle to send updates.

So if we want to make this time-based, I’d say we check at the start of a cycle and either have that cycle update or not. We don’t keep track of updates and then dump them every N seconds, which might happen in the middle of a cycle.
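As a rough sketch of the start-of-cycle variant: the decision is made once when the cycle begins, and a cycle that is not updating simply skips its updates rather than caching them. The names `UpdateThrottle`, `cycle_should_update`, `run_cycle`, and `send_update` are hypothetical, and updating on the first cycle is an assumption.

```python
import time


class UpdateThrottle:
    """Decides, once per cycle, whether that cycle sends attribute updates."""

    def __init__(self, min_interval_secs, clock=time.monotonic):
        self.min_interval_secs = min_interval_secs
        self.clock = clock
        self.last_update_start = None  # start time of the last updating cycle

    def cycle_should_update(self):
        now = self.clock()
        if (self.last_update_start is None
                or now - self.last_update_start >= self.min_interval_secs):
            self.last_update_start = now
            return True
        return False


def run_cycle(jobs, throttle, send_update):
    send = throttle.cycle_should_update()  # decided once, at cycle start
    for job in jobs:
        if send:
            send_update(job)  # non-updating cycles skip this; nothing is cached
    return send
```

Because the flag is fixed for the whole cycle, updates are never flushed in the middle of a cycle, and nothing has to be carried over from one cycle to the next.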

Bhroam

agrawalravi90 · April 16, 2020, 7:11pm

Thanks for your input, Bhroam. @subhasisb what do you think? I kind of feel that there’s enough uncertainty either way, unless we do the SIGALRM approach, which sounds like it might be error prone. I’m kind of leaning towards the cycles approach just because it might be more straightforward: updates will always go after every N cycles, which is better than having to explain why the updates didn’t go on time with the time-based approach.

billnitzberg · April 20, 2020, 5:12pm

I like the time-based/cycle-based approach you suggested:

“At the end of each cycle, scheduler checks whether it’s been N seconds since last time it sent updates to decide.”

It’s not that the scheduler guarantees updates every N seconds – it’s that it doesn’t bother updating for at least N seconds. (It’s not that much different than the current approach… you don’t get updates until … you get them … which can be a long time if the cycle takes a long time.)

agrawalravi90 · April 20, 2020, 5:25pm

Thanks @billnitzberg, do you like it better than a purely cycles based approach, where sched sends updates every N cycles?

billnitzberg · April 20, 2020, 7:49pm

Yes. (I prefer to think of cycles as purely an implementation detail…something that’s better hidden than exposed… as much as possible.)

agrawalravi90 · April 20, 2020, 8:46pm

Ok, thanks for confirming! One last thing, do you have a preference for whether we decide at the start of the cycle that we’ll send the updates out that cycle, or decide this at the end of the cycle?

billnitzberg · April 20, 2020, 11:13pm

I feel that, since you are making PBS really fast, it does not matter :-).

agrawalravi90 · April 21, 2020, 12:42am

Thanks Bill!

@All, I’ve made changes to the design document (https://pbspro.atlassian.net/wiki/spaces/PBSPro/pages/1684045848/New+Sched+attribute+to+throttle+job+attribute+updates) to mention time period instead of cycles. Please review it and let me know if it looks okay. Thanks!

subhasisb · April 21, 2020, 4:30am

I also like the cycle-end deduction of time to update. Thanks for updating the design.

scc · April 21, 2020, 3:07pm

That is fine with me, thanks!

agrawalravi90 · April 21, 2020, 6:05pm

Thanks subhasis. About cycle-end deduction vs deciding at the beginning of the cycle, does it really matter? Deciding at the beginning will be much easier to implement, so I was going to do that. This feature is really for busy sites where sched cycles probably happen one after another, so doing it at the end of previous cycle vs the next cycle, I don’t think it will make much difference. Please let me know if you still feel strongly about doing it at the end.
