[WIP] "Mock run" option for scheduler

agrawalravi90 · November 13, 2020, 4:27am

Hi,

I’m proposing a new option for pbs_sched called “mock run”. It makes the scheduler very lean and is intended for testing performance issues in other parts of PBS by taking the scheduler out of the critical path.

Here’s the design: https://openpbs.atlassian.net/wiki/spaces/PD/pages/2257944577/Mock+Run+option+for+Scheduler

Please let me know what you think.

Thanks,
Ravi

hirenvadalia · November 13, 2020, 5:07am

Hey @agrawalravi90 just an idea:
How about we make -m take comma-separated values that specify what to enable? For example, -m limits,placementset would enable the code paths for limits and placement sets, and -m none would give the default (no policy).

This will allow us to test and find more bottlenecks with different policies.
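
To make the idea concrete, here is a minimal sketch of how such a value could be parsed. It is illustrative Python, not pbs_sched code: the function name and the set of known features are hypothetical, and only the names limits, placementset and none come from the suggestion above.

```python
# Hypothetical set of features the mock-run path could switch on.
# Only these two names appear in the suggestion; the set itself is an assumption.
KNOWN_FEATURES = {"limits", "placementset"}


def parse_mock_features(arg):
    """Turn the value of a hypothetical '-m' option into a set of enabled features.

    'none' means the default mock run with no policy enabled.
    """
    names = [n.strip().lower() for n in arg.split(",") if n.strip()]
    if not names:
        raise ValueError("empty feature list")
    if "none" in names:
        if len(names) > 1:
            raise ValueError("'none' cannot be combined with other features")
        return set()
    unknown = set(names) - KNOWN_FEATURES
    if unknown:
        raise ValueError("unknown feature(s): " + ", ".join(sorted(unknown)))
    return set(names)
```

With this, parse_mock_features("limits,placementset") yields both features, parse_mock_features("none") yields an empty set, and a misspelled name is rejected rather than silently ignored.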

agrawalravi90 · November 13, 2020, 6:27am

Thanks for the suggestion Hiren, but I think most of the scheduler’s regular features don’t apply to it: placement sets don’t make sense for single-CPU jobs, and preemption needs job priorities. Implementing limits will be heavy-weight enough that I think at that point it’s better to just use the regular scheduler and turn features on and off in it instead. So, I think it’s not worth the effort.

ashwathraop · November 13, 2020, 3:35pm

The scheduler will still do a few basic things so IMO wouldn’t it be better to call it minimal scheduling or something similar rather than mock run?

Another idea, maybe instead of -m, how about a variable inside sched config? This way admin doesn’t have to start sched separately and can still rely on init scripts.

agrawalravi90 · November 13, 2020, 6:04pm

Thanks for your suggestions Ashwath.

TBH, I don’t really see this mode being used by admins for doing real scheduling, so I made it consistent with pbs_mom’s mock run option. Calling it a ‘lean’ mode will also encourage adding more features to this code path, which would defeat its main purpose: to remove the scheduler from the critical path and focus on testing the performance of the server (or mom). So, I think if we really want to create a ‘lean’ scheduler mode, then it should be a separate option in addition to mock run. Maybe something we can take up later if we find that there actually are customers who can use this for real scheduling?

About sched config, I hadn’t thought about that. But, I think we should actually make it more difficult for admins to switch to and from this mode as this is a potentially dangerous option, but I’m open to changing it if others feel differently. Let’s wait for other thoughts on this.

bhroam · November 16, 2020, 11:42pm

IMHO, by making it this “lean” all you have done is make it a mode which helps you right now, since you are concentrating on the HTC environment. If you want to make a leaner scheduler path, that is certainly possible. It might actually be a useful scheduling path FOR the HTC environment. The main slowness that you are concerned about is node search (and maybe calendaring). Right now in check_nodes(), there are 2 paths: the bucket path, and the much more comprehensive but slower normal node path. I say write a third path.
Do the following:

Only allow a non-plussed spec (pluses add more complexity).
Ignore the place directive.
Ignore placement sets.

This way you write a super lean loop that just calls eval_simple_selspec() N times for N chunks.
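
A rough sketch of what such a loop could look like follows. It is illustrative Python rather than the scheduler's C, and everything in it is an assumption: eval_simple_selspec_stub() is only a stand-in for the real eval_simple_selspec() with a made-up signature, nodes are plain dicts, and resources are limited to integer values such as ncpus.

```python
def parse_simple_select(select):
    """Parse a non-plussed select spec such as '3:ncpus=1' into (count, chunk)."""
    if "+" in select:
        raise ValueError("plussed select specs are not supported on the lean path")
    parts = select.split(":")
    count = 1  # a spec without a leading number is a single chunk
    if parts[0].isdigit():
        count = int(parts[0])
        parts = parts[1:]
    chunk = {}
    for part in parts:
        name, _, value = part.partition("=")
        chunk[name] = int(value)  # integer resources only, for brevity
    return count, chunk


def eval_simple_selspec_stub(chunk, free):
    """Hypothetical stand-in: find one node that fits the chunk and consume it."""
    for name, avail in free.items():
        if all(avail.get(res, 0) >= amt for res, amt in chunk.items()):
            for res, amt in chunk.items():
                avail[res] -= amt
            return name
    return None


def lean_check_nodes(select, nodes):
    """Third path: no place directive, no placement sets, one call per chunk."""
    count, chunk = parse_simple_select(select)
    # Work on a copy so a job that does not fit leaves the node data untouched.
    free = {n["name"]: dict(n["free"]) for n in nodes}
    placement = []
    for _ in range(count):
        node = eval_simple_selspec_stub(chunk, free)
        if node is None:
            return None  # job cannot run right now
        placement.append(node)
    return placement
```

For example, with nodes n1 (2 free ncpus) and n2 (1 free ncpu), lean_check_nodes("3:ncpus=1", nodes) places two chunks on n1 and one on n2, while "4:ncpus=1" returns None.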

Maybe also have a check that turns off calendaring completely. If strict_ordering is false, don’t create or add anything to the calendar. There is an O(N) loop in run_update_resresv() which adds the end event to the calendar.
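
The check could be as small as the sketch below. Again this is illustrative Python with hypothetical names, not the code in run_update_resresv(): the calendar is modelled as a sorted list of (end_time, job_id) tuples, and the only condition used is the strict_ordering one mentioned above. Whether other features also need the calendar is not covered here.

```python
import bisect


def record_end_event(calendar, end_time, job_id, strict_ordering):
    """Add a job's end event to the calendar only if the calendar will be used.

    Returns True when an event was added, False when the insert was skipped.
    """
    if not strict_ordering:
        # Nothing will consult the calendar, so skip the sorted insert entirely.
        return False
    bisect.insort(calendar, (end_time, job_id))
    return True
```

With strict_ordering false the calendar list stays empty no matter how many jobs are run, so the per-job insertion cost disappears.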

Let the rest of the scheduler do its thing. It isn’t as lean as you are suggesting, but it is probably lean enough.

If you really do want something like you are suggesting, don’t modify the scheduler. PBS was written with the idea of pluggable schedulers. Just write a new binary and link libsched.a for what you need of it.

Bhroam

agrawalravi90 · November 17, 2020, 1:35am

Thanks for the inputs Bhroam. OK, I think I see your point: I want to do this to test the scalability and performance of multi-server for single-CPU jobs without any special policies in place, so it’s not generic.

I’d like to try out your suggestion to create a leaner path, but I think the bottlenecks that you pointed out didn’t come into the picture during my tests because I submitted only single chunk jobs, without any placement sets, the place directive was not used, and calendaring was turned off. So, I’m not sure if it’ll improve the performance of sched for my test case. But please let me know if I understood something incorrectly.

I’m ok creating a new binary, so I think I’ll start working on that. I don’t think we even need a design doc for this.
