Created node bucket messages in sched_logs

I don’t think changing provisioning_priority will make a difference. It’s just a way to turn off the node bucket node-search algorithm, and that algorithm can only help speed things up. The node bucket algorithm kicks in if you request your nodes with -lplace=excl. I’d revert that change.
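For context, the node-bucket code path is only used for jobs that request exclusive placement. A request like the following (job script name is illustrative) would go through it:

```
# Jobs requesting exclusive placement use the node-bucket search:
qsub -l select=4:ncpus=8 -l place=excl job.sh
```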

Do you have strict_ordering on? With it on, backfill_depth=0 will cause your system to idle: once one job can’t run, the scheduler just ignores the rest. If you don’t have strict_ordering on, it won’t make a difference until jobs start starving (wait time > 1 day). After that, it’ll start idling your system waiting for the starving jobs to run. Either way, if it kicks in and starts idling your system, it will only make your cycle faster.
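For reference, strict_ordering is a sched_config option, while backfill_depth is usually set through qmgr (exact file paths and attribute placement can vary by PBS version, so treat this as a sketch):

```
# In PBS_HOME/sched_priv/sched_config:
strict_ordering: false ALL

# backfill_depth is typically a qmgr-set attribute, e.g.:
#   qmgr -c "set server backfill_depth = 20"
```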

Having node_group_key set but unused really only slows things down with a significant number of nodes (many thousands). If your cluster is smaller than that, feel free to use it. It doesn’t sound like it was the cause of your issues anyway.
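If you do decide to drop it, node_group_key can be unset through qmgr (shown here as an illustration; check your version’s docs for whether it’s set at the server or queue level in your setup):

```
# Remove an unused node_group_key setting:
qmgr -c "unset server node_group_key"
```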

At this point you’re going to have to look through the scheduler logs and see where the scheduler is spending all of its time. If there is a significant amount of time between the start of the cycle and the first ‘Considering job to run’ message, then it’s spending a lot of time querying the universe. That means you either have a significantly sized system or workload (like 100k+ jobs) or there is a networking issue. If you see time between ‘Considering job to run’ and ‘Job run’, that also leads me to think you have a network issue between the scheduler and the server. There is likely a gap in timestamps between log lines somewhere in the log. Once you figure out where it is, raise the debug level, look again, and see if you can narrow down what exactly the scheduler is doing.
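To find that timestamp gap without eyeballing the whole log, a small script can walk the file and flag consecutive lines whose timestamps are far apart. This is a rough sketch: it assumes the usual ‘MM/DD/YYYY HH:MM:SS;…’ prefix on scheduler log lines, and the threshold and sample lines are made up for illustration.

```python
import re
from datetime import datetime

# Scheduler log lines typically start with "MM/DD/YYYY HH:MM:SS;..."
TS_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})")

def find_gaps(lines, threshold_secs=5):
    """Return (line_no, gap_seconds, line) for each line whose timestamp
    is more than threshold_secs after the previous timestamped line."""
    gaps = []
    prev = None
    for i, line in enumerate(lines, start=1):
        m = TS_RE.match(line)
        if not m:
            continue  # skip lines without a leading timestamp
        ts = datetime.strptime(m.group(1), "%m/%d/%Y %H:%M:%S")
        if prev is not None:
            delta = (ts - prev).total_seconds()
            if delta > threshold_secs:
                gaps.append((i, delta, line.rstrip()))
        prev = ts
    return gaps

# Illustrative sample; real logs would be read from the sched_logs file.
sample = [
    "03/15/2024 10:00:00;0080;pbs_sched;Job;;Starting Scheduling Cycle",
    "03/15/2024 10:00:42;0080;pbs_sched;Job;123.server;Considering job to run",
    "03/15/2024 10:00:43;0080;pbs_sched;Job;123.server;Job run",
]
for line_no, secs, text in find_gaps(sample):
    print(f"line {line_no}: {secs:.0f}s gap before: {text}")
```

In this sample the 42-second gap before ‘Considering job to run’ is exactly the kind of dead time worth investigating at a higher debug level.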

Bhroam
