Replace starving with eligible_time

Ever since eligible_time was introduced, help_starving_jobs has been a worse option than job_sort_formula. With help_starving_jobs, a job is not starving and has low priority one second, and the next second it has very high priority. Using eligible_time in the job_sort_formula instead lets jobs gain priority gradually as they wait longer.
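To make the difference concrete, here is a rough Python model of the two behaviors. It is only an illustration, not scheduler code: the threshold, base priority, boost, and weight are made-up numbers, and the function names are hypothetical.

```python
# Illustrative model only; this is not PBS Pro source code.
# All constants below are assumptions chosen for the example.
MAX_STARVE = 24 * 3600  # hypothetical starving threshold, in seconds


def starving_priority(wait_seconds, base=10.0, boost=1_000_000.0):
    """Step behavior: priority jumps once the wait reaches MAX_STARVE."""
    if wait_seconds >= MAX_STARVE:
        return base + boost
    return base


def formula_priority(eligible_seconds, base=10.0, weight=0.01):
    """Gradual behavior: priority grows linearly with eligible time."""
    return base + weight * eligible_seconds


if __name__ == "__main__":
    # Compare both models just before, at, and just after the threshold.
    for wait in (MAX_STARVE - 1, MAX_STARVE, MAX_STARVE + 1):
        print(wait, starving_priority(wait), formula_priority(wait))
```

In the step model, one extra second of waiting changes the priority by the whole boost. In the formula model, the same second changes it by only the weight.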

Design: PBS Pro Confluence

If anyone has any comments, please post them here. Also, any real world examples of formulas with eligible_time would be welcome.

Bhroam


Thanks for doing this, Bhroam. My only comment is that we usually keep old functionality working while we tell users that we are deprecating it. You mentioned in the design that these will no longer work, so I just wanted to confirm that it is OK to do this.

help_starving_jobs has been deprecated for some time now (several years). It’s time to remove it. I didn’t really care that it existed before now, but it is now getting in the way of my scheduler persistence work. It will slow things down if I need to keep the starving status up to date on every job.

Bhroam

Oh, my bad, I didn’t read your design properly. If it’s been deprecated for years, then we should get rid of it ASAP. Thanks for clarifying.

The NAS mods to the scheduler define max_starve on a per-queue basis and use it to age jobs at different rates, based on the queue. This is something that could be done with a job_sort_formula, but it would be ugly. (Plus, job_sort_formula does not meet NAS’s needs for various other reasons.)
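As a rough sketch of what per-queue aging means here (this is not the NAS code; the queue names, periods, and the linear aging rule are all assumptions for illustration):

```python
# Illustrative sketch only; not the NAS scheduler modifications.
# Queue names and aging periods are hypothetical values, in seconds.
QUEUE_MAX_STARVE = {
    "debug": 2 * 3600,
    "normal": 24 * 3600,
    "long": 72 * 3600,
}
DEFAULT_MAX_STARVE = 24 * 3600


def aged_priority(base, eligible_seconds, queue):
    """Scale a job's base priority by how many aging periods it has waited.

    A shorter per-queue period makes jobs in that queue age faster.
    The linear rule used here is an assumption, not NAS's actual formula.
    """
    period = QUEUE_MAX_STARVE.get(queue, DEFAULT_MAX_STARVE)
    return base * (1.0 + eligible_seconds / period)


if __name__ == "__main__":
    # The same two-hour wait ages a "debug" job faster than a "long" job.
    for queue in ("debug", "normal", "long"):
        print(queue, aged_priority(10.0, 2 * 3600, queue))
```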

When max_starve is removed, NAS will need to bring it back, perhaps with a different name (e.g., “aging_period”).

I have a middle ground that might work, but it’s going to be a bit ugly.

Since job_sort_formula doesn’t work for NAS, I’m assuming you use job_sort_key. We could enhance job_sort_key in two ways:

  1. add an eligible_time keyword
  2. call formula_evaluate() on each job_sort_key. This would mean you’d get N job formulas which work like job_sort_key.
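Roughly, the idea of N formulas acting as ordered sort keys could be modeled like this in Python. The Job fields and the two example formulas are hypothetical, and this does not reflect how formula_evaluate() or job_sort_key are actually implemented in the scheduler.

```python
# Illustrative sketch only; not how the PBS Pro scheduler sorts jobs.
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    ncpus: int
    eligible_time: int  # seconds the job has been eligible to run


# Hypothetical ordered formulas, each playing the role of one sort key.
SORT_FORMULAS = [
    lambda j: j.eligible_time // 3600,  # primary: whole hours waited
    lambda j: j.ncpus,                  # secondary: larger jobs first
]


def sort_jobs(jobs):
    """Order jobs by each formula in turn, highest value first."""
    return sorted(
        jobs,
        key=lambda j: tuple(f(j) for f in SORT_FORMULAS),
        reverse=True,
    )


if __name__ == "__main__":
    jobs = [Job("a", 4, 7300), Job("b", 16, 7900), Job("c", 8, 100)]
    print([j.name for j in sort_jobs(jobs)])
```

Later formulas only matter when the earlier ones tie, which is how a list of sort keys normally behaves.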

Now comes the ugly part. For each queue you wanted a different max_starve for, you’d need to create a multisched. Each multisched has its own sched_config, so you could have a different way of sorting based on that queue.

You’re right, pretty ugly.

How about this: Go ahead and remove help_starving_jobs, is_starving, and their logic, but leave max_starve as a scheduler parameter, supported, but unused by standard code? That would allow NAS to keep their local code with just minor changes. Eventually, NAS should create a new scheduler config option and switch to it (e.g., “aging_period”).