I’ve written up a small EDD describing a new scheduler optimization. It’s a way of grouping similar jobs together so the scheduler knows if one job can’t run, the rest of them can’t run. It’s similar to how job arrays work, but with normal jobs.
PP-480: Job Equivalence Class Optimization EDD
If I could get comments on it, that’d be great!
Love the optimization – thanks!
Since the scheduler can detect all cases where this optimization can be safely performed (thereby trimming the search space and speeding up the scheduler), I propose removing the “equiv_class_disable” setting from the design.
Although this optimization will potentially eliminate a bunch of log messages (since the “can this job run” search space is reduced), my perspective is that the logs are not a place where we should expend effort to support 100% backward compatibility, except in rare circumstances.
I’m torn about removing the disable flag. This optimization requires the scheduler to do work creating the classes at the start of the cycle. If a site has all unique (or mostly unique) jobs, they’ll end up with one class per job. This means the scheduler will do extra work for nothing. If a site knows this ahead of time, they can turn off this feature. While the scheduler could detect whether or not to use the optimization, it would take just as long to create the classes as to detect if the optimization is useful.
How much slower would things be in this situation? That would depend on the number of jobs. I don’t really have an answer to this question though.
Now the other side of the coin. Would an admin of such a site actually use this knob? We have so many confusing knobs in the product already. Would an admin go through the trouble to learn what this knob does to actually use it? I’m thinking they wouldn’t. It won’t affect the end result of jobs being scheduled. The cycle might take longer, but would they notice?
Like I said, I’m torn.
@bhroam: thanks. As you are torn, I suggest not including this additional feature in this version. It’s best to start minimal and only add features that are known/shown to be useful/needed.
OK, I have removed the interface from the EDD.
Your design proposal looks good @bhroam
During the time I was implementing the PoC of this feature I needed to make some minor modifications to the design. They mainly have to do with when a job’s queue gets taken into account when creating equivalence classes. It’s no longer an all or nothing sort of thing if such a situation exists on the system. When creating a equivalence class, a job’s queue is looked at if it falls into the defined criteria. This means you can have some jobs that use their queue and others that do not.
One is another thing that came up during the discussion of the feature. As designed, it is no longer possible to determine the political ordering of the jobs via the log files. Prior to this feature the order of ‘Considering job to run’ lines would give the political order. Now when a job can’t run, the entire class is marked that it can’t run. You now get a political ordering of equivalence classes instead of jobs.
I think this is one of those things we need to give up for scalability. Complexes are growing as well as the number of jobs on those complexes. We need to be doing less work per object (job, node, etc) in order to keep up.
That being said, there is functionality in PTL that will change. Right now there is a cycle object and a member of that object is the political order. This will now give you a political ordering of equivalence classes. The only way to get a political ordering of jobs is to jump through hoops to make sure each job is in its own class.
After further thought I have redesigned this feature. There were issues with not considering jobs in the political order. There are now no external changes for this feature. The scheduler will consider each job. When considering a job, the scheduler looks at the equivalence class. If we already know the class can’t run, we stop. Since there are no external changes, it will still be possible to determine the political ordering of a cycle by looking at the order of the ‘Considering job to run’ log messages. This design is minorly slower than the previous one, but does not break scheduler or PTL functionality.
I have updated the design to reflect the new changes.
@bhroam the revised design looks good to me. I approve of the design and have no further comments.
Looks good to me as well. Thanks!
@bhroam design looks good
Hey Bhroam, your design looks good but in order to test I request to add log messages. I understand there is no requirement hence by going the current interface definition we should make them contractual.
I added the log message you requested to the document.
Thanks Bhroam. It looks good to me.
First time I’m reading this. Sorry for not looking at it earlier.
As documentation either for how users would expect scheduling to
behave or how admins would know how to configure equivalence classes,
it’s not helpful.
Similar is defined by the following attributes and resources: is
nonsensical. Similarity is not defined by the mere presence of
attribues and resources. Are you really trying to say that jobs are
to be considered similar if
- both have any user/group/project limits?
- both are in a queue with any hard or soft limits?
Surely not. Moreover, there are several examples but no statement
about whether the conjunction joining all of them is
specification of what is required for two jobs to be considered
similar needs to be precise. This language is more like hand waving
and hoping that everyone will interpret what’s written in the same
What does The scheduler starts considering jobs in sorted order
have to do with equivalence classes? What precisely defines an
equivalence class? You launch into How equivalence classes work
before ever saying what an equivalence class consists of.
Users should not care about equivalence classes. They just work and make the scheduler faster. There is no configuration of equivalence classes. The scheduler determines what jobs are similar and groups them together. This EDD is not really needed. It talks about the internals.
I think you misunderstood the paragraph defining how job equivalence classes are made. I think all of your objections stem from this. I am listing the attributes used in defining equivalence classes. I’m not saying jobs don’t have to be identical. I am defining ways a job doesn’t have to be identical to be similar. For example, if euser is used, jobs that have the same user are in the same class. If they have different users, they are not in the same class. If there are not any user limits, then euser is not used and jobs owned by different users can be in the same equivalence class. This is a list of attributes that define how equivalence classes are created and what situation in which those attributes are used.
I disagree I am being hand-wavey. The list of attributes which define equivalence classes and in which situation the are used is very precise.
I am describing how the classes are used. What problem do you have saying the scheduler starts considering jobs in sorted order? This is how a scheduling cycle starts. Jobs are considered in sorted order. I can’t really say how equivalence classes are used if I don’t say jobs are considered in sorted order. The first time a job in an equivalence class doesn’t run, we mark the whole class that it can’t run. Which is the first job that this will happen with? The first job in the class in the sorted order.
I don’t think you ever define how job equivalence classes are made. You list the attributes that
are used as a basis, but nowhere have you referred to the values of attributes. Let’s take
Similar is defined by the following attributes and resources:
euser: If there are any user limits(soft or hard)
Similar is not defined by an euser attribute but (presumably) by whether jobs share the same
value for that euser attribute. You don’t say that.
What problem do you have saying the scheduler starts considering jobs in sorted order?
The problem is that it’s irrelevant to job equivalence class optimization. It’s the existing
behavior and you’re not changing it (right?) so why are you mentioning it at all?
I’ve added some language about the values of attributes. Take a look.
Sometimes you need to describe existing behavior to better describe new behavior. That was my reasoning behind it. I removed the line in question because the next line (“The scheduler starts considering jobs in sorted order”) covers what I wanted to cover well enough. It’s still existing behavior, but I think it’s required.
Thanks. That’s better now - a couple more adjustments,
- change Similar is defined to Similarity is determined
- instead of Time based resources such as, give a complete list of the affected resources
and a question: how is
place handled? For example, is
-lplace=pack:excl is handled
I updated the document. As it turns out the time based resources was a complete list. I just left it a little vague in the document. I updated that.
As for the place statement, the scheduler breaks it apart for its own use. No matter what order you list the parts of a place statement in, the scheduler will compare them correctly.