PP-732: scheduler attribute to control unset resources in placement sets

Hello,

I have posted a design proposal for PP-732.
Please provide your feedback.

Thank you.
Vasek

Thanks for writing this design doc. It looks good to me.

Bhroam

@vchlum Sorry for commenting on this design proposal so late.

I understand that your change stops the scheduler from creating a placement set for nodes where the resource is unset (or not set). Can you please explain a little why not creating this placement set helps from a use case perspective?
Is it because the scheduler takes a long time to create this placement set, so that skipping it gives a considerable speedup?

Thanks!

@arungrover Our use case is the following: we have several clusters, and some of them are interconnected with InfiniBand. The InfiniBand on cluster A is not connected to the InfiniBand on cluster B. Some clusters use only Ethernet. I need users to be able to ask for InfiniBand in general. The user doesn’t care whether he/she gets InfiniBand A or B; he/she just needs InfiniBand. If I use -l place=group=infiniband without only_explicit_psets, the job may end up on a cluster without InfiniBand. That is something the user really doesn’t want when asking for InfiniBand.

Another use case would be ‘scratch_shared’. You can have several shared scratches like this: The scratch_shared=A is mounted on nodes={n1,n2} and scratch_shared=B is mounted on nodes={n3,n4} but nodes={n5,n6} have only scratch_local. With only_explicit_psets=True you can ask for scratch_shared like this: -l place=group=scratch_shared.

Both examples assume do_not_span_psets=True. I see similar use cases with resources like ‘city’, ‘room’ …
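
As a rough illustration of the scratch_shared example, here is a small Python sketch. It is not PBS code: the function build_placement_sets and the node dictionary are hypothetical names made up for this illustration, and the grouping logic is a simplified model of what the thread describes (nodes with an unset resource either form their own placement set or, with only_explicit_psets=True, no such set is created).

```python
from collections import defaultdict


def build_placement_sets(nodes, resource, only_explicit_psets=False):
    """Group node names by their value of `resource` (hypothetical model).

    Nodes where the resource is unset are collected under the key None,
    unless only_explicit_psets is True, in which case they are skipped.
    """
    psets = defaultdict(list)
    for name, resources in nodes.items():
        value = resources.get(resource)  # None means "unset"
        if value is None and only_explicit_psets:
            continue  # no placement set is created for the unset value
        psets[value].append(name)
    return dict(psets)


# Node layout taken from the scratch_shared example in this thread.
nodes = {
    "n1": {"scratch_shared": "A"},
    "n2": {"scratch_shared": "A"},
    "n3": {"scratch_shared": "B"},
    "n4": {"scratch_shared": "B"},
    "n5": {},  # only scratch_local, scratch_shared is unset
    "n6": {},
}

default_sets = build_placement_sets(nodes, "scratch_shared")
explicit_sets = build_placement_sets(
    nodes, "scratch_shared", only_explicit_psets=True
)
print(default_sets)
print(explicit_sets)
```

In this model, the default call yields a third set keyed by None that contains n5 and n6, which is where a job asking for place=group=scratch_shared could land even though those nodes have no shared scratch. With only_explicit_psets=True only the sets for A and B remain.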

This new option can be used if you want to run a multinode job on a bunch of nodes that are “somehow nearby” and you explicitly need the resource.

If there is a speedup in the scheduler, I see it rather as a side effect. I can imagine an infrastructure where only a few nodes have some resource set and you use placement sets on that resource. There you can get similar behavior, but faster, if you use do_not_span_psets=False and only_explicit_psets=True.

I understand now. Thanks for your explanation @vchlum.

In section 4.8.33.4.i of the 14.2 Admin Guide, we describe how the scheduler chooses the most specific placement pool available. Is this affected by the value of only_explicit_psets, or by the value of do_not_span_psets?

Which of the User, Operator, Manager levels of PBS privilege can read the only_explicit_psets attribute? Which can set it?

Hello @agurban,

Privileges:

  • Operators and managers are allowed to set the only_explicit_psets attribute.
  • Users, operators, and managers are able to read the attribute.

4.8.33.4.i:

  • IMHO the following sentence depends on do_not_span_psets:
    "The scheduler chooses one placement pool from which to select a placement set. If the job cannot run in that placement pool, the scheduler ignores placement sets for the job."
    Actually, do_not_span_psets=True means: “If the job cannot run in that placement pool, the scheduler waits for the placement set to be available.”
  • If only_explicit_psets=True, then there is no placement pool with an empty value, which doesn’t affect paragraph 4.8.33.4.i.

V.

Hello Václav,

Here’s a draft of the Placement Sets section of the Scheduling chapter
in the Admin Guide. Please take a look, and let me know what I should
change. Thanks.

-Anne

Trying again:

Argh. You can find the draft here: https://pbspro.atlassian.net/wiki/spaces/PBSPro/pages/71073808/Drafts+for+Review

@agurban I have read the draft.

I would change the start of paragraph 4.8.33.4.i like this:
“The scheduler chooses one placement pool from which to select a placement set. If the job cannot run in that placement pool, the scheduler either ignores placement sets for the job (do_not_span_psets is set to false) or the job waits for a suitable placement set to be available (do_not_span_psets is set to true).”

V.

@vchlum, thank you for reviewing the material.

I believe your concern is addressed in 4.8.33.4.ii, “Order of Placement Set Consideration Within Pool”, which covers the effect of do_not_span_psets. In this section, we say “If a job cannot statically fit into any placement set in the selected placement pool, the scheduler ignores defined placement sets and uses all available vnodes as its placement set, unless the do_not_span_psets scheduler attribute is True, in which case the job will not run.”
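
The rule quoted above can be sketched in Python as follows. This is only an illustrative model, not the scheduler's implementation: choose_nodes is a hypothetical function, "statically fits" is reduced to "the set has at least as many nodes as the job needs", and trying smaller sets first is an assumption of the sketch.

```python
def choose_nodes(job_size, placement_sets, all_nodes, do_not_span_psets):
    """Return the candidate nodes for a job, or None if the job will not run.

    Simplifying assumption: a job "statically fits" a placement set when
    the set contains at least job_size nodes.
    """
    # Assumption of this sketch: smaller sets are tried first,
    # with the set name as a tie-breaker for determinism.
    ordered = sorted(placement_sets.items(), key=lambda kv: (len(kv[1]), kv[0]))
    for _name, members in ordered:
        if len(members) >= job_size:
            return members
    if do_not_span_psets:
        return None  # the job does not run; it is never spread across sets
    return all_nodes  # placement sets are ignored, all vnodes are used


placement_sets = {"A": ["n1", "n2"], "B": ["n3", "n4"]}
all_nodes = ["n1", "n2", "n3", "n4", "n5", "n6"]

fits = choose_nodes(2, placement_sets, all_nodes, do_not_span_psets=True)
spans = choose_nodes(3, placement_sets, all_nodes, do_not_span_psets=False)
blocked = choose_nodes(3, placement_sets, all_nodes, do_not_span_psets=True)
print(fits, spans, blocked)
```

A two-node job fits into set A. A three-node job fits into neither set, so it either falls back to all vnodes (do_not_span_psets=False) or does not run (do_not_span_psets=True).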

Section 4.8.33.4.i is about how placement pools are chosen, on which do_not_span_psets has no effect.

@vchlum, I was mistaken. Your changes do apply, and I will add them. Thank you.

@vchlum, I can only blame my cold. Section i really is only about how we choose the pool. But section ii is also only about choosing a placement set. So I’ve added a new section iii, which covers the question of whether the job ever gets to run (which is indeed controlled by do_not_span_psets). Hope you’re still reading this far down the topic.

New draft here: https://pbspro.atlassian.net/wiki/spaces/PBSPro/pages/71073808/Drafts+for+Review

@agurban The new draft is better than my suggestion. I read sections 4.8.33.4.* again, and I think it is OK now.