PP-719: Enhance setUp in PTL specifically for Cray platforms

This message is to inform the community that the design document covering the enhancement of PTL setUp() for Cray is now available. You may review the document here:

https://pbspro.atlassian.net/wiki/display/PD/PP-719%3A+Enhance+setUp+in+PTL+specifically+for+Cray+platforms

Please provide comments in response to this post. Thank you!

Hi @vccardenas, Thanks for posting this. These changes to PTL look good to me.

Hey @vccardenas,

In “Design of PBSTestSuite.setUp() for Cray”, the title of this section says “Design”, but I feel that the content of this section is talking about code instead of design.
Actually, I think we don’t need the “Design of PBSTestSuite.setUp() for Cray” section at all. Whatever we need for this work is already (nicely) explained in “Interface: setUp()”.
So I think we should either remove the “Design of PBSTestSuite.setUp() for Cray” section or rewrite it with no code details.

Some comments on “Interface: setUp()”:

  • Please change “setUp()” to “PBSTestSuite.setUp()” in the title.
  • Can you please explain why we need “$usecp ‘*:/home /home’” and “$vnodedef_additive 0” in the mom config?
  • In Server Settings, “scheduling” and “default queue” are the same as in the plain Linux out-of-box configuration, so there is no need to mention them here.
  • In the “Details” section (2nd line), please change “plain Linux default settings.” to “plain Linux out-of-box configurations”.
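
For reference while discussing these settings, here is a minimal Python sketch of how a test setup could make sure the two directives quoted above are present in a mom config. It is only an illustration: `ensure_mom_directives` and the `server1` host name are hypothetical and not part of PTL, and the sketch treats every directive as single-valued, which is a simplification (a real config may carry several `$usecp` lines).

```python
def ensure_mom_directives(config_text, wanted):
    """Return config_text with each directive in `wanted` set exactly once.

    Hypothetical helper for illustration only; it is not a PTL API.
    Existing lines for a wanted directive are dropped, then the wanted
    values are appended at the end in the order given.
    """
    kept = []
    for line in config_text.splitlines():
        stripped = line.strip()
        # The directive name is the first whitespace-separated token.
        name = stripped.split(None, 1)[0] if stripped else ""
        if name in wanted:
            continue  # drop the old value; the wanted one is appended below
        kept.append(line)
    kept.extend(f"{name} {value}" for name, value in wanted.items())
    return "\n".join(kept) + "\n"


# Example input: an existing config with a different $vnodedef_additive value.
existing = "$clienthost server1\n$vnodedef_additive 1\n"
wanted = {"$usecp": "*:/home /home", "$vnodedef_additive": "0"}
print(ensure_mom_directives(existing, wanted), end="")
```

Unrelated lines such as `$clienthost` are left untouched, so the helper only changes what the design document calls out.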

Hi @hirenvadalia,

I have updated the page per your comments. I have rewritten the Design section so that it has no code details. I want to keep the Design section because it will be the basis of the implementation and will serve as a future reference.

The $alps_client setting in mom_priv/config will be set by default as part of the installation on Cray under CLE 5.2 once PP-702 is resolved. Otherwise, your design looks fine.

Hi @vccardenas, now that I’ve seen the comment from @hirenvadalia, I have a suggestion for your design that I believe will make things clearer, and more closely follow the other design documents. List each interface/behavior change separately. For example, have Interface 1: set “$vnodedef_additive 0”, Interface 2: set “$alps_client”, etc.

@lisa-altair, that is indeed much clearer and follows standard practice as seen in other PBS design documents.

@vccardenas, I agree with @lisa-altair: we should make separate interfaces for better readability. That way you can merge the current Interface and Design sections into the appropriate new interfaces.

@hirenvadalia and @lisa-altair, I have updated the page with the suggested format.

@vccardenas, overall the document looks good. I just have a question: while deleting and recreating moms in the server.revert_to_default interface, does it have to wait for all desired vnodes to come up and show as “up and running” before returning from revert_to_default?

Interesting question, @arungrover. It may be good for PTL to check that the one node it just added has actually been added and that the node is free…and if not, wait for 1 second and check again (should we do this some number of times?)…before returning from the function with either a success or failure.
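
A rough Python sketch of that check-and-retry idea, just to make the proposal concrete. Everything here is an assumption for illustration: `wait_for_node_free`, the `get_state` callback, and the default of 10 attempts are hypothetical and are not existing PTL functions or settings. Only the 1-second wait comes from the suggestion above.

```python
import time


def wait_for_node_free(get_state, attempts=10, interval=1.0, sleep=time.sleep):
    """Poll get_state() until it returns "free" or the attempts run out.

    get_state is a caller-supplied function (hypothetical) that returns the
    current state of the node that was just added. Returns True on success
    and False if the node never became free.
    """
    for attempt in range(attempts):
        if get_state() == "free":
            return True
        # Do not sleep after the final check; just report failure.
        if attempt < attempts - 1:
            sleep(interval)
    return False


# Example with a fake state source: the node becomes free on the third check.
states = iter(["down", "down", "free"])
print(wait_for_node_free(lambda: next(states), attempts=5, interval=0))
```

Bounding the number of attempts keeps setUp from hanging forever if a mom never comes up, and the boolean result lets the caller decide whether to fail the test setup.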

@vccardenas, this is getting closer to what I would like to see. However, I still feel that the interfaces need to be even more granular. As suggested in my prior comment, each tunable/resource/hook/behavior that this project is changing from what already exists should be called out as a separate interface change. For example, this project is proposing that a hook be enabled as part of the setUp function; that would be one interface. A second interface would be to recreate the vnodes, etc.
Example:
Interface: enable PBS_translate_mpp hook
Visibility:
Change Control:
Synopsis:
Description:
Interface: vnode recreation
Visibility:
Change Control:
Synopsis:
Description:

One other thing, the name of the hook is not “PBS_translate_mpp.HK”, it is PBS_translate_mpp.

@lisa-altair, I have changed to the format you suggested. But the interfaces you suggested, for example “enable PBS_translate_mpp hook” and “vnode recreation”, are actions, not interfaces.

@vccardenas, the document looks good to me. I sign off.

@vccardenas: is the hook section applicable only to the Cray simulator? If it also applies to an actual Cray, then the following statement needs to be updated from

  • if on a Cray ALPS simulator, then enable the PBS_translate_mpp hook.

to

  • if on a Cray or Cray ALPS simulator, then enable the PBS_translate_mpp hook.

@kjakkali, I took out the hooks section. Built-in hooks (e.g. PBS_translate_mpp) are not affected by Server.revert_to_defaults().

still looks good to me.

@vccardenas, document looks good to me.

@borlesanket, posting here our email exchange:

[borlesanket] One query I have,
In the last review I mentioned that some parameters ($vnodedef_additive 0, vntype, PBS_translate_mpp.HK hook enabled true) don’t get set in the case of the Cray simulator.
Is there any reason that these parameters get set by default only on an actual Cray cluster and not on the simulator? And is there any point in configuring them on the simulator, or should they be ignored there?

[vccardenas] The settings you mentioned, $vnodedef_additive 0, vntype, PBS_translate_mpp.HK hook enabled true,
are meant for a Cray platform so they should also apply to the Cray ALPS simulator. They are set
automatically on a real Cray but not on the Cray ALPS simulator.
Granted, since the ALPS database does not change (unless we manipulate it somehow),
$vnodedef_additive might not have much use on a Cray ALPS simulator. vntype is needed on
a Cray platform so that jobs can be scheduled based on it. The translate hook is needed when requesting
mpp resources - the job request gets translated into the select/place language.

@vccardenas, does that mean that the server hook setting will remain unchanged from its “out-of-box” setting when running PTL? What if a test disabled the hook? Shouldn’t the hook be restored to its “out-of-box” setting?