Quibbles:
- values used in setting mom_priv/config variables are not quoted - TRUE, not "TRUE", etc.
- is it $vnode_per_numa or $vnode_per_numa_node? Both occur twice.
Based on the feedback, I have updated the purpose of the feature, including the summary of the JIRA ticket, and I have updated the external design. Please have a look at the external design: https://pbspro.atlassian.net/wiki/pages/viewpage.action?pageId=43548691
Provide your comments in this forum.
Thanks!
Thanks for pointing it out @altair4. I have removed the quotes around TRUE and FALSE. And have updated the design to use $vnode_per_numa_node consistently (hopefully).
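For reference, a minimal mom_priv/config sketch of the form discussed above (illustrative only; the directive name and the unquoted TRUE/FALSE values come from this thread, everything else is assumed):

```
# mom_priv/config -- enable one vnode per NUMA node (value is unquoted)
$vnode_per_numa_node TRUE
```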
I think we’d want all moms to have the same vnode_per_numa_node setting. Shouldn’t there be a corresponding statement under “Administrator’s instructions”?
@vccardenas, I removed it because I didn’t want it documented in our commercial documentation guides. But now that I think about it, I should have something to that effect in the interface sections. Thanks!
Having one vnode_per_numa_node setting across an entire cluster might be too restrictive in some cases. If an administrator is using PBS Pro to schedule across two independent clusters of nodes, they may want different settings for each cluster. This could apply to both Cray and non-Cray systems. It should really come down to the way mom reports the resources. That implies it should be a per-mom setting.
Interesting point @mkaro. The need for the cgroups hook to have vnode_per_numa_node set to different values per mom conflicts with the need for Cray X-series vnode creation to have the mom_priv/config vnode_per_numa_node set to the same value on all the moms within a single Cray system.
Agreed. I think that if we were to generalize this RFE to apply to more than only Cray X-series systems, then it would make the most sense to have vnode_per_numa_node be configurable at the host level (versus MOM, as I feel MOM is too PBS-implementation specific; plus, one MOM can represent many hosts (and vnodes), and one host (or vnode) can be served by multiple MOMs).
Whether it makes sense to expand this RFE to include something like a per-host setting of vnode_per_numa_node now versus later depends on available resources and on whether it is possible to design a solution for the Cray X-series now that is also extensible (or at least sensibly deprecatable) in the future.
I agree with @bhroam on both points. I couldn’t find an existing RFE for doing configuration using qmgr, so I filed PP-611 to track that feature request.
We currently have multiple sites that are running a combination of multiple Cray systems and either non-Cray or Cray MAMU nodes as part of a single PBS complex. It’s quite conceivable that a site might want different settings for this feature on different MOMs hanging off a single server. Would putting this in qmgr make that more difficult (or even impossible)?
Those are interesting questions that will have to be kept in mind when PP-611 is being designed. Please add your use cases to PP-611 so they are considered for that RFE.
This discussion is about the external design for PP-586 available at:
https://pbspro.atlassian.net/wiki/pages/viewpage.action?pageId=43548691
It proposes an option in mom_priv/config to cause one vnode to be created per NUMA node for Cray X-series compute nodes.
Just a nit: “Cray numa” should be “Cray NUMA”. Other than this the EDD looks good to me.
Question: This config attribute does not have any effect on how MAMU nodes are reported, does it? I believe it is only useful for MoMs on login nodes that get the info re: compute nodes from ALPS.
Thanks for bringing it up @smgoosen. The configuration attribute only has an effect on the nodes reported by ALPS. I added some clarification to the design on this. PBS treats MAMU nodes like standard Linux nodes, so the $vnode_per_numa_node setting would not have an effect on how MAMU nodes are reported within PBS.
I seem to remember that MAMU nodes are often former ALPS nodes, that is they have just been removed from ALPS’s control. What happens if vnode_per_numa_node is still set on a MAMU or non-Cray node (i.e. can’t talk to ALPS)? Is there some error that gets reported or is it just silently ignored?
$vnode_per_numa_node has no effect if there is no ALPS information to act on. There is no log message either. Do you think there should be a log message when $vnode_per_numa_node is set, but there is no ALPS information to act on?
I should also mention that this feature is available only when PBS is built with configure --enable-alps. I will add that to the external design.
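As a rough sketch, the build step would look something like this (only the --enable-alps flag comes from this discussion; any other configure options a site needs are omitted):

```
./configure --enable-alps
make
make install
```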
Regarding the statement below:
“resources_available.PBScrayseg will be set to 0 when vnode_per_numa_node is unset or set to FALSE”
do we really need to set PBScrayseg to 0, since the whole compute node is the vnode and not just NUMA ordinal 0?
It seems that it should be unset.
It is not currently mentioned in the EDD, but are there other PBScray* attributes that need to exist (or not exist) depending on the setting of vnode_per_numa_node?
The original use case for PBScrayseg was to allow users to request a specific segment on each compute node. I believe if there is only one vnode then it isn’t necessary to set PBScrayseg.
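To illustrate that use case, a job might pin its chunks to a particular segment with a select statement along these lines (purely illustrative; the chunk count and ncpus value are made up, only PBScrayseg comes from this thread):

```
qsub -l select=2:ncpus=8:PBScrayseg=0 job.sh
```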
The EDD looks good to me!
I agree with @vccardenas and @smgoosen about not needing to set PBScrayseg when vnode_per_numa_node is unset or set to FALSE. I have updated the external design accordingly.
@vccardenas I don’t think so. Please let me know if you are concerned about any attributes in particular.
I still think the EDD is OK.