External design document for PP-824: Cray - Ramp rate limiting

agrawalravi90 · January 10, 2018, 10:15pm

Hey Mike, i haven’t read the EDD and I don’t understand the feature, I was just curious about your last comment, you said that the MoM will not be in communication with server until something wakes it up, who exactly will wake it up ? It’s not talking to the server, so will it be the scheduler who wakes it up? Or will it require stimulus from the client? Or will it be somebody else?

ashwathraop · January 11, 2018, 2:33am

@agrawalravi90 In this particular context, MoM is no longer in communication because PBS has put it to sleep or powered it down. So PBS knows about the nodes current state and PBS also knows that it can bring the node up whenever it needs. Periodic hook will wakes it up in case of this feature with some help from scheduler.

agrawalravi90 · January 11, 2018, 2:50am

Thanks for clarifying Ashwath, but I’m still a bit confused. The server periodic hook being able to “Wake” the MoM up implies that the MoM is still listening to messages from the server right? Maybe I’m just confused about the terminology. Is it that the MoM will reject any messages from the server, except the “wake up” message that the server periodic hook will send it, at which point it will resume all communication again? Or is the MoM process just killed when the node goes to “sleep” and “waking” a node up makes the cluster manager start a new MoM process at which point it will just resume all communications again?

ashwathraop · January 11, 2018, 4:31am

I did not include technical details in my previous reply. Cray provided us a power management command line tool using which you can make a compute node power up or down. We have written few API’s to communicate with this tool. When I said PBS puts a node down I meant hook will talk to this tool to power down the nodes and it can bring them again when needed in similar fashion. Think of how we do OS provisioning. Provisioning hook is responsible for rebooting nodes with different images.

subhasisb · January 11, 2018, 5:33am

I like sleep much better than something like inactive - which could have dual meanings and confusing. sleep (or asleep) is clear - it is a state where it is not communicating with PBS - so covers powered down, sleep states etc.

agrawalravi90 · January 11, 2018, 7:14am

Got it, thanks for clarifying.

ashwathraop · January 18, 2018, 6:12am

Please post if there are any more comments on the design by EOD 19th Jan. I will assume I have sign-offs on the current state of the design if there are no more comments.

Thanks,
Ashwath

billnitzberg · January 25, 2018, 1:37am

Apologies I missed this earlier… there has been a major push to move all PBS configuration into standard qmgr settings (e.g., first class attributes and resources). Interface 2 seems to go in the opposite direction, adding configuration options using an alternative method.

Given that all other configuration settings are via standard configuration settings (node attributes), I would propose that the configuration options exposed via the JSON file are replaced with server attributes.

Note: I realize this comment is coming in late, so would understand if the maintainers decide it’s “too late”. It can be addressed later (along with moving other config options to qmgr).

mkaro · January 25, 2018, 5:44pm

Hello @billnitzberg, and thanks for raising a very valid point. While it’s never “too late” in the land of software, we would have to evaluate the amount of delay such a change would incur and whether it makes sense to complete implementation with the current design. The code changes for this feature have already gone through two rounds of review. Ultimately, it depends on whether the audience for the feature is willing to wait. I’ll defer that decision to @subhasisb and others that are more familiar with our current release plans.

ashwathraop · January 29, 2018, 5:16am

Hello @billnitzberg, thank you for the comments. Initial idea was to keep all the attributes as part of the server configuration. It made sense when we had server doing all of the power on and off. But with latest design proposal we decided to offload this task from server and made the server periodic hook capable of doing this task as hook is the one who finally talks to power infra and get the nodes to go down and up. In the former case server was just deciding what nodes to put down and which one to bring up and pass the data to a hook. With hook doing all the job we thought making these attributes part of JSON config file made more sense as none of the other PBS components needed them.

Since we have already come across multiple rounds of development and review I would second @mkaro on finishing implantation and later decide how we want to deal with placement of such attributes/config options. Thoughts?

subhasisb · January 29, 2018, 5:40am

I agree with @billnitzberg, @mkaro in principle. We should push towards config via qmgr.

However, I feel that while the goal is clear, the inner workings and details should be worked out first before we make that a mandate. I request the maintainers to work out how such hook attributes (and similar stuff) should be handled, so that such a design/architecture can be shared, discussed and then agreed upon. Maybe we should create a separate RFE to implement the plumbing of this requirement, so that projects can concentrate on their core business logic without bothering about how to set configuration properly.

mkaro · January 29, 2018, 6:50pm

I would envision an implementation that is capable of reading/writing configuration parameters by using any of the following…

C API
PBS module for Python
qmgr

I agree we should start by opening an RFE. With regard to ramp rate limiting, I think we should proceed with the current plans and later convert to the new configuration infrastructure.

billnitzberg · January 30, 2018, 12:47am

Thanks. I feel like my comment was not quite clear; let me try again:

We shouldn’t introduce new types of config parameters for core features – we already have too many, which is why we are discussing moving everything to one system (e.g., via qmgr) for the future. It’s simply good design to use existing config methods for core features, especially when existing config methods will work well. (This design includes node attributes, so it’s already depending on built-in config parameters.)

My comment was meant to (1) ask if there’s time to address this (for which it sounds like the answer is that it’s “too late”), and (2) if it’s “too late”, then to make a plea for everyone to be more careful moving forward (and ensure this design does not become a precedent for future enhancements).

Thanks!

subhasisb · January 30, 2018, 5:28am

Bill are you suggesting we move the node related attributes to node object? I guess we considered the parameters “node_idle_limit”, “min_node_down_delay”, “max_concurrent_nodes” as not true/generic node attributes, but attributes related to the power feature related to nodes.

Instead of adding attributes to the node object that would only be used by the power hook, It made sense to segregate them into “config for power”. This way, the cloud hooks also maintain a set of attributes that the cloud hook only uses internally. There seemed to be no point in add all of these to the server/node/queue attributes…

billnitzberg · January 30, 2018, 7:24pm

No. I believe we should avoid the use of the “hook config file” entirely for PBS Hooks. We already have well-established mechanisms for configuration via attributes and resources. Good design practice would be to use existing mechanisms versus introducing exceptions.

(My note about node attributes was that the current design fails to encapsulate all the new config parameters anyway – some parameters are defined using the well-established mechanisms, e.g., Interface 3: New node attribute: poweroff_eligible.)

subhasisb · January 31, 2018, 6:10am

This is the part for which i felt we should have a separate effort (and discussion) altogether to discuss with the community as to how to handle hook configurations. It would be great to bring that up with the maintainers and come up with a proposal for the community to review. In the absence of that worked out, I feel, we should go ahead with how the power RFE is using it.

Ah, i see. We felt that some of the attributes are generic enough to be useful for PBS in general and not configuration for this particular hook itself. Thus the poweroff_eligible was seen as a node_attribute to be used by any PBS feature. Whereas the other attributes were seen as the power-hook-specific configuration. Thus the difference. If you think that is not right, we should discuss, even if we are may not be able to incorporate all the changes into the currently scheduled implementation.

billnitzberg · January 31, 2018, 9:36pm

Thanks… I believe we are having different discussions. My perspective is that this is a proposed core feature, and should therefore use the core UI, regardless of how it’s implemented. The proposed implementation happens to leverage a PBS Hook instead of direct C code – that’s great, and it’s purely an internal design choice. How a feature is implemented should not drive how we expose the external UI for the feature. Note that there are two types of hooks in PBS: site hooks (add-ons, not part of core PBS Pro) and PBS hooks (core bits of PBS Pro that happen to be implemented using the plug-in framework).

(There is a separate issue as to whether the maintainers allow this code as-is, because this design issue was found very late in the process. My hope is that if it is accepted as-is, it does not create a precedent.)

subhasisb · February 1, 2018, 6:06am

Agreed. Not sure if we are discussing different things, but certainly we need more discussions. I suggest that in the interest of time we allow this feature to go ahead as per the current proposal, while we continue to discuss about the right design/architecture in the background. Thanks!

Topic		Replies	Views
PP-824: Use cases for Cray power ramp rate limiting Developers	3	974	October 31, 2017
PP-587: Have only one mom report the compute node information to the server Developers	10	1380	August 8, 2017
PP-586: On a Cray X-series, create a vnode per compute node Developers	40	4823	January 10, 2017
PP-706: Automatically create KNL specific information Developers	36	2811	March 30, 2018
PP-759: possibility to disable job-wide limit enforcement for exclusive jobs Developers	11	2031	February 26, 2021

External design document for PP-824: Cray - Ramp rate limiting

Related topics