Thank you, Prakash, for the comments. Please find my replies inline below.
_My initial set of comments:_
_General:_
1) The documents are lengthy, so I suggest that we map the use cases to the requirements and interfaces; this will make the feature easier to understand.
I have updated the document with a traceability matrix that maps the UCRs to the interfaces.
2) It is not mentioned whether the power usage information will also be visible in tracejob.
Since we update the accounting and server logs, power usage will be visible in the tracejob output.
3) You mention that the power calculations can be assumed to be precise only when node sharing is set to “exclusive”. Shouldn’t this be mandatory when power provisioning is enabled? I would like all nodes and vnodes to be automatically set for exclusive use by a job/reservation when power provisioning is enabled. What do you think?
On Cray, nodes are set to exclusive by default. I believe keeping this optional gives admins more control.
@smgoosen, can you answer these questions?
4) Name the “Use Cases and Requirements”.
_External Design:_
_1) A.1.d - Every API should list its parameters. We mention the parameter names in the explanation, but not where the API name is given._
2) A.1.d.i - We need not have the hosts parameter at all, as it can be derived from the job parameter.
3) A.1.d.i.3,4 - Let us elaborate on “where it is appropriate”.
Since we are working with tools and interfaces that are external to PBS and outside its control, we cannot say exactly when and where an error will occur. Hence the wording “where it is appropriate”.
4) A.1.d.iv - Why do we need the parameter if it can have only one value?
Query is a generic request: if we later extend the feature to make other requests to the vendor power interface, this parameter is the best place to do it. That is why we keep the argument even though it currently accepts only one value.
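To illustrate the reasoning, here is a minimal sketch of why a request-type argument with a single value is still useful. It assumes a hypothetical `query()` function and a hypothetical `QUERY_HANDLERS` table; the request names `"usage"` and `"capabilities"` are placeholders, not the values defined in A.1.d.iv.

```python
from typing import Callable, Dict

# Hypothetical handlers keyed by request type. Today only one request type
# exists; supporting another vendor request means adding one entry here,
# without changing the query() signature that callers already rely on.
QUERY_HANDLERS: Dict[str, Callable[[str], dict]] = {
    "usage": lambda job_id: {"job": job_id, "request": "usage"},
}


def query(job_id: str, request: str = "usage") -> dict:
    """Dispatch a generic query to the handler registered for `request`."""
    try:
        handler = QUERY_HANDLERS[request]
    except KeyError:
        raise ValueError(f"unsupported query request: {request!r}") from None
    return handler(job_id)


# Extending the feature later: register a new (hypothetical) request type.
QUERY_HANDLERS["capabilities"] = lambda job_id: {"job": job_id, "request": "capabilities"}
```

Existing callers keep working unchanged; only the handler table grows when a new vendor request is added.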
5) A.3.d.iii - What will be the value of energy if the operations are not allowed on the vnodes used by a job?
In that case, the job attribute energy will not be visible.
6) A.4.d.vi - rephrasing needed to make it clear.
7) A.4.d.xii - “prempt” should be “preempt”.
8) A.5.d.iii - “unsupported” should be “not allowed”.
9) A.5.d.vi - Why do we unset the eoe?
Because PBS changed the node's state for the job, it should reset the node once the job finishes.
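A minimal sketch of this set-then-unset pattern, assuming a hypothetical `Node` type with a `current_eoe` field and hypothetical `on_job_begin`/`on_job_end` callbacks (these are illustrative names, not the PBS hook API):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a vnode; current_eoe holds the active power profile."""
    name: str
    current_eoe: Optional[str] = None


def on_job_begin(node: Node, requested_eoe: str) -> None:
    # PBS changes the node state: activate the profile the job requested.
    node.current_eoe = requested_eoe


def on_job_end(node: Node) -> None:
    # The job has finished, so undo the change PBS made and leave eoe unset,
    # so the next job does not inherit the previous job's profile.
    node.current_eoe = None
```

The design point is symmetry: whatever PBS sets at job start, it unsets at job end, so the node is not left in a job-specific state.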
10) A.10.d.i - “will have a default of unset” should be “will be unset by default”.
11) A.10.d.ii - “power_provisioning” should be “power_enable”.
12) A.10.d.iii - “set True” should be “set to True” and “set False” should be “set to False”.
13) B.1.d.i - This is not clear.
The hooks check whether the power provisioning flags are enabled before performing any power-related operation on a node. If the flags are disabled while a job is running, the job's power profiles may not be deactivated, and its energy may not be updated.
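The consequence described above can be sketched as follows, assuming a simplified model with a single hypothetical per-vnode flag (`power_flag_enabled`) and a hypothetical `end_of_job()` routine; this is not the actual hook code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vnode:
    """Hypothetical vnode with a single power-provisioning flag."""
    name: str
    power_flag_enabled: bool
    profile_active: bool = False
    energy_updates: int = 0


def end_of_job(vnodes: List[Vnode]) -> List[str]:
    """Run end-of-job power operations only where the flag is enabled.

    Returns the names of vnodes that were skipped because the flag was off.
    """
    skipped = []
    for v in vnodes:
        if not v.power_flag_enabled:
            # Flag was disabled while the job ran: no power operation is done,
            # so the profile stays active and energy is not updated.
            skipped.append(v.name)
            continue
        v.profile_active = False  # deactivate the power profile
        v.energy_updates += 1     # record an energy update
    return skipped
```

For example, if a job ran on vnodes `a` (flag on) and `b` (flag turned off mid-job), only `a` has its profile deactivated and energy updated; `b` is reported as skipped.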
14) B.1.e - should have an explanation similar to B.1.d.i
15) B.1.g - What if the eoe values are not reported the second time as well?
Checking the MoM logs, with debug logging enabled for both MoM and the hook, would be a good start.
16) B.2.a - Why so?
As per the admin guide:
Prologue and Epilogue Limitations and Caveats
• The prologue cannot be used to modify the job environment or to change limits on the job.
• If any execjob_prologue hooks exist, they are run, and the prologue is not run.
• If any execjob_epilogue hooks exist, they are run, and the epilogue is not run.
17) B.4.a - “set True” should be “set to True”.
I have not gone through the new logs thoroughly.
Let me know if you have any more questions.