Thank you, Prakash, for the comments. Please find my replies inline below.
_My initial set of comments - _
_General - _
1) The documents are lengthy, so I would suggest that we map the use cases to the requirements and interfaces; it will help in understanding the feature better.
I have updated the document with a traceability matrix that maps the UCRs to the interfaces.
UCR -
1) It is not mentioned whether the power usage information will also be visible in tracejob.
Since we update the accounting and server logs, power usage will be visible in tracejob output.
2) Since you mention that the power calculations can be assumed to be precise only when node sharing is set to "exclusive", shouldn't this be a mandatory step if power provisioning is enabled? I would like all nodes and vnodes to be automatically set for exclusive use by a job/reservation when power provisioning is enabled. What do you think?
On Cray, nodes are set to exclusive by default. I believe keeping this optional gives admins more control.
@smgoosen, can you answer these questions?
3) Name the "Use Cases and Requirements".
Fixed it.
_External Design - _
_1) A.1.d - All the APIs should have the information about the parameters. We mention the names of the parameters in the explanation, but not while providing the name of the API. _
Fixed it.
2) A.1.d.i - We need not have the hosts parameter at all, as it can be derived from the job parameter.
Fixed it.
3) A.1.d.i.3,4 - Let us elaborate on "where it is appropriate".
Since we are working with tools and interfaces external to PBS, over which PBS has no control, we cannot say exactly when and where an error can occur. Hence the wording "where it is appropriate".
4) A.1.d.iv - Why do we need the parameter if it can have only one value?
Query is a generic request. If we later need to extend the feature to issue other requests to the vendor power interface, this parameter is the best way to do it without changing the API. Hence we keep the argument even though it currently has only one value.
5) A.3.d.iii - What will be the value of energy if the operations are not allowed on the vnodes used by a job?
In that case, the job's energy attribute will not be seen.
6) A.4.d.vi - rephrasing needed to make it clear.
Done.
7) A.4.d.xii - "prempt" should be "preempt".
Done.
8) A.5.d.iii - "unsupported" should be "not allowed".
Done.
9) A.5.d.vi - Why do we unset the eoe?
Because PBS changed the node state for the job, it is good practice to reset the node once the job finishes.
10) A.10.d.i - "will have a default of unset" should be "will be unset by default".
Done.
11) A.10.d.ii - "power_provisioning" should be "power_enable".
Done.
12) A.10.d.iii - "set True" should be "set to True" and "set False" should be "set to False".
Done.
13) B.1.d.i - This is not clear.
The hooks check whether the power provisioning flags are enabled before doing any power-related operations on the node. If the flags are disabled while a job is running, power profiles may not be deactivated and energy may not be updated.
14) B.1.e - should have an explanation similar to B.1.d.i
Done.
15) B.1.g - What if the eoe values are not reported the second time as well?
Checking the MoM logs, with debug logging enabled for MoM and the hook, would be a good start.
16) B.2.a - Why so?
As per the admin guide:
Prologue and Epilogue Limitations and Caveats
• The prologue cannot be used to modify the job environment or to change limits on the job.
• If any execjob_prologue hooks exist, they are run, and the prologue is not run.
• If any execjob_epilogue hooks exist, they are run, and the epilogue is not run.
17) B.4.a - "set True" should be "set to True".
Done.
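To make the replies to A.3.d.iii, A.5.d.vi and B.1.d.i a bit more concrete, here is a rough sketch of the intended end-of-job behavior. It is only an illustration under assumptions: Vnode, Job, end_of_job and the read_energy callback are hypothetical stand-ins, not the PBS hook API or the vendor power interface, and the power_enable argument stands in for the power provisioning flags.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical stand-ins for PBS objects; NOT the real pbs hook module API.
@dataclass
class Vnode:
    name: str
    current_eoe: Optional[str] = None  # power profile currently set on the vnode

@dataclass
class Job:
    jobid: str
    vnodes: List[Vnode]
    energy: Optional[float] = None  # stays unset if power operations are skipped

def end_of_job(job: Job, power_enable: bool,
               read_energy: Callable[[str], float]) -> List[str]:
    """Sketch of end-of-job power handling; returns log messages."""
    log: List[str] = []
    if not power_enable:
        # Flag disabled (possibly while the job was running): skip all power
        # operations, so the profile is not deactivated and energy is not set.
        log.append(f"{job.jobid}: power provisioning disabled, skipping")
        return log
    total = 0.0
    for vnode in job.vnodes:
        # read_energy is a hypothetical callback to the vendor power interface.
        total += read_energy(vnode.name)
        # PBS changed this vnode's eoe for the job, so reset it when the job ends.
        vnode.current_eoe = None
    job.energy = total
    log.append(f"{job.jobid}: energy={total}")
    return log
```

With power_enable set to False the job's energy attribute stays unset and the vnodes keep their profile, which is the caveat described in B.1.d.i.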
I have not gone through the new logs thoroughly.
Let me know if you have any more questions.