PP-734: Ability to release limited resources when a job is suspended

@agurban
Blank (unset) = no restrictions, release everything
a particular resource is mentioned = restrict to releasing only that particular resource

@arungrover, I agree with using ā€œrestrict_res_to_release_on_suspendā€ and the rest of the EDD looks good to me.

Also looks good to me, thanks!

I was thinking about this featureā€™s interaction with Multiple scheduler enhancement that is currently in progress.

The attribute to specify which resource to be released while suspending a job (restrict_res_to_release_on_suspend) is currently a server attribute. Now in case of multiple schedulers, I donā€™t know if there is any use case to have each of the scheduler have their own list of resources that can be released during suspension.

Iā€™m thinking that if there is any such need then, each of the scheduler can have itā€™s own set of ā€œrestrict_res_to_release_on_suspendā€ as part of their policy and override serverā€™s attribute. If it is not present in the policy then serverā€™s value will take effect (similar to how backfill_depth is implemented in server/queue).

Iā€™ve changed the design proposal because of an anomaly we found during code review.
Change is in Interface 2 and the change is highlighted here -
ā€œthe amount of resources that are released on each node that the job was running onā€

Looks good @arungrover!

Could you add the Python types for each of the attributes to the EDD?

@agurban Thanks for your review comment. Iā€™ve added python type to the document. Also changed the type of one of the attribute from ā€œarray_stringā€ to ā€œstring_arrayā€.
Please have a look.

Thanks @arungrover!

It seems dangerous (or at least un-Python-y) to use a plain string as the Python type when the type is really a very structured string (i.e., an array of resource names). Wouldnā€™t it be better to have a type that represents an array of resources and a constructor that takes a string version and converts/checks it appropriately, e.g., like pbs.node_group_key does? (I know there are other RFEs that previously took the ā€œstringā€ shortcut hack, butā€¦).

You are right! When I had a chat with @bayucan we believed that it will be a list of strings and Ideally it should be a list of strings but in reality it turns out that it was a string.
Now the problem I think is more generic in nature. Whenever anyone creates an attribute of type string_array our code internally translates into a python string by default. I believe instead of doing that it should by default be translating all string_array to python list.
If you agree then Iā€™ll log a bug so that any such addition in future gets correctly translated in python.

As I was verifying the feature, I noticed that the example for interface 3 doesnā€™t match the name of the attribute ā€œresource_released_listā€. I do wish the plural version was the correct one (resources_released_list) the symmetry would have been nice. But I supposed itā€™s a bit late for that comment. :wink: Can you please correct the example?

Re: string_array mapped to Python str

Yes, PBS Pro should definitely do a better job here, please do file an issue. Thx!

Changed the example. Thanks for catching that.

@billnitzberg
Filed PP-847

There was a bug filed (PP-844) which is about user being able to see only resorces_released_list when he/she stats the job after the job is suspended.
while addressing this bug I realized that user does not really have to see these attributes at all because RRTROS is set by an admin and what is release/not-released does not really matter to a user.
Thus, Iā€™ve made a change in the design document to make ā€œresources_releasedā€ and ā€œresource_released_listā€ attributes to be only visible to operator/manager privilege users.

Request community to please have a look at the document.

Makes sense that the user doesnā€™t need to see it. The design change looks good.

The changes in the design document to make ā€œresources_releasedā€ and ā€œresource_released_listā€ attributes to be only visible to PBS operator/manager look good to me.

@agurban pointed out that there seems to be a mistake in the document where it mentions resources_released_list as a Private interface which is not true. An admin can very well view this information by doing a qstat on the job.
Iā€™ve made a change to the interface and request community to have a look.
Thanks @agurban for catching this.

Makes sense to me. Iā€™m fine with the design change.