Remove member variable 'pid' and function _update_pid() from PTL's daemon objects

lsubramanian · April 9, 2019, 6:35pm

When a test writer uses PTL’s PBSInitServices’s functions directly (instead of daemon’s wrapper functions) (for example self.server.pi.restart()) in tests, followed by any operation that needs the latest pid like a self.server.signal(‘HUP”) PTL will fail because the newer pid is not updated in the daemon objects.

Some background : PTL’s Server, Mom, Scheduler and Comm objects store an internal cache of their daemon’s pid in the member variable ‘pid’. Every daemon object also contains an instance of PBSInitServices (member variable ‘pi’ ) in order to do start, stop, restart operations. Wrapper functions start() stop() and restart() in every daemon class call PBSInitServices’ functions like start(), restart(), start_server(), start_mom(), start_sched(), start_comm(), restart_server(), restart_mom(), restart_comm() and restart_sched() etc for respective operations. These wrapper functions then update their cache with latest pid(if it changes) via _update_pid() . PBSInitServices’s functions do not update ‘pid’ in Server, sched and comm and mom objects when it changes.

This problem can be resolved if we always fetched the daemon process ids, using _get_pid() rather than relying on the internal cache ‘pid’. _get_pid() gets pid from daemon’s lock file. Hence there is no need for the member ‘pid’ in the daemon objects. Infact it is better to remove this variable to avoid access to incorrect (not latest) pid. With this change there is also no need for the function
_update_pid() and it can be removed.

The other way of fixing this would be making PBSInitServices’ functions capable of updating the ‘pid’ in daemon objects. This requires PBSInitServices to contain all daemon objects. This isn’t a correct approach as it would make PBSInitServices class bulky and also change its real purpose.
Please find design here: https://pbspro.atlassian.net/wiki/spaces/PD/pages/1219985428/Remove+member+variable+pid+and+function+update+pid+from+PTL+s+daemon+objects
Please let me know your thoughts on this approach.

vstumpf · April 10, 2019, 12:17am

Do any tests use the .pid member variable? If tests need to use _get_pid(), it might make sense to make it a public function.

agrawalravi90 · April 10, 2019, 10:46am

Thanks for proposing this change @lsubramanian, I like the direction. I do agree with Vincent, I think we should make get_pid() a public method.

lsubramanian · April 10, 2019, 5:39pm

Thanks @vstumpf and @agrawalravi90 for looking at this. There is already a get_pid() public method for every daemon object which calls _get_pid(). I will add this info in the design doc.

vstumpf · April 10, 2019, 5:47pm

Thanks @lsubramanian
I sign off on the design.

agrawalravi90 · April 10, 2019, 6:06pm

Sounds good, I sign-off as well, thanks.

Topic		Replies	Views
Ptl_testlib and scheduler restart Developers	4	410	July 1, 2021
PP-660: Add support to PBS init script to act on any PBS daemon individually Developers	15	1819	April 19, 2018
PP-239 and PP-659: Decorator to skip PTL tests on cpuset mom; Specifying default test case time out value while running PTL tests Developers	9	1469	March 23, 2017
Systemd Service Enhancement Users/Site Administrators	13	1584	April 25, 2021
PTL framework’s class “Server” doesn’t contain mom objects Developers	3	697	December 4, 2018

Remove member variable 'pid' and function _update_pid() from PTL's daemon objects

Related topics