When a test writer uses PTL’s PBSInitServices’s functions directly (instead of daemon’s wrapper functions) (for example self.server.pi.restart()) in tests, followed by any operation that needs the latest pid like a self.server.signal(‘HUP”) PTL will fail because the newer pid is not updated in the daemon objects.
Some background : PTL’s Server, Mom, Scheduler and Comm objects store an internal cache of their daemon’s pid in the member variable ‘pid’. Every daemon object also contains an instance of PBSInitServices (member variable ‘pi’ ) in order to do start, stop, restart operations. Wrapper functions start() stop() and restart() in every daemon class call PBSInitServices’ functions like start(), restart(), start_server(), start_mom(), start_sched(), start_comm(), restart_server(), restart_mom(), restart_comm() and restart_sched() etc for respective operations. These wrapper functions then update their cache with latest pid(if it changes) via _update_pid() . PBSInitServices’s functions do not update ‘pid’ in Server, sched and comm and mom objects when it changes.
This problem can be resolved if we always fetched the daemon process ids, using _get_pid() rather than relying on the internal cache ‘pid’. _get_pid() gets pid from daemon’s lock file. Hence there is no need for the member ‘pid’ in the daemon objects. Infact it is better to remove this variable to avoid access to incorrect (not latest) pid. With this change there is also no need for the function
_update_pid() and it can be removed.
The other way of fixing this would be making PBSInitServices’ functions capable of updating the ‘pid’ in daemon objects. This requires PBSInitServices to contain all daemon objects. This isn’t a correct approach as it would make PBSInitServices class bulky and also change its real purpose.
Please find design here: https://pbspro.atlassian.net/wiki/spaces/PD/pages/1219985428/Remove+member+variable+pid+and+function+update+pid+from+PTL+s+daemon+objects
Please let me know your thoughts on this approach.