Systemd Service Enhancement

Hi,
I would like to open a discussion for the Systemd Service Enhancement.

Overview:
Currently PBS has only one systemd unit, pbs.service. Starting this service starts all the daemons together; there is no way to start daemons separately. When a daemon is stopped by some mechanism other than systemctl, or when one or more daemons are killed, systemctl fails to report the correct status: systemctl status still shows pbs.service as running. A better approach would be to make each daemon a separate service with its own unit file, so that each daemon can be started, stopped, or restarted individually when required.

Design :

Please provide feedback !!

Thanks,
Jitendra

This change means that the commands to start/stop on different systems would be different. Perhaps you could retain the PBS_START entries in /etc/pbs.conf and still have the pbs script pay attention to them. That is, I could run systemctl start pbs_server pbs_mom pbs_comm pbs_sched everywhere, but only the correct pieces would be started on a given host.
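
For illustration, the gating could look roughly like the sketch below: a small helper that reads a PBS_START_* switch from a pbs.conf-style file and reports whether that daemon should run on this host. The function name pbs_start_enabled and its arguments are hypothetical, not part of PBS; the real pbs_init.d script has its own logic for this.

```shell
#!/bin/bash
# Hypothetical helper (not part of PBS): decide whether a daemon should start
# on this host, based on the PBS_START_* switches in a pbs.conf-style file.

# pbs_start_enabled DAEMON [CONF_FILE]
# Returns 0 if PBS_START_<DAEMON> is set to 1 in CONF_FILE, 1 otherwise.
pbs_start_enabled() {
    local daemon conf key value
    daemon=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    conf=${2:-/etc/pbs.conf}
    key="PBS_START_${daemon}"
    [ -r "$conf" ] || return 1
    # Use the last assignment of the key; commented-out lines do not match.
    value=$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p" "$conf" | tail -n 1)
    [ "$value" = "1" ]
}
```

A per-daemon start wrapper could then run something like `pbs_start_enabled mom || exit 0` before launching pbs_mom, so that starting the unit on a host without a MoM quietly does nothing.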

The more I think about it, the more this change would not work well in our environment. We would likely keep the current setup and ignore these new bits. We just don’t have enough cases where we need to start/stop/restart only one of the pbs daemons on a host.

Could most of what you want be supplied by hijacking the systemctl reload action via the following in pbs.service?

ExecReload=/opt/pbs/libexec/pbs_init.d start

With this, you could restart any missing daemons with a simple systemctl reload pbs. If no daemons are missing, (almost) nothing happens, so it’s fairly safe.

Hi Dale,

One problem with having both the PBS_START switches and separate unit files is that it would mean two steering wheels for the same car. We are apprehensive that it would result in confusion. With systemd, I would typically enable or disable a service by running systemctl enable/disable pbs_mom. If the PBS_START_MOM parameter is also set (and to a different value), that could be a source of confusion.

The other issue with using the single init file to start multiple services under systemd is that the status of the individual daemons would not be reported correctly. Since the pbs init script forks multiple processes (mom, comm, sched, etc.), there is no single pid that we can put in a pidfile for systemd to monitor, so it cannot report a per-daemon up/down status. AFAIK, we also cannot “implement” the status action to do something custom, like checking whether each of the services is up.

Besides, having the services in separate service units allows for separate control groups for each of them - sounds like an advantage?

On the other hand, if some site needs to fall back to sysV init, the code would still support the PBS_START_XXX parameters in pbs.conf. All we would need is for the admin to add them back and start using the sysV init commands (or pbs_init.d directly).

What do you think?

Hi Dale,

Thanks for the suggestion. I tried it, changing pbs.service to add ExecReload. After that, systemctl reload pbs should start the missing daemon, but it doesn’t. In systemctl status I can see that the service is being reloaded and the missing daemon is being started, but the daemon is not placed in the control group of the already-running service, and its pid does not show up when we look for it on the console.

This is my active service after the scheduler was killed:
CGroup: /system.slice/pbs.service
├─100046 /opt/pbs/sbin/pbs_comm
├─100069 /opt/pbs/sbin/pbs_mom
├─100148 /opt/pbs/sbin/pbs_ds_monitor monitor
├─100174 /usr/bin/postgres -D /var/spool/pbs/datastore -p 15007
├─100185 postgres: logger process
├─100193 postgres: checkpointer process
├─100194 postgres: writer process
├─100195 postgres: wal writer process
├─100196 postgres: autovacuum launcher process
├─100197 postgres: stats collector process
├─100287 postgres: postgres pbs_datastore 192.168.37.155(58594) idle
└─100330 /opt/pbs/sbin/pbs_server.bin

We can see that the service is reloaded:
Oct 21 02:03:34 jitendra systemd[1]: Reloading Portable Batch System.
Oct 21 02:03:35 jitendra pbs_init.d[101825]: Starting PBS
Oct 21 02:03:35 jitendra pbs_init.d[101825]: PBS comm already running.
Oct 21 02:03:35 jitendra pbs_init.d[101825]: PBS mom already running.
Oct 21 02:03:35 jitendra pbs_init.d[101825]: PBS sched
Oct 21 02:03:35 jitendra pbs_init.d[101825]: PBS Server already running.
Oct 21 02:03:35 jitendra systemd[1]: Reloaded Portable Batch System.

After reloading, sched is still not placed in this service’s control group:
Process: 101825 ExecReload=/opt/pbs/libexec/pbs_init.d start (code=exited, status=0/SUCCESS)
Process: 99989 ExecStart=/opt/pbs/libexec/pbs_init.d start (code=exited, status=0/SUCCESS)
Tasks: 10
Memory: 12.0M
CGroup: /system.slice/pbs.service
├─100046 /opt/pbs/sbin/pbs_comm
├─100069 /opt/pbs/sbin/pbs_mom
├─100148 /opt/pbs/sbin/pbs_ds_monitor monitor
├─100174 /usr/bin/postgres -D /var/spool/pbs/datastore -p 15007
├─100185 postgres: logger process
├─100193 postgres: checkpointer process
├─100194 postgres: writer process
├─100195 postgres: wal writer process
├─100196 postgres: autovacuum launcher process
├─100197 postgres: stats collector process
├─100287 postgres: postgres pbs_datastore 192.168.37.155(58594) idle
└─100330 /opt/pbs/sbin/pbs_server.bin

Please let me know if I missed anything.

I’m still not seeing a lot of benefit for the typical admin compared with keeping everything under one systemd service.

I agree that splitting things apart helps developers, but it makes things more complicated for the operators who would need to know just which pieces are supposed to be running on each host.

In terms of configuration management, it’s easier to have a single templated file, /etc/pbs.conf, that indicates which daemons should be running on each kind of host, than to have to create or remove symlinks under multi-user.target.wants.

I have a semi-hack to fix that. Create a short wrapper script:

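#!/bin/bash
# Move this shell into the pbs.service cgroup, then exec the wrapped command
# so that everything it starts inherits that cgroup.
me=$$
cgpath=/sys/fs/cgroup/systemd/system.slice/pbs.service
if [ -e "$cgpath/cgroup.procs" ]; then
    echo "$me" > "$cgpath/cgroup.procs"
fi
exec "$@"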

Now, change the ExecReload action to use that to wrap the start command:

ExecReload=/root/pbs_join /opt/pbs/libexec/pbs_init.d start

With this, I started pbs, pkilled pbs_comm, ran systemctl reload pbs and got:

mtestpbs pyqs # systemctl status pbs
* pbs.service - PBS daemons
   Loaded: loaded (/etc/systemd/system/pbs.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2020-10-21 09:30:22 PDT; 11min ago
  Process: 71268 ExecReload=/root/pbs_join /etc/init.d/nas/pbs start (code=exited, status=0/SUCCESS)
    Tasks: 46 (limit: 8000)
   CGroup: /system.slice/pbs.service
           |-70880 /PBS/sbin/pbs_sched -I watson -S 15010
           |-70895 /PBS/sbin/pbs_sched
           |-70956 /PBS/sbin/pbs_ds_monitor monitor
           |-70969 /PBS/pgsql-9.3.22.sles12sp3/bin/postgres -D /var/spool/pbs/datastore -p 15007
           |-70970 postgres: logger process   
           |-70972 postgres: checkpointer process   
           |-70973 postgres: writer process   
           |-70974 postgres: wal writer process   
           |-70975 postgres: autovacuum launcher process   
           |-70976 postgres: stats collector process   
           |-70994 postgres: pbsdata pbs_datastore 127.0.0.1(38636) idle
           |-70995 /PBS/sbin/pbs_server.bin
           `-71308 /PBS/sbin/pbs_comm

Note the new pbs_comm shows up in the process list.

I like the potential benefit of having separate control groups for each daemon. Will this new design make our existing PBS parent job service (pbs_jobs.service, created by PBS cgroups hook) unnecessary? Can jobs run under the control groups of pbs_mom.service in the future?

As I see it, with a single unit for all the daemons we cannot use some systemd features, such as automatically restarting a daemon on failure or reloading the configuration of a single daemon.

The other issue is that if any daemon is stopped, the admin is not notified; the service still shows as running.

We can still have one unit that manages the daemon-specific units as children.

No, I do not think it will replace our existing pbs_jobs.service. For pbs_mom (the execution RPM), this will be the same as the existing pbs.service, where only pbs_mom runs under the pbs.service cgroup.
Yes, jobs run under the cgroup of pbs_mom.service; however, to keep the freedom to manage our own cgroup for jobs, we still need pbs_jobs.service.

As I see it, with a single unit for all the daemons we cannot use some systemd features, such as automatically restarting a daemon on failure or reloading the configuration of a single daemon.

I don’t agree. PBS daemons are so reliable nowadays that if a daemon dies, it’s almost always an issue with the host, not the daemon. In that case, I want the daemon to stay down until the host is fixed.

As to reloading the config, how hard is pkill -HUP -u root pbs_sched ?

We can still have one unit that manages the daemon-specific units as children.

I Googled for some systemd mechanism to deal with multiple sub-units and did not find anything that looked right. What are you proposing? Everything I looked at required a different over-all unit file for each kind of host (server, comm, execution, no-op) [Yes, we actually have some hosts (front-ends) where all the PBS_START_ values are set to 0, but the pbs init script performs some checks to make sure the host is ready for pbs.]

As to detecting daemons going away: Aren’t you already running some other monitoring tool (e.g. nagios) that can check for daemons?
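
Even without a full monitoring stack, a check along these lines is small. The sketch below takes a list of pid files and reports any whose process is gone, using the Nagios plugin exit convention (0 for OK, 2 for CRITICAL). The function names are hypothetical, and the pid file locations are left to the caller because they depend on the installation. Note that kill -0 only succeeds for processes the caller is allowed to signal, so this assumes the check runs as root.

```shell
#!/bin/bash
# Hypothetical daemon check (not part of PBS or Nagios).

# pid_alive PIDFILE: succeed if PIDFILE holds the pid of a live process.
pid_alive() {
    local pid=""
    [ -r "$1" ] || return 1
    # Ignore read's status: a file without a trailing newline still sets pid.
    read -r pid < "$1"
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
    esac
    kill -0 "$pid" 2>/dev/null
}

# check_daemons PIDFILE...: print one line per missing daemon.
# Returns 0 if all are alive, 2 (Nagios CRITICAL) if any is missing.
check_daemons() {
    local f rc=0
    for f in "$@"; do
        if ! pid_alive "$f"; then
            echo "MISSING: $f"
            rc=2
        fi
    done
    return "$rc"
}
```

It would be called with whatever lock or pid files the daemons on that host maintain, for example `check_daemons /path/to/mom.pid /path/to/comm.pid` (placeholder paths).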

I don’t agree. PBS daemons are so reliable nowadays that if a daemon dies, it’s almost always an issue with the host, not the daemon. In that case, I want the daemon to stay down until the host is fixed.

Systemd can control the restart behavior based on exit statuses or signals, via the service directives RestartPreventExitStatus and RestartForceExitStatus.
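
As a rough sketch of what that could look like, the helper below prints a drop-in that restarts a daemon on failure, except for the exit statuses or signals passed in. The function name restart_dropin is hypothetical, and which statuses would indicate a host problem for a PBS daemon is an assumption that would have to be worked out per daemon.

```shell
#!/bin/bash
# Hypothetical generator for a systemd drop-in (not part of PBS).
# The statuses passed in are assumptions chosen by the caller.

# restart_dropin "PREVENT_STATUSES" [RESTART_SEC]
# Prints a [Service] section that restarts on failure unless the daemon
# ended with one of PREVENT_STATUSES (exit codes or signal names).
restart_dropin() {
    local prevent=$1 delay=${2:-5}
    cat <<EOF
[Service]
Restart=on-failure
RestartSec=${delay}
RestartPreventExitStatus=${prevent}
EOF
}
```

For a per-daemon unit such as the proposed pbs_mom.service, the output could be written to a file under /etc/systemd/system/pbs_mom.service.d/ and picked up with systemctl daemon-reload.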

I Googled for some systemd mechanism to deal with multiple sub-units and did not find anything that looked right. What are you proposing?

I am talking about something similar to the approach explained here

As to detecting daemons going away: Aren’t you already running some other monitoring tool (e.g. nagios) that can check for daemons?

No, I think we do not use any monitoring tool for the daemons.

Be sure to read the comments on that page for limitations of the scheme.

Overall, this sounds like a fun project. However, don’t be surprised if systemd cannot quite do what you want without changes to the daemon code. Systemd does not handle sequencing robustly. (E.g., wait until daemon 1 is up and running before starting daemon 2.) Fortunately, PBS does not have hard sequencing requirements.

I reiterate my suggestion that startup scripts pay attention to the PBS_START_ values in /etc/pbs.conf. This will reduce fallout from operator errors.

Thanks @dtalcott for all the feedback - perhaps we are overestimating the need and robustness (and capabilities) of systemd. Since the original need arose from a testing related issue - we will probably be able to find another way around that for now.

Meantime, we will table any changes to the service units and take it up further if we hear more feedback from the community in the future.

I’ve used systemd.service’s Type=notify, with systemd-notify --ready (or the C API equivalent sd_notify) for precisely this in the past. From the docs:

notify or dbus (the latter only in case the service provides a D-Bus interface) are the preferred options as they allow service program code to precisely schedule when to consider the service started up successfully and when to proceed with follow-up units.

There is also the old-venerable Type=forking, where systemd considers the daemon as “running” only when the parent process exits. From the docs:

The parent process is expected to exit when start-up is complete and all communication channels are set up. The child continues to run as the main service process, and the service manager will consider the unit started when the parent process exits. … systemd will proceed with starting follow-up units as soon as the parent process exits.
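
For daemons that cannot call sd_notify themselves, a wrapper might approximate the Type=notify behaviour by polling a readiness probe and then running systemd-notify --ready. The sketch below is only that: wait_until and notify_when_ready are hypothetical names, the probe command is whatever the site considers "up", and because systemd-notify runs as a separate process the unit would probably also need NotifyAccess=all.

```shell
#!/bin/bash
# Hypothetical readiness wrapper (not part of PBS or systemd).

# wait_until TRIES CMD...: run CMD about once per second until it succeeds,
# giving up after TRIES attempts. Returns 0 on success, 1 on timeout.
wait_until() {
    local tries=$1 i
    shift
    for ((i = 1; i <= tries; i++)); do
        "$@" && return 0
        # Do not sleep after the final failed attempt.
        [ "$i" -lt "$tries" ] && sleep 1
    done
    return 1
}

# notify_when_ready TRIES CMD...: once CMD succeeds, tell systemd the unit
# is started. Only meaningful inside a unit with Type=notify.
notify_when_ready() {
    wait_until "$@" && systemd-notify --ready
}
```

A unit's start script could launch the daemon and then call, say, `notify_when_ready 30 some_probe_command`, so that follow-up units are only started once the probe passes.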