PBS Failover with Pacemaker and Corosync


As we move to multi-server support for PBS, keeping the current active-passive failover support would make setup increasingly complicated as the number of server instances in the complex grows.

Tools like Pacemaker + Corosync have strong community support and offer far more configuration options than PBS failover does today. I am therefore proposing to remove failover support from PBS and to recommend tools like Pacemaker and Corosync instead.

I have written a design page with instructions on how to set up an HA cluster with Pacemaker + Corosync.


There is also a small demo available showing how Pacemaker works with PBS: