Problem with a shared PBS_HOME in failover

I use NFS to configure PBS_HOME for the PBS primary and secondary servers. When the primary server takes over again, the secondary server does not seem to release the file lock, so the primary server cannot work normally. After I stop the secondary server, the primary server can take over successfully.

Can pacemaker + corosync completely replace the primary server and secondary server setup? (Installing two PBS servers.)

Here the concept is active-passive: PBS_HOME is shared and is attached to the active server automatically.

The shared PBS_HOME is a pacemaker resource, and pacemaker is responsible for mounting the shared storage (PBS_HOME) on the active server.

How is the file lock of PBS_HOME handled between the primary server and the secondary server?

With pacemaker and corosync based failover, there is no file locking mechanism.
The PBS_HOME is attached to the active PBS Server.

In the case of PBS Pro legacy failover (primary/secondary), PBS_HOME relies on a file locking mechanism: the active server, which holds the file lock, controls the cluster. The other server is merely running and checking the heartbeat of the active server. If there are any issues with file locking, both servers think they control the services (split brain) and might corrupt the datastore.
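
To make the lock idea concrete, here is a minimal sketch of lock-based active/standby arbitration. It is not how PBS Pro implements it internally; the lock file name and the helpers try_acquire and release are made up for illustration, and it uses Python's fcntl.flock on a local temporary directory. Lock semantics on NFS depend on the NFS version and mount options, so treat it only as a picture of the principle: whoever holds the lock is active, and the peer cannot take over until the lock is released.

```python
import fcntl
import os
import tempfile


def try_acquire(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file object if the lock was obtained (keep it open to
    stay "active"), or None if another holder already has the lock.
    """
    f = open(lock_path, "a+")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


def release(f):
    """Give up the lock so the peer can take over."""
    fcntl.flock(f, fcntl.LOCK_UN)
    f.close()


if __name__ == "__main__":
    # Hypothetical lock file standing in for a lock kept under PBS_HOME.
    lock_path = os.path.join(tempfile.mkdtemp(), "server.lock")

    secondary = try_acquire(lock_path)  # secondary is currently active
    primary = try_acquire(lock_path)    # primary returns, lock is still held
    print("primary active:", primary is not None)

    release(secondary)                  # secondary lets go (or is stopped)
    primary = try_acquire(lock_path)    # now the primary can take over
    print("primary active:", primary is not None)
    release(primary)
```

The first attempt by the "primary" fails while the "secondary" still holds the lock, which mirrors the symptom in the question: takeover only succeeds once the other side releases the lock or is stopped.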

How should I choose between PBS Pro legacy failover (Primary/Secondary) and pacemaker and corosync?

  • Legacy and third-party (pacemaker/corosync) failover/HA solutions cannot be mixed.
  • You need to pick one of the two solutions.
  • The most robust and stable solution is pacemaker and corosync.

Thank you so much for your patient help, Mr. adarsh. I am extremely grateful.