Sharing all of /var/spool/pbs on HA server hosts

We are planning to run PBS in an HA configuration. On a small testbed, we mounted a shared /var/spool/pbs on the two machines that were going to be servers. On the testbed they were also going to be MoMs, and that caused problems because of a lock file in mom_priv. My question is: is there any similar concern for server_priv? In production the server nodes will never be compute nodes, so that isn’t an issue, but during fail-back both servers will be running, and I was wondering whether there could be locking or file-corruption issues from both servers trying to use the same files. It would be a LOT easier if we could mount the entire directory rather than carving out specific subdirectories.

PBS in an HA configuration

  • requires $PBS_HOME to be hosted on a storage server that has file locking enabled (e.g. NFS file locking)
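One quick sanity check for the locking requirement is to try taking a POSIX (fcntl) lock on a scratch file inside the shared $PBS_HOME from each server host. This is a minimal Python sketch under stated assumptions: supports_posix_locks is a hypothetical helper, not part of PBS, and whether PBS relies on exactly this lock type is an assumption. It only shows that the lock call succeeds on the host where it runs; it cannot prove that locks are coordinated between hosts (an NFS mount with the nolock option still grants purely local locks).

```python
import fcntl
import os
import tempfile


def supports_posix_locks(directory: str) -> bool:
    """Try to take and release an exclusive fcntl lock on a scratch file in `directory`.

    Returns True if the lock call succeeds on this host. A True result does not prove
    that locks are coordinated across hosts (an NFS mount with `nolock` still grants
    purely local locks), so also check the mount options on both server hosts.
    """
    fd, path = tempfile.mkstemp(prefix=".pbs-locktest-", dir=directory)
    try:
        try:
            # Non-blocking exclusive lock; without a working lock manager this can
            # fail with an OSError such as ENOLCK ("No locks available").
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        # Always clean up the scratch file, even if locking failed.
        os.close(fd)
        os.remove(path)
```

For example, run supports_posix_locks("/shared/pbspro/var/spool/pbspro") (the PBS_HOME used in the configuration posted later in this thread) on both the primary and the secondary.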

If you are running a MoM, set the following parameter in /etc/pbs.conf on the compute nodes:

PBS_MOM_HOME    Path: location of mom_priv on each host; overrides PBS_HOME for mom_priv
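Put differently, a host's mom_priv directory is resolved from PBS_MOM_HOME when that parameter is set, and from PBS_HOME otherwise; the error reported later in this thread (/home/pbs/mom_priv/config with PBS_MOM_HOME=/home/pbs) is consistent with that. A minimal Python sketch of the rule, where parse_pbs_conf and mom_priv_dir are hypothetical helpers (not part of PBS) and the parser assumes simple KEY=VALUE lines:

```python
from __future__ import annotations

from pathlib import PurePosixPath


def parse_pbs_conf(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from pbs.conf-style text, skipping blanks and # comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def mom_priv_dir(conf: dict[str, str]) -> PurePosixPath:
    """Return where mom_priv lives: PBS_MOM_HOME, if set, overrides PBS_HOME."""
    base = conf.get("PBS_MOM_HOME") or conf["PBS_HOME"]
    return PurePosixPath(base) / "mom_priv"
```

With the Linux MoM configuration posted below (PBS_HOME=/shared/pbspro/var/spool/pbspro, PBS_MOM_HOME=/local/pbspro/var/spool), this resolves mom_priv to /local/pbspro/var/spool/mom_priv, so the MoM's lock file stays on local disk while the servers share PBS_HOME.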

Thank you for the information. PBS_MOM_HOME solves the problem on our testbed. I went through the reference guide and checked all the other possible entries, and did not see any equivalent entry for the server. Can you confirm that having a shared $PBS_HOME/server_priv is not an issue? Under normal circumstances I would not expect it to be, but if the secondary is running and the primary comes back up, there is a period when both are running, so I wanted to make sure there would be no conflict.

It’s not a problem. Here are the pbs.conf files for the primary server, the secondary server, and a Linux MoM:

[Primary Server ~]$ cat /etc/pbs.conf
PBS_EXEC=/shared/pbspro/2020
PBS_HOME=/shared/pbspro/var/spool/pbspro
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_SERVER=pbsprimary
PBS_START_MOM=0
PBS_START_COMM=1
PBS_RCP=/bin/false
PBS_SCP=/bin/scp
PBS_RSHCOMMAND=/bin/ssh
PBS_PRIMARY=pbsprimary
PBS_SECONDARY=pbssecondary

[Secondary Server ~]$ cat /etc/pbs.conf
PBS_EXEC=/shared/pbspro/2020
PBS_HOME=/shared/pbspro/var/spool/pbspro
PBS_START_SERVER=1
PBS_START_SCHED=0
PBS_SERVER=pbsprimary
PBS_START_MOM=0
PBS_START_COMM=1
PBS_RCP=/bin/false
PBS_SCP=/bin/scp
PBS_RSHCOMMAND=/bin/ssh
PBS_PRIMARY=pbsprimary
PBS_SECONDARY=pbssecondary

[Linux MoM ~]$ cat /etc/pbs.conf
PBS_EXEC=/shared/pbspro/2020
PBS_HOME=/shared/pbspro/var/spool/pbspro
PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_SERVER=pbsprimary
PBS_START_MOM=1
PBS_MOM_HOME=/local/pbspro/var/spool
PBS_START_COMM=0
PBS_RCP=/bin/false
PBS_SCP=/bin/scp
PBS_RSHCOMMAND=/bin/ssh
PBS_PRIMARY=pbsprimary
PBS_SECONDARY=pbssecondary
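Comparing the primary and secondary files above, the only key that differs is PBS_START_SCHED (1 on the primary, 0 on the secondary). When keeping several hosts' pbs.conf files in sync, a small diff helper can confirm that only the intended keys differ. This is a hypothetical Python sketch (parse_pbs_conf and conf_diff are illustrative names, not PBS tools), assuming plain KEY=VALUE files:

```python
from __future__ import annotations


def parse_pbs_conf(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines from pbs.conf-style text, skipping blanks and # comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def conf_diff(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str | None, str | None]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs.

    None means the key is absent from that file.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Fed the primary and secondary files above, conf_diff should report only {"PBS_START_SCHED": ("1", "0")}; comparing the primary with the MoM file additionally shows the PBS_START_* switches and PBS_MOM_HOME, which is present only on the MoM.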

That is perfect. Thank you very much for your help.

1 Like

Hi, I’m trying to add PBS_MOM_HOME to pbs.conf to override PBS_HOME on an execution node. My pbs.conf is as follows:

cat /etc/pbs.conf
PBS_EXEC=/opt/pbs
PBS_SERVER=master
PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_START_MOM=1
PBS_HOME=/var/spool/pbs
PBS_MOM_HOME=/home/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp

Question:
However, when I run systemctl start pbs, the following errors are reported:

grep: /home/pbs/mom_priv/config: No such file

pbs_mom: Unable to open logfile, and so on.

In short, the home directory specified by PBS_MOM_HOME does not exist, which then causes a series of errors.

How can I use PBS_MOM_HOME to customize the home directory of each MoM? For example, I want to put the home directories of all compute nodes on NFS shared storage and distinguish each MoM’s home directory by its PBS_MOM_HOME.
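Because /etc/pbs.conf is read locally on each execution host, one way to give every MoM its own directory on shared storage is to set PBS_MOM_HOME on each host to a subdirectory named after that host. A minimal Python sketch under stated assumptions: /nfs/pbs/moms is a hypothetical NFS mount point, per_host_mom_home and set_conf_key are hypothetical helpers, and the target directory still has to exist before the MoM starts.

```python
from __future__ import annotations

import socket
from pathlib import PurePosixPath


def per_host_mom_home(nfs_root: str, hostname: str | None = None) -> str:
    """Build a per-host PBS_MOM_HOME path: <nfs_root>/<short hostname>."""
    short = (hostname or socket.gethostname()).split(".")[0]
    return str(PurePosixPath(nfs_root) / short)


def set_conf_key(text: str, key: str, value: str) -> str:
    """Set KEY=value in pbs.conf text, replacing an existing line or appending one."""
    out, found = [], False
    for line in text.splitlines():
        # Commented-out lines such as "# PBS_MOM_HOME=..." do not match the key.
        if line.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

For example, on node01 this would turn PBS_MOM_HOME=/home/pbs into PBS_MOM_HOME=/nfs/pbs/moms/node01, so no two MoMs share a mom_priv directory (and its lock file) on the NFS export.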

In addition:
Changing PBS_HOME in pbs.conf works fine: when PBS starts, it automatically creates a new PBS home directory at PBS_HOME and fills in some settings from pbs.conf (for example, mom_priv/config gets $clienthost master from PBS_SERVER=master in pbs.conf).
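Since PBS populated PBS_HOME automatically in this case, but the MoM failed when PBS_MOM_HOME pointed at an empty location, one workaround to try is seeding each new PBS_MOM_HOME from the directories PBS already created under PBS_HOME. This is a hedged Python sketch, not a documented PBS procedure: seed_mom_home is a hypothetical helper, and the choice of subdirectories (mom_priv and mom_logs) is an assumption based on the two errors shown above. Stop the MoM before running it; job files under mom_priv/jobs are deliberately not copied.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Assumption: these are the MoM subdirectories to seed; the errors in this thread
# only show mom_priv/config and the MoM log file as missing.
DEFAULT_SUBDIRS = ("mom_priv", "mom_logs")


def _skip_job_files(dirpath: str, names: list[str]) -> list[str]:
    """copytree ignore hook: keep mom_priv/jobs itself but drop any old job files."""
    return names if Path(dirpath).name == "jobs" else []


def seed_mom_home(pbs_home: str, mom_home: str, subdirs=DEFAULT_SUBDIRS) -> list[str]:
    """Copy MoM subdirectories from an existing PBS_HOME into a new PBS_MOM_HOME.

    Destination subdirectories that already exist are left untouched. Permission
    bits are copied (copytree uses copy2/copystat) but file ownership is not.
    Returns the names of the subdirectories actually copied.
    """
    src_root, dst_root = Path(pbs_home), Path(mom_home)
    dst_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in subdirs:
        src, dst = src_root / name, dst_root / name
        if src.is_dir() and not dst.exists():
            shutil.copytree(src, dst, symlinks=True, ignore=_skip_job_files)
            copied.append(name)
    return copied
```

For example, seed_mom_home("/var/spool/pbs", "/nfs/pbs/moms/node01") (the target path is hypothetical) copies mom_priv/config, including its $clienthost line, into the new location. Because ownership is not preserved, run it as root and compare ownership with the original PBS_HOME afterwards.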