Configure submission hosts for multiple servers


Currently, we have two independent clusters, say cluster A and cluster B served by PBS servers, serverA and serverB.

Also, we have submission hosts that are currently configured to submit only to their cluster; i.e. submitHostA can submit to serverA and similarly, submitHostB to serverB.

My question is, how can I configure submitHostA to submit jobs to both serverA and serverB? Also, are there any caveats or dangers to this architecture?

What we did is set up routing queues on submitHostA that just forward jobs to matching queues on serverB.
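For illustration, a routing queue of this kind could be sketched as below. This is a minimal sketch, not the exact configuration from the post above: the helper name create_route_queue and the queue names toB and workq are hypothetical, and it assumes the commands are run by a PBS manager against the server that submitHostA already submits to (serverA in this thread).

```shell
# Hypothetical helper: create a routing queue that forwards jobs to a
# queue on another PBS server. Queue and server names are placeholders.
create_route_queue() {
    route_q="$1"     # local routing queue to create, e.g. toB
    dest_q="$2"      # destination queue on the remote server, e.g. workq
    dest_server="$3" # remote PBS server, e.g. serverB

    qmgr -c "create queue $route_q queue_type=route"
    qmgr -c "set queue $route_q route_destinations = $dest_q@$dest_server"
    qmgr -c "set queue $route_q enabled = True"
    qmgr -c "set queue $route_q started = True"
}

# Usage (assumed names):
#   create_route_queue toB workq serverB
#   qsub -q toB job.sh
```

Users on submitHostA would then submit to the routing queue as they would to any local queue, and the server forwards the job to the destination.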

As for caveats, this setup works best if serverA and serverB share accounts, home directories (e.g., via NFS), and other important file systems. It also helps if serverB trusts submitHostA (in addition to serverA). That way, a user on submitHostA can run various PBS commands (qstat, qdel, …) using the @serverB syntax (e.g., qstat @serverB, qdel 1234.serverA@serverB).

Further, if a submitHostA user knows they are going to be interacting only with serverB for a while, they can set PBS_DEFAULT=serverB in their environment and all PBS commands will have an implicit @serverB.


I personally prefer to set the PBS_SERVER environment variable right before the call to qsub/qstat. Be warned that v20+ client commands might not be compatible with older servers (v19 or earlier).
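A per-command override like that could look roughly like the sketch below. The wrapper name pbs_on is hypothetical; the point is only that the variable is set for a single command and the calling shell's environment stays untouched.

```shell
# Hypothetical helper: run one command with PBS_SERVER set for that
# command only, leaving the calling shell's environment unchanged.
pbs_on() {
    target_server="$1"
    shift
    # env applies the assignment to the child process only.
    env PBS_SERVER="$target_server" "$@"
}

# Usage (serverB is the server name used in this thread):
#   pbs_on serverB qstat -B
#   pbs_on serverB qsub job.sh
```

The same effect is available without a helper by prefixing the command directly, e.g. PBS_SERVER=serverB qstat -B.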

So I attempted a basic qstat against the remote server and received:

$ qstat -B $PBS_SERVER

No Permission.

qstat: cannot connect to server (errno=15007)

Any specific perms needing to be granted on the remote server side?

Hi @adarsh - Say, any help you can provide on the above issue?

Please check the server logs when you run the above command.
It might be related to acl_users set in the server configuration.

Hi @adarsh

We don’t have server acls for hosts/users enabled. Also, I see no errors in server logs on either server.

Also, here is the error we see when we attempt from the OpenPBS 2021 server (bright03-ib) to the PBS Pro 18.1.4 server (bright01-ib):

qstat -B bright01-ib

auth: error returned: -1

auth: Unable to authenticate connection (bright01-ib:15001)

qstat: cannot connect to server bright01-ib (errno=-1)

Any additional troubleshooting steps to try?