I have a query regarding how the name of the server host (running PBS_SERVER) must appear in the MOM config. The question is two-fold, and I just need a confirmation that I am not doing something stupid.
We run two extremely similar clusters (development and production), and we would like as few differences as possible in the node images. Thus my quest to limit (or eliminate) the server name in the MOM config.
By default, the server name seems to appear twice on the nodes: once in /etc/pbs.conf (PBS_SERVER) and once in the “MOM level 1 config file” $PBS_HOME/mom_priv/config ($clienthost).
I have tried to remove the $clienthost from mom_priv/config, and that seems to work fine.
Is there a specific reason for having the setting in two places? (I assume that mom_priv/config is generated upon first start of the MOM and “automagically” includes $clienthost.)
The two similar clusters (devel and prod) obviously have different actual names for the server hosts (frontends). However, on each cluster the name “frontend” refers to the head node (PBS_SERVER), i.e. “frontend” resolves to the correct IP of the PBS_SERVER.
Apparently, it is OK to use simply “frontend” in /etc/pbs.conf, even if this is not the “true” name of the PBS_SERVER?
I am only asking because this was explicitly a problem for Torque (where we had to use the primary name of the frontend).
The admin guide writes that PBS_SERVER is:
"Name of the PBS server. Cannot be longer than 255 characters. If the short name of the server host resolves to the correct IP address, you can use the short name for the value of the PBS_SERVER entry in pbs.conf. If only the FQDN of the server host resolves to the correct IP address, you must use the FQDN for the value of PBS_SERVER."
It does not mention whether I can use any name that resolves to the right address.
A) The PBS_SERVER variable in /etc/pbs.conf is the way of telling the mom where the server resides. It must be resolvable to the server’s IP address. The $clienthost entry allows the mom to accept connections from additional hosts. In case of failover, the secondary server hostname is added automatically to the $clienthost parameter in the mom config. It also used to be used in the case of the HPCBP mom.
So usually you don’t have to add anything to $clienthost. Also, since the server IP is already added internally as a source from which the mom accepts connections, it does not need to be added explicitly.
B) Two different clusters with the same alias for the server (which is really a different machine in each case) is okay. As long as the alias resolves to the correct IP address of the server, and pbs_server knows that IP address as one of its own addresses, things will just work.
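The resolution half of that condition can be checked from a node with a few lines of Python. This is only a sketch under assumptions: the helper names `resolve_ipv4` and `alias_matches_server` are made up for illustration, and it only compares name resolution. It says nothing about which addresses pbs_server itself accepts.

```python
import socket


def resolve_ipv4(name, resolver=socket.gethostbyname_ex):
    """Return the set of IPv4 addresses that `name` resolves to."""
    _canonical, _aliases, addresses = resolver(name)
    return set(addresses)


def alias_matches_server(alias, server_name, resolver=socket.gethostbyname_ex):
    """True if every address of `alias` is also an address of `server_name`.

    `resolver` is injectable so the check can be exercised without DNS.
    """
    alias_ips = resolve_ipv4(alias, resolver)
    server_ips = resolve_ipv4(server_name, resolver)
    return bool(alias_ips) and alias_ips <= server_ips
```

On a node one would call `alias_matches_server("frontend", true_name)`, where `true_name` is whatever the head node is really called on that cluster.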
Thanks. That was what I needed to hear.
I don’t really like/want MOM to mess around with $clienthost, as it could potentially end with a discrepancy between our devel and production clusters. We like to take snapshots (golden node style) on the devel side, and then deploy them to the production side - to ensure that all features have been tested. But if the devel-side MOM adds a “real” head node name, then that will not work on the production side.
I cannot tell from your answer whether it will be OK to leave out $clienthost and, if so, whether it might pop up again later.
Will it be OK for me to also use an “alias”, i.e.:
where frontend again resolves to the correct IP of PBS_SERVER?
B) It is okay to remove $clienthost altogether for usual cases. Also, it can be an alias, since all it does is add the IP address that it resolves to into mom’s internal tables so that connections are accepted from that host.
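For the golden-image worry, one option is to check a node’s two config files before taking the snapshot and flag any server name other than the shared alias. The sketch below assumes the simple syntax discussed in this thread (KEY=VALUE lines in pbs.conf, `$clienthost <name>` lines in mom_priv/config). The function names and the host name `devel-head` in the usage note are hypothetical.

```python
def find_server_names(pbs_conf_text, mom_config_text):
    """Collect the server-related host names a node image refers to.

    Returns (pbs_server, clienthosts): the PBS_SERVER value from pbs.conf
    (or None if absent) and the list of $clienthost values from the MoM config.
    """
    pbs_server = None
    for line in pbs_conf_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and anything that is not KEY=VALUE
        key, _, value = line.partition("=")
        if key.strip() == "PBS_SERVER":
            pbs_server = value.strip()

    clienthosts = []
    for line in mom_config_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "$clienthost":
            clienthosts.append(fields[1])
    return pbs_server, clienthosts


def image_is_portable(pbs_conf_text, mom_config_text, alias="frontend"):
    """True if the image names the server only through the shared alias."""
    pbs_server, clienthosts = find_server_names(pbs_conf_text, mom_config_text)
    return pbs_server == alias and all(host == alias for host in clienthosts)
```

With `PBS_SERVER=frontend` in pbs.conf, an image whose MoM config contains `$clienthost devel-head` would be reported as not portable, while one with no `$clienthost` line (or `$clienthost frontend`) would pass.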
1) Usually you want PBS_SERVER to be the same name everywhere, since many client commands will tack it onto numerical job IDs to get the full job ID. You definitely want them to come up with the same name that the server uses for the jobs.
2) If you don’t want to add $clienthost lines, then PBS_SERVER must resolve to the IP address that the server used to register with pbs_comm (by default, the IP address the server’s hostname resolves to). You can change that with PBS_LEAF_NAME on the server, but that’s another can of worms, since then you’ll also have to tell the scheduler to accept that address with its own $clienthost line in a config file.
The corollary of 1) plus 2) is of course that if you don’t want to use PBS_LEAF_NAME or $clienthost, PBS_SERVER actually needs to be the same name and resolve to the same IP address on the server and compute nodes.
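To make the job ID point concrete, here is a toy illustration of a client expanding a bare numeric ID with its local PBS_SERVER value. It is not the actual PBS client code, just a sketch of the behaviour described above. The function name `full_job_id` and the host name `headnode01` are invented for the example.

```python
def full_job_id(job_id, pbs_server):
    """Expand a bare numeric job ID by appending the local PBS_SERVER name."""
    if "." in job_id:
        return job_id  # already carries a server name, leave it alone
    return f"{job_id}.{pbs_server}"


# A node configured with the alias and a server using its real name
# end up with different full IDs for the same numeric job.
on_node = full_job_id("123", "frontend")
on_server = full_job_id("123", "headnode01")
ids_agree = on_node == on_server
```

Here `ids_agree` is False, which is the mismatch to watch for if client commands are ever run on nodes where PBS_SERVER differs from the name the server uses.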
Note that MoM contacts the server through pbs_comm, and you can set PBS_LEAF_ROUTERS to whatever you want. For example, even if the PBS_SERVER IP address is not directly reachable from the nodes, pbs_comm will know what to do (since it usually sits on the server node and can see all of that node’s addresses as local).
But of course, if you also plan to run client commands on the compute nodes, you may want a static route to the PBS_SERVER address through some directly reachable IP address, so that the client commands know where to throw their packets. Letting them use the “default gateway” is usually fraught with danger, since the packets may travel the globe and even cross some NAT gateways that will break things.
For our case, that will be true. The idea is that we have two different clusters (for development and production), where we want as few differences as possible. Thus, frontend will resolve to the correct IP of PBS_SERVER on both frontend and nodes, although the “true” name of the server is not actually frontend. By using frontend, the config files do not have to differ between development and production, so we can use a single node image for both clusters (with very few post-install modifications).
Presently, we actually run with nodes naming PBS_SERVER=frontend, while the serverside uses the “true” name. But this is only to get the true host name (rather than frontend) in the pbs server-side logs.
Presently, we only submit jobs from the cluster frontend (i.e. on the PBS_SERVER), which will use the actual hostname (hostname -s) for PBS_SERVER. The nodes use frontend as an alias.
So far it works, but I will keep this issue in mind if things start falling apart.