Cannot connect server and moms on Ubuntu 24

Hi,
I am having trouble connecting two moms (gecko and fox) to the server (hlavicka). Does anybody have any suggestions as to what is wrong?

All three hosts have public IPs (x.x.x.x) and are also interconnected by direct crossover cables (for NFS). This is what the hosts file looks like on hlavicka; on the moms it is the same, just with different local IPs for the pairwise connections and the same global IPs:

127.0.0.1 localhost
192.168.3.6 gecko36
192.168.2.3 fox23
x.x.x.x1 hlavicka
x.x.x.x2 gecko
x.x.x.x3 fox

pbs.conf on the server (the mom on hlavicka is enabled because I am trying various things…):

PBS_SERVER=hlavicka
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=1
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/usr/bin/scp

pbs.conf is the same on both moms:

PBS_SERVER=hlavicka
PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_START_MOM=1
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/usr/bin/scp

The problem is the connection between them. All daemons are running, but I get this in the server logs:

02/07/2025 09:25:29;0001;Server@hlavicka;Svr;Server@hlavicka;is_request, bad attempt to connect from x.x.x.x2:15003, reason=tfind2:pmom
02/07/2025 09:25:32;0001;Server@hlavicka;Svr;Server@hlavicka;is_request, bad attempt to connect from x.x.x.x3:15003, reason=tfind2:pmom

and this in the mom logs:

02/07/2025 09:25:29;0002;pbs_mom;Svr;pbs_mom;HELLO sent to server at hlavicka:15001, stream:134
02/07/2025 09:25:29;0001;pbs_mom;Svr;pbs_mom;im_eof, Premature end of message from addr x.x.x.x1:15001 on stream 134
02/07/2025 09:25:29;0002;pbs_mom;Svr;im_eof;Server closed connection.
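
Since the server log rejects the moms by source address, one thing worth ruling out is that a host name resolves to an unexpected address (for example a local crossover IP instead of the public one). The following is only a minimal sketch in Python, not a PBS tool: `mismatched_hosts` is a hypothetical helper, and the expected addresses are the redacted placeholders from the hosts file above, so they must be replaced with the real values before running it on each host.

```python
import socket

def mismatched_hosts(expected, resolve=socket.gethostbyname):
    """Return {name: (expected_ip, resolved_ip)} for every name that
    resolves to a different address, or does not resolve at all (None)."""
    problems = {}
    for name, want in expected.items():
        try:
            got = resolve(name)
        except OSError:
            # Covers socket.gaierror: the name could not be resolved.
            got = None
        if got != want:
            problems[name] = (want, got)
    return problems

# Placeholder addresses copied from the redacted hosts file (assumption:
# substitute the real public IPs before use).
EXPECTED = {
    "hlavicka": "x.x.x.x1",
    "gecko": "x.x.x.x2",
    "fox": "x.x.x.x3",
}
```

Calling `mismatched_hosts(EXPECTED)` on the server and on each mom should return an empty dictionary if every name resolves to the address listed for it.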

Please make sure that ports 15001-15009 and 17001 are not blocked between the nodes of the cluster.
You can also try adding PBS_LEAF_NAME to /etc/pbs.conf.

Ref: Set or not set PBS_LEAF_NAME parameter - #8 by adarsh
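
As a rough way to check the port advice above, a short Python sketch like the following could be run from one node against another. It is an illustration under assumptions, not part of PBS: `port_open` and `blocked_ports` are hypothetical helper names, and only TCP is tested. A port can also show up as unreachable simply because no daemon is listening on it on that host, so the result is a hint rather than proof of a firewall block.

```python
import socket

# Ports named in the advice above: 15001-15009 and 17001.
PBS_PORTS = list(range(15001, 15010)) + [17001]

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the name did not resolve.
        return False

def blocked_ports(host, ports=PBS_PORTS, timeout=2.0):
    """Return the subset of ports on host that could not be reached."""
    return [p for p in ports if not port_open(host, p, timeout)]
```

For example, running `blocked_ports("hlavicka")` on gecko and on fox lists the ports on the server that those moms cannot reach.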

I don’t know why, but I was now able to establish the connection. Either I was using the wrong quote symbol in qmgr (“ instead of ") or something just randomly happened. What a terrible week, I was losing my mind. After opening the other ports, as specified in the topic Interactive Job errors out with 'apparently deleted' - #14 by ndusek, I was finally able to run PBS again on my cluster. Thank you, adarsh!!!
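
For anyone who suspects the same quote problem: typographic quotes pasted from a web page or document are easy to miss by eye. Below is a small Python sketch that flags them in a command string before it is run. The helper names are hypothetical, and the qmgr command in the usage note is only an example, not necessarily the one used in this thread.

```python
# Typographic quotes that editors and web pages often substitute for ASCII ones.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # low double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}

def find_smart_quotes(command):
    """Return (index, character) pairs for each typographic quote in command."""
    return [(i, ch) for i, ch in enumerate(command) if ch in SMART_QUOTES]

def to_ascii_quotes(command):
    """Return command with typographic quotes replaced by ASCII quotes."""
    return command.translate(str.maketrans(SMART_QUOTES))
```

Passing a pasted line such as `qmgr -c “create node gecko”` to `find_smart_quotes` reports the positions of the two curly quotes, and `to_ascii_quotes` returns the same line with plain `"` characters.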
