Communication failure? (errno=15031)

When I run the qsub command, it returns

Communication failure.
qstat: cannot connect to server songyi7 (errno=15031)

I found a topic about a similar problem (Communication Failure - #3 by baugarcia) and followed the same steps, but that didn't resolve it.

My client and server are on the same PC, and ping works normally:

PING songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137) 56(84) bytes of data.
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=1 ttl=64 time=0.019 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=2 ttl=64 time=0.018 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=3 ttl=64 time=0.016 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=4 ttl=64 time=0.017 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=5 ttl=64 time=0.018 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=6 ttl=64 time=0.016 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=7 ttl=64 time=0.016 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=8 ttl=64 time=0.016 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=9 ttl=64 time=0.017 ms
^C
--- songyi719-ThinkPad-X1-Extreme-2nd ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8188ms
rtt min/avg/max/mdev = 0.016/0.017/0.019/0.001 ms
songyi719@songyi719-thinkpad-x1-extreme-2nd:~/Desktop/

However, when I tried qstat -Bf, it also returned

Communication failure.
qstat: cannot connect to server songyi7 (errno=15031)

What can be the solution?

Log in to songyi7 and run
ps ax | grep pbs
You should see the PBS server and scheduler running. If not, start them.
If the PBS server is not running, qstat on a node gives that error.
Mike
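
The check above can be scripted so that it reports only what is missing. This is a minimal sketch, not an official PBS tool: the function name missing_daemons is made up for illustration, and it assumes each daemon shows up in the ps listing with a path ending in /pbs_server, /pbs_sched and so on, as in the listing posted in this thread.

```sh
#!/bin/sh
# Hypothetical helper: print every expected daemon name that does NOT
# appear in the process listing read from standard input.
# The body runs in a subshell "( ... )" so its variables stay private.
missing_daemons() (
    listing=$(cat)
    for name in "$@"; do
        # Match "/name" so that the "grep ... pbs" line itself is ignored.
        if ! printf '%s\n' "$listing" | grep -Fq -- "/$name"; then
            printf '%s\n' "$name"
        fi
    done
)

# Example use (prints nothing when all four daemons are present):
#   ps ax | missing_daemons pbs_server pbs_sched pbs_comm pbs_mom
```

Any name it prints is a daemon that would need to be started before qstat can connect.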

ps ax | grep pbs returns

6990 ? Ssl 0:00 /opt/pbs/sbin/pbs_comm
7578 ? Ssl 0:00 /opt/pbs/sbin/pbs_mom
7590 ? Ssl 0:00 /opt/pbs/sbin/pbs_sched
7731 ? Ss 0:00 /opt/pbs/sbin/pbs_ds_monitor monitor
7743 ? Ss 0:00 /usr/lib/postgresql/12/bin/postgres -D /var/spool/pbs/datastore -p 15007
7764 ? Ss 0:00 postgres: postgres pbs_datastore 110.76.77.137(59854) idle
7765 ? Ssl 0:00 /opt/pbs/sbin/pbs_server.bin
10387 pts/0 S+ 0:00 grep --color=auto pbs

The PBS server is running. This is confirmed by /etc/init.d/pbs status, which returns

pbs_server is pid 7765
pbs_mom is pid 7578
pbs_sched is pid 7590
pbs_comm is 6990

Good. Next, check whether a firewall is running that could be blocking those ports.

The output of iptables -S is

-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-A INPUT -p tcp -m tcp --dport 15001 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15002 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15003 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15004 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15005 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15006 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15007 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15008 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15009 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 17001 -j ACCEPT

So ports 15001 to 15009 and 17001 are all open.

The output of ufw status is

Status: inactive

So the firewall is disabled.

Please share the contents or output of the following:

  1. /etc/pbs.conf
  2. /etc/hosts
  3. source /etc/pbs.conf ; pbs_hostn -v $PBS_SERVER
  4. Please make sure that communication between the daemons is not being disrupted.
  5. Also, please make sure the IP address is static, not dynamic.
  6. source /etc/pbs.conf ; telnet $PBS_SERVER 15031 # whether this works
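
For the items that read /etc/pbs.conf, sourcing the file works, but a read-only lookup avoids pulling every variable into the current shell. A small sketch under assumptions: pbs_conf_get is a hypothetical helper name, and it expects plain KEY=VALUE lines without quoting or spaces around the equals sign.

```sh
#!/bin/sh
# Hypothetical helper: print the value assigned to KEY in a KEY=VALUE file.
#   $1 = key name, $2 = file (defaults to /etc/pbs.conf)
# If the key is assigned more than once, the last assignment wins.
pbs_conf_get() {
    sed -n "s/^$1=//p" "${2:-/etc/pbs.conf}" | tail -n 1
}

# Example use: check that the configured server name resolves.
#   getent hosts "$(pbs_conf_get PBS_SERVER)"
```

Comparing the printed PBS_SERVER value with the server name shown in the qstat error message is a quick way to spot a mismatch between the two.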

PBS_EXEC=/opt/pbs
PBS_SERVER=songyi719-ThinkPad-X1-Extreme-2nd
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=1
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/usr/bin/scp

127.0.0.1 localhost
110.76.77.137 songyi719-ThinkPad-X1-Extreme-2nd pbs
#127.0.1.1 songyi719

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

aliases: pbs
address length: 4 bytes
address: 110.76.77.137 (2303544430 dec) name: songyi719-ThinkPad-X1-Extreme-2nd

Ping is stable, and since the client and server are on the same PC, I don't think communication between them is the problem.

Screenshot from 2021-02-02 10-54-51
My IP address is set manually, so I believe it is a static IP.

Trying 110.76.77.137…
telnet: Unable to connect to remote host: Connection refused
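
"Connection refused" usually means that nothing is listening on that address and port, or that something actively rejects the connection, so it can help to look at which ports are actually in the listening state. A rough sketch, assuming the iproute2 ss tool is installed; is_listening is a hypothetical helper that reads ss -ltn output on standard input. Note that 15031 is also the errno value in the error message, so it may be worth repeating the test against 15001, the port used in the pbs_iff test in this thread.

```sh
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if PORT is a local listening TCP
# port according to `ss -ltn` output supplied on standard input.
#   $1 = port number
is_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            # Column 4 is "local-address:port"; the port follows the last colon.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Example use:
#   ss -ltn | is_listening 15001 && echo "something is listening on 15001"
```

If the port is not in the list even though pbs_server.bin is running, the server logs are the next place to look.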

First, test whether pbs_iff is working.

source /etc/pbs.conf

$PBS_EXEC/sbin/pbs_iff -t $PBS_SERVER 15001

If it doesn’t work, one possible reason is that after “make install” you did not change the mode of $PBS_EXEC/sbin/pbs_iff to 04755. Without the setuid bit on that binary, the default “resvport” authentication will not work, because the pbs_iff binary called by the client cannot grab a reserved port to vouch for the client connection.
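
Whether the setuid bit is set can be checked before running pbs_iff at all. A minimal sketch; has_setuid is a hypothetical helper name, and the chmod shown in the comment has to be run as root.

```sh
#!/bin/sh
# Hypothetical helper: succeed if FILE exists and has its setuid bit set.
#   $1 = path to the file
has_setuid() {
    [ -u "$1" ]
}

# Example use (PBS_EXEC comes from /etc/pbs.conf):
#   . /etc/pbs.conf
#   if has_setuid "$PBS_EXEC/sbin/pbs_iff"; then
#       echo "pbs_iff is setuid"
#   else
#       echo "setuid bit missing; as root: chmod 4755 $PBS_EXEC/sbin/pbs_iff"
#   fi
```

The setuid bit only helps with reserved ports if the binary is owned by root, which `ls -l "$PBS_EXEC/sbin/pbs_iff"` will show.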

Another possible problem is that some other daemon grabs all the reserved ports. To give just one example, we’ve seen the NIS client do that unless there is some kind of caching of username/UID mappings. There again, you should be able to see whether pbs_iff works.
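
To get a rough idea of whether reserved ports (those below 1024) are being used up, the local ports of the current TCP sockets can be counted. This is only a sketch under assumptions: it relies on the iproute2 ss tool, reserved_ports_in_use is a hypothetical helper name, and a high count is a hint rather than proof that another daemon is the cause.

```sh
#!/bin/sh
# Hypothetical helper: read `ss -tan` output on standard input and print
# the number of distinct local TCP ports in the reserved range (1-1023).
reserved_ports_in_use() {
    awk '
        {
            # Column 4 is "local-address:port"; the port follows the last colon.
            # Non-numeric fields (such as the header) evaluate to 0 and are skipped.
            n = split($4, parts, ":")
            p = parts[n] + 0
            if (p > 0 && p < 1024) seen[p] = 1
        }
        END {
            for (p in seen) count++
            print count + 0
        }
    '
}

# Example use:
#   ss -tan | reserved_ports_in_use
# Running `ss -tanp` as root additionally shows which process owns each socket.
```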