When I run the qsub command, it returns:
Communication failure.
qstat: cannot connect to server songyi7 (errno=15031)
I found a topic about a similar problem (Communication Failure - #3 by baugarcia) and followed the same steps, but that did not fix it.
My client and server are on the same PC, and ping works normally:
PING songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137) 56(84) bytes of data.
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=1 ttl=64 time=0.019 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=2 ttl=64 time=0.018 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=3 ttl=64 time=0.016 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=4 ttl=64 time=0.017 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=5 ttl=64 time=0.018 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=6 ttl=64 time=0.016 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=7 ttl=64 time=0.016 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=8 ttl=64 time=0.016 ms
64 bytes from songyi719-ThinkPad-X1-Extreme-2nd (110.76.77.137): icmp_seq=9 ttl=64 time=0.017 ms
^C
--- songyi719-ThinkPad-X1-Extreme-2nd ping statistics ---
9 packets transmitted, 9 received, 0% packet loss, time 8188ms
rtt min/avg/max/mdev = 0.016/0.017/0.019/0.001 ms
songyi719@songyi719-thinkpad-x1-extreme-2nd:~/Desktop/
However, when I tried qstat -Bf, it also returned:
Communication failure.
qstat: cannot connect to server songyi7 (errno=15031)
What could the solution be?
Log in to songyi7 and run:
ps ax | grep pbs
You should see the PBS server and scheduler running. If not, start them.
If the PBS server is not running, qstat on any node gives that error.
Mike
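As a rough sketch, the same check can be wrapped in a shell function that reports each daemon separately. The function name check_pbs_daemons is made up for illustration, and it assumes the daemons appear in the process list with a full path ending in their usual names (pbs_server may show up as pbs_server.bin).

```shell
# Hypothetical helper: reads `ps ax`-style text on stdin and prints,
# for each expected PBS daemon, whether a matching process line was found.
# Assumption: daemons are listed with a path such as /opt/pbs/sbin/pbs_mom.
check_pbs_daemons() {
    listing=$(cat)
    for daemon in pbs_server pbs_sched pbs_comm pbs_mom; do
        if printf '%s\n' "$listing" | grep -q "/$daemon"; then
            echo "$daemon: running"
        else
            echo "$daemon: missing"
        fi
    done
}

# Usage on the server host:
#   ps ax | check_pbs_daemons
```

Matching on "/pbs_..." rather than plain "pbs" keeps the grep process itself out of the result.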
ps ax | grep pbs returns:
6990 ? Ssl 0:00 /opt/pbs/sbin/pbs_comm
7578 ? Ssl 0:00 /opt/pbs/sbin/pbs_mom
7590 ? Ssl 0:00 /opt/pbs/sbin/pbs_sched
7731 ? Ss 0:00 /opt/pbs/sbin/pbs_ds_monitor monitor
7743 ? Ss 0:00 /usr/lib/postgresql/12/bin/postgres -D /var/spool/pbs/datastore -p 15007
7764 ? Ss 0:00 postgres: postgres pbs_datastore 110.76.77.137(59854) idle
7765 ? Ssl 0:00 /opt/pbs/sbin/pbs_server.bin
10387 pts/0 S+ 0:00 grep --color=auto pbs
The PBS server is running, which /etc/init.d/pbs status also confirms:
pbs_server is pid 7765
pbs_mom is pid 7578
pbs_sched is pid 7590
pbs_comm is 6990
Good. Next, check whether a firewall is running that could be blocking those ports.
iptables -S returns:
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-A INPUT -p tcp -m tcp --dport 15001 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15002 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15003 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15004 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15005 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15006 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15007 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15008 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15009 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 17001 -j ACCEPT
So ports 15001-15009 and 17001 are all open.
ufw status returns:
Status: inactive
So the firewall is disabled.
- /etc/pbs.conf:
PBS_EXEC=/opt/pbs
PBS_SERVER=songyi719-ThinkPad-X1-Extreme-2nd
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=1
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/usr/bin/scp
- /etc/hosts:
127.0.0.1 localhost
110.76.77.137 songyi719-ThinkPad-X1-Extreme-2nd pbs
#127.0.1.1 songyi719
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
-
aliases: pbs
address length: 4 bytes
address: 110.76.77.137 (2303544430 dec) name: songyi719-ThinkPad-X1-Extreme-2nd
-
Ping is stable, and since the client and server are on the same PC, I don't think network communication is the problem.
-
My IP address is set manually, so I believe it is a static IP.
-
Trying 110.76.77.137...
telnet: Unable to connect to remote host: Connection refused
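"Connection refused" usually means nothing is accepting connections on that address and port, even if the firewall allows it. A minimal sketch for checking this from the listening side, assuming the ss tool from iproute2 is available; the function name port_is_listening is hypothetical:

```shell
# Hypothetical helper: reads `ss -ltn` output on stdin and succeeds
# if some LISTEN socket is bound to the given TCP port.
# Column 4 of `ss -ltn` is "Local Address:Port"; the port is the last ":" field.
port_is_listening() {
    port=$1
    awk -v p="$port" '
        $1 == "LISTEN" { n = split($4, a, ":"); if (a[n] == p) found = 1 }
        END { exit !found }
    '
}

# Usage on the server host (15001 is the port used with pbs_iff in this thread):
#   ss -ltn | port_is_listening 15001 && echo "listening" || echo "no listener on 15001"
```

If a listener exists but only on 127.0.0.1, a connection to the external address can still be refused, so the local address column is worth reading as well.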
First, test whether pbs_iff is working:
source /etc/pbs.conf
$PBS_EXEC/sbin/pbs_iff -t $PBS_SERVER 15001
If it doesn't work, one possible reason is that after "make install" you did not change the mode of $PBS_EXEC/sbin/pbs_iff to 04755. Without the setuid bit on that binary, the default "resvport" authentication will not work, because the pbs_iff binary called by the client cannot grab a reserved port to vouch for the client connection.
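A quick way to look at that bit, sketched as a small shell function. The name check_setuid is made up for illustration, and the usage line assumes PBS_EXEC has been set by sourcing /etc/pbs.conf:

```shell
# Hypothetical helper: reports whether the given file has its setuid bit set.
# Returns 0 if set, 1 if not set, 2 if the file does not exist.
check_setuid() {
    file=$1
    if [ ! -e "$file" ]; then
        echo "not found: $file"
        return 2
    fi
    if [ -u "$file" ]; then
        echo "setuid bit set: $file"
    else
        echo "setuid bit NOT set: $file"
        return 1
    fi
}

# Usage:
#   check_setuid "$PBS_EXEC/sbin/pbs_iff"
# If the bit is missing, it can be set as root with:
#   chmod 4755 "$PBS_EXEC/sbin/pbs_iff"
```

This only checks the mode bit; the binary also needs to be owned by root for setuid to give it access to reserved ports.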
Another possible problem is a different daemon grabbing all the reserved ports. We have seen the NIS client do that when there is no caching of username/UID mappings, to give just one example. Even then, you should be able to see whether pbs_iff works.