I have a cluster of around 30 compute nodes that has been running well for the past year. Everything is on CentOS 7. Today I tried to add a new node running Red Hat Enterprise Linux 8. I successfully installed the PBS execution RPM on it, then went back to the head node to add the node with these commands:
qmgr -c "create node delltower48"
qmgr -c "c q delltower48q queue_type=e,enabled=t, started=t"
qmgr -c "set node delltower48 queue=delltower48"
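For reference, the same three steps (create the node, create an execution queue, bind the node to that queue) can be scripted so that each qmgr directive is passed as a single argument, with no shell quoting involved. This is a minimal Python sketch: the helper name qmgr_commands and the default queue name (node name plus "q") are hypothetical, and it only prints the commands rather than running them.

```python
import shlex


def qmgr_commands(node, queue=None):
    """Return the qmgr argv lists that add `node` and bind it to its own queue.

    Hypothetical helper for illustration. The default queue name (node name
    plus "q") is an assumption modelled on the names used in this post.
    """
    queue = queue or f"{node}q"
    directives = [
        f"create node {node}",
        f"create queue {queue} queue_type=e,enabled=t,started=t",
        # Uses the same queue name as the create directive above.
        f"set node {node} queue={queue}",
    ]
    # Each directive is a single argv element, so no shell quoting
    # (straight or typographic) is involved if run via subprocess.
    return [["qmgr", "-c", d] for d in directives]


if __name__ == "__main__":
    # Only prints the commands; nothing is sent to the PBS server.
    for argv in qmgr_commands("delltower48"):
        print(shlex.join(argv))
```

Building the queue name once and reusing it keeps the "create queue" directive and the node's queue attribute from referring to different names.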
After running these commands, PBS seemed to stop working: all PBS commands hang. I have tried restarting PBS, but the same issue occurs. The end of the output of "strace qstat" is below:
connect(3, {sa_family=AF_INET, sin_port=htons(15001), sin_addr=inet_addr("10.194.181.144")}, 16) = -1 ECONNREFUSED (Connection refused)
close(3) = 0
dup(2) = 3
fcntl(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), …}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f233c297000
write(3, "Connection refused\n", 19Connection refused
) = 19
close(3) = 0
munmap(0x7f233c297000, 4096) = 0
write(2, "qstat: cannot connect to server "…, 54qstat: cannot connect to server lustwzb34 (errno=111)
) = 54
exit_group(-1) = ?
+++ exited with 255 +++
[255]root@lustwzb34:~#
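The trace shows qstat failing at a plain TCP connect() to port 15001 on 10.194.181.144 with ECONNREFUSED (errno 111), meaning the connection was actively refused, typically because nothing (presumably pbs_server) is listening on that port. The same check can be reproduced without the PBS client tools. This is a minimal Python sketch; probe_pbs_server is a hypothetical helper, and it does not speak the PBS protocol, it only tests whether anything accepts TCP connections on the port.

```python
import errno
import socket
import sys

# Port taken from the strace output: sin_port=htons(15001).
PBS_SERVER_PORT = 15001


def probe_pbs_server(host, port=PBS_SERVER_PORT, timeout=3.0):
    """Attempt the same TCP connect() that qstat makes and classify the result.

    Hypothetical diagnostic helper: it only checks whether something
    accepts TCP connections on the port, nothing PBS-specific.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError:
        # ECONNREFUSED (errno 111 on Linux): usually nothing is listening on
        # the port, or a firewall rule is rejecting the connection.
        return "refused"
    except socket.timeout:
        # No reply at all, e.g. packets silently dropped.
        return "timeout"
    except OSError as exc:
        # Anything else, such as a name resolution failure or no route.
        name = errno.errorcode.get(exc.errno, exc.errno)
        return f"error: {name}"


if __name__ == "__main__":
    # Pass the PBS server host name (e.g. lustwzb34); defaults to localhost.
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    print(f"{host}:{PBS_SERVER_PORT} -> {probe_pbs_server(host)}")
```

Run on the head node with the server name as the argument, a result of "refused" would match what the strace shows for qstat.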