qstat, qmgr, pbsnodes hanging after adding a new compute node

I have a cluster of around 30 compute nodes that has been running well for the past year. Everything is on CentOS 7. Today I tried to add a new node that is running Red Hat 8. I successfully installed the PBS execution RPM on it, then went back to the head node to add the node with these commands:

qmgr -c "create node delltower48"
qmgr -c "c q delltower48q queue_type=e,enabled=t, started=t"
qmgr -c "set node delltower48 queue=delltower48"

After running these commands, PBS seems to have stopped working: all the PBS commands hang. The output of "strace qstat" is below. I have tried restarting PBS, but the same issue occurs.

connect(3, {sa_family=AF_INET, sin_port=htons(15001), sin_addr=inet_addr("10.194.181.144")}, 16) = -1 ECONNREFUSED (Connection refused)
close(3) = 0
dup(2) = 3
fcntl(3, F_GETFL) = 0x8002 (flags O_RDWR|O_LARGEFILE)
fstat(3, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), …}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f233c297000
write(3, "Connection refused\n", 19Connection refused
) = 19
close(3) = 0
munmap(0x7f233c297000, 4096) = 0
write(2, "qstat: cannot connect to server "…, 54qstat: cannot connect to server lustwzb34 (errno=111)
) = 54
exit_group(-1) = ?
+++ exited with 255 +++
[255]root@lustwzb34:~#

I seem to have got it working by stopping the PBS service on the Red Hat 8 node, then going back to the head node and restarting PBS. So is it not possible to mix and match operating systems with PBS?

I installed this RPM on the Redhat 8 node:

openpbs-execution-20.0.1-0.x86_64.rpm

All other nodes on my cluster I installed:

pbspro-execution-18.1.2-0.x86_64.rpm

You can mix and match operating systems with PBS Professional, but make sure all the systems (server/scheduler/mom) in the cluster run the same version of PBS.
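
One way to spot a mismatch is to compare the pbs_version attribute that each node reports. The snippet below is only a rough sketch: the function names are made up for illustration, and it assumes "pbsnodes -av" prints each node name unindented with its attributes indented underneath as "name = value" lines. A node whose mom has not reported in may not show pbs_version at all, and none of this helps while the server itself is not answering; in that case running "rpm -qa | grep -i pbs" on each machine is the fallback.

```shell
# Hypothetical helpers, not part of PBS.

# Reads `pbsnodes -av` style output on stdin and prints "node version" pairs.
list_pbs_versions() {
    awk '
        /^[^ \t]/ { node = $1 }                              # unindented line: node name
        $1 == "pbs_version" && $2 == "=" { print node, $3 }  # indented attribute line
    '
}

# Prints the names of nodes whose reported version differs from the
# expected version given as the first argument.
find_mismatched_nodes() {
    expected="$1"
    list_pbs_versions | awk -v want="$expected" '$2 != want { print $1 }'
}

# Example use on the head node (assumes the server is responding):
#   pbsnodes -av | find_mismatched_nodes 18.1.2
```

With the RPMs mentioned in this thread, the expected version would be 18.1.2 and the new Red Hat 8 node would be the one listed.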

Thanks, makes sense. Is there a version of PBS compatible with both CentOS 7 and Red Hat 8? It seems according to here:

There is only a CentOS 8 version for 20.0.1. I suppose I can try installing it on a test node with CentOS 7 and see what happens…

Please build from source for CentOS 7, since you already have the RPM ready for CentOS 8.

Do you think it is possible to build 18.1.2 from source on the Red Hat 8.3 machine? I would prefer this route rather than building 20.0.1 from source on all thirty of my cluster machines…

Another suggestion is for PBS to give some warning when it detects a version mismatch on one of the nodes. It was quite scary to see the whole scheduler just hang due to adding this node in. Luckily I am not running anything “mission critical”.