Jobs stuck in queue: scheduler connection issues

Hello,
I am getting the following message in the scheduler logs on a cluster running OpenPBS 23 with the Warewulf 3 cluster manager:

pbs_sched;Svr;pbs_sched;Access from host not allowed, or unknown host (15008) in open_server_conns, Couldn’t register the scheduler default with connected server

Jobs can be submitted, but they get stuck in the queue and never run, presumably because of the above issue.

Any suggestions on things to check? The firewall is already off, SELinux is disabled, and SSH connectivity is working correctly.

The address that the scheduler's TCP packets appear to come from does not match the scheduler's sched_host attribute (cf. the output of qmgr -c "print sched @default"). The server logs will tell you where the server thinks the packets are coming from (i.e. what to set sched_host to).
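
A rough sketch of that check, to be run on the server host. It assumes the usual layout: pbs.conf at /etc/pbs.conf (or wherever PBS_CONF_FILE points), PBS_HOME falling back to /var/spool/pbs, and server logs under PBS_HOME/server_logs named by date. The pbs_home helper and the grep filter are only illustrative, so adjust them to your installation.

```shell
#!/bin/sh
# Assumption: pbs.conf lives at /etc/pbs.conf unless PBS_CONF_FILE overrides it,
# and PBS_HOME falls back to /var/spool/pbs when the file does not set it.
pbs_home() {
    conf="${PBS_CONF_FILE:-/etc/pbs.conf}"
    home=""
    if [ -r "$conf" ]; then
        home=$(sed -n 's/^PBS_HOME=//p' "$conf" | tail -n 1)
    fi
    printf '%s\n' "${home:-/var/spool/pbs}"
}

if command -v qmgr >/dev/null 2>&1; then
    # Show what the server has recorded for the scheduler, including sched_host.
    qmgr -c "print sched @default"

    # Today's server log; a crude filter for lines that mention the scheduler.
    log="$(pbs_home)/server_logs/$(date +%Y%m%d)"
    grep -i "sched" "$log" | tail -n 20
else
    echo "qmgr not found in PATH; run this on the PBS server host" >&2
fi
```

Compare the host or address that shows up in those log lines with the sched_host value printed by qmgr.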

One possible way to solve it, if you can't figure it out, is to set sched_host to localhost and add PBS_SERVER_HOST_NAME=localhost to the line on which you start pbs_sched. That will make the scheduler contact 127.0.0.1, which will also make the source IP address of the packets match that address.
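
As a sketch, the workaround could look like the following, run as root on the server host. It assumes the scheduler object is named "default" (as the log message suggests) and that you start pbs_sched by hand; if an init script or service starts it on your system, put the variable on that line instead, and stop any scheduler that is already running first.

```shell
#!/bin/sh
SCHED_NAME="default"    # assumption: scheduler name taken from the log message
SCHED_HOST="localhost"
QMGR_CMD="set sched $SCHED_NAME sched_host = $SCHED_HOST"

if command -v qmgr >/dev/null 2>&1; then
    # Tell the server to expect the scheduler on localhost.
    qmgr -c "$QMGR_CMD"

    # Start the scheduler with the server name overridden for this process,
    # so it connects to 127.0.0.1 and its source address matches sched_host.
    PBS_SERVER_HOST_NAME="$SCHED_HOST" pbs_sched
else
    echo "qmgr not found; would run: qmgr -c \"$QMGR_CMD\"" >&2
fi
```

Afterwards, qmgr -c "print sched @default" should show sched_host set to localhost.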
