Thank you for sharing the logs.
- Make sure ports 15001 to 15007 and 17001 are not blocked, SELinux is disabled, and the system has been rebooted after disabling SELinux.
- Make sure /etc/hosts is populated correctly, or that DNS is properly configured, for forward and reverse address resolution on the head node and on all compute nodes.
The server log shows that the PBS data service (PostgreSQL) is not accepting connections on port 15007:

    03/18/2019 08:56:45;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection: failed: could not connect to server: Connection refused
    Is the server running on host "192.168.1.49" and accepting
    TCP/IP connections on port 15007?]
For reference, this guide covers configuring PostgreSQL to accept remote connections: https://blog.bigbinary.com/2016/01/23/configure-postgresql-to-allow-remote-connection.html
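A quick way to verify the first two items from any node is a small Python sketch using only the standard library `socket` module. It checks whether a TCP connection to each listed port succeeds, and whether a hostname resolves forward and then back to the same short name. The names `PBS_PORTS`, `port_open` and `resolution_consistent` are illustrative, not part of PBS. A port reports as closed both when a firewall blocks it and when no daemon is listening on it on that host, so read the results per node role rather than expecting every port to be open everywhere.

```python
import socket

# Ports named in the checklist: 15001-15007 plus 17001.
PBS_PORTS = list(range(15001, 15008)) + [17001]


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    False means the connection was refused, filtered (timed out),
    or the host name could not be resolved.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def resolution_consistent(hostname: str) -> bool:
    """Forward-resolve hostname, reverse-resolve the resulting address,
    and check that one of the reverse names has the same short name.

    A False result suggests /etc/hosts or DNS needs fixing on this node.
    """
    try:
        addr = socket.gethostbyname(hostname)                 # forward lookup
        reverse_name, aliases, _ = socket.gethostbyaddr(addr)  # reverse lookup
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return False
    short = hostname.split(".")[0]
    return any(name.split(".")[0] == short for name in [reverse_name, *aliases])
```

For example, `port_open("192.168.1.49", 15007)` tests the data service port from the log above, and `resolution_consistent("centos7-1")` is worth running on the head node and on every compute node, since each node resolves names on its own.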
- The scheduler logs contain messages about system ulimits. Increase the ulimits at the system level, or set them in /opt/pbs/lib/init.d/limits.pbs_mom and /opt/pbs/lib/init.d/limits.pbs_server.
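Limits seen in a login shell are not necessarily the ones the daemons inherit when they are started at boot, so it can help to read the limits of the running process directly. Below is a minimal Python sketch, assuming Linux, which exposes each process's limits in `/proc/<pid>/limits`. The helper names `parse_proc_limits` and `daemon_limits` are hypothetical.

```python
import re
from pathlib import Path

# One row of /proc/<pid>/limits looks like:
# "Max open files            1024                 4096                 files"
# The limit name is separated from the soft/hard values by two or more spaces.
_ROW = re.compile(r"^(Max [A-Za-z ]+?)\s{2,}(\S+)\s+(\S+)")


def parse_proc_limits(text: str) -> dict:
    """Map limit name -> (soft, hard) as strings ("unlimited" or a number).

    The header row ("Limit  Soft Limit ...") does not start with "Max"
    and is therefore skipped.
    """
    limits = {}
    for line in text.splitlines():
        m = _ROW.match(line)
        if m:
            limits[m.group(1)] = (m.group(2), m.group(3))
    return limits


def daemon_limits(pid: str = "self") -> dict:
    """Read the limits a running process actually has (Linux only).

    Pass the PID of pbs_sched or pbs_server; the default "self" inspects
    the current process, whose limits may differ from the daemons'.
    """
    return parse_proc_limits(Path(f"/proc/{pid}/limits").read_text())
```

For example, `daemon_limits("<pid>")` with the PID printed by `pgrep -x pbs_sched` shows the open-files and other limits the scheduler is actually running with, which can then be compared against the limits the scheduler log complains about.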
- Check the server logs: the scheduler is down, and the server is trying to contact the scheduler.
See also: https://pbspro.atlassian.net/browse/PP-1083
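To pull the relevant entries out of a day's server log, a small Python sketch can split each line on its first five semicolons. The field layout is inferred from the log line quoted in this thread (date and time, event code, source, object type, object name, message), and the names `FIELDS`, `parse_line` and `matching` are illustrative. Continuation lines that do not follow this layout, such as the wrapped PostgreSQL message, are skipped.

```python
from typing import Iterable, Iterator, Optional

# Field layout inferred from a line such as:
# 03/18/2019 08:56:45;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice ...
FIELDS = ("timestamp", "event", "source", "obj_type", "obj_name", "message")


def parse_line(line: str) -> Optional[dict]:
    """Split one log line into named fields, or return None if it doesn't fit.

    maxsplit keeps any semicolons inside the message field intact.
    """
    parts = line.rstrip("\n").split(";", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def matching(lines: Iterable[str], keyword: str) -> Iterator[dict]:
    """Yield parsed entries whose message contains keyword (case-insensitive)."""
    key = keyword.lower()
    for line in lines:
        entry = parse_line(line)
        if entry and key in entry["message"].lower():
            yield entry
```

For example, assuming the default `PBS_HOME` of `/var/spool/pbs`, where server logs are named by date, `matching(open("/var/spool/pbs/server_logs/20190318"), "sched")` yields the scheduler-related entries for that day.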