Pbs_server is not running

Hi,

/etc/init.d/pbs status returns:

pbs_server is not running
pbs_mom is not running
pbs_sched is not running
pbs_comm is not running

I have stopped and restarted the server with /etc/init.d/pbs stop and /etc/init.d/pbs start, but I get the same message. When I submit a job, it stays queued.

Can you please tell me what the problem is?

Thanks a lot for your help.

Can you please post the output that you see when you run "/etc/init.d/pbs start"?

If the job status is queued, then the server service is up and running.

Please share the output of

  1. qstat -fx
  2. pbsnodes -av
  3. qstat -answ1

If you cannot run these commands, then share the server and scheduler logs.
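
The three commands requested above can be collected in one pass so that each output is labelled. This is only a sketch: run_diag is a made-up helper name, and a command that is missing or cannot reach the server is reported rather than treated as fatal.

```shell
#!/bin/sh
# run_diag is a hypothetical helper name, not part of PBS.
# It runs the three commands requested in this thread and labels each output.
run_diag() {
    for cmd in "qstat -fx" "pbsnodes -av" "qstat -answ1"; do
        echo "=== $cmd ==="
        # $cmd is left unquoted on purpose so the options are split into words.
        $cmd 2>&1 || echo "($cmd failed with exit status $?)"
    done
}

run_diag
```

Redirecting the result to a file (for example, sh diag.sh > pbs_diag.txt, where diag.sh is whatever name the sketch was saved under) gives something that can be pasted into a reply.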

The startup fails:

Starting PBS
/opt/pbs/sbin/pbs_comm ready (pid=10630), Proxy Name:centos7-1.home:17001, Threads:4
PBS comm
PBS mom
Creating usage database for fairshare.
PBS sched
Connecting to PBS dataservice...Failed to start PBS dataservice
.Failed to start PBS dataservice
...Failed to start PBS dataservice
continuing in background.
PBS server

  1. qstat -fx

Connection refused
qstat: cannot connect to server centos7-1 (errno=111)

  2. pbsnodes -av

Connection refused
pbsnodes: cannot connect to server centos7-1, error=111

  3. qstat -answ1

Connection refused
pbsnodes: cannot connect to server centos7-1, error=111

Can you please tell me where the server and scheduler logs are located?

Thank you

As the root user, please follow the steps below:

  1. source /etc/pbs.conf
  2. cd $PBS_HOME/server_logs
  3. cd $PBS_HOME/sched_logs

  1. source /etc/pbs.conf

returns nothing.
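
It is expected that sourcing /etc/pbs.conf prints nothing: it only sets shell variables such as PBS_HOME. PBS names its daily log files by date (YYYYMMDD), so today's files can be located roughly as sketched below. The helper name pbs_log_path is hypothetical, and the block does nothing if /etc/pbs.conf is absent.

```shell
#!/bin/sh
# pbs_log_path is a hypothetical helper: it builds the path of a daily PBS log file.
#   $1 = PBS_HOME, $2 = log directory (server_logs or sched_logs),
#   $3 = date as YYYYMMDD (defaults to today).
pbs_log_path() {
    echo "$1/$2/${3:-$(date +%Y%m%d)}"
}

# Show the last lines of today's server and scheduler logs, if PBS is configured here.
if [ -f /etc/pbs.conf ]; then
    . /etc/pbs.conf
    for dir in server_logs sched_logs; do
        file=$(pbs_log_path "$PBS_HOME" "$dir")
        echo "== $file =="
        tail -n 50 "$file"
    done
fi
```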

server_logs

03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Log;Log opened
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_version=19.0.0
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_build=mach=N/A:security=N/A:configure_args=N/A
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;hostname=centos7-1.home;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;ipv4 interface lo: localhost4.localdomain4 
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;ipv4 interface enp9s0: centos7-1.home 
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;ipv6 interface lo: localhost6.localdomain6 
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Server@centos7-1;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 00:00:00;0002;Server@centos7-1;Svr;Act;Account file /var/spool/pbs/server_priv/accounting/20190318 opened
03/18/2019 00:00:00;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:02;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:04;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:06;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:08;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:10;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:12;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:14;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:16;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:18;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:20;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:22;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:24;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:26;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:28;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:30;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:32;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:34;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:36;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:38;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:40;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:42;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:44;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 00:00:46;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
[.....]
03/18/2019 08:55:51;0100;Server@centos7-1;Req;;Type 0 request received from nekcorp@centos7-1.home, sock=14
03/18/2019 08:55:51;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:51;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:51;0100;Server@centos7-1;Req;;Type 49 request received from nekcorp@centos7-1.home, sock=15
03/18/2019 08:55:51;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:51;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:51;0100;Server@centos7-1;Req;;Type 21 request received from nekcorp@centos7-1.home, sock=14
03/18/2019 08:55:51;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:51;0100;Server@centos7-1;Req;;Type 19 request received from nekcorp@centos7-1.home, sock=14
03/18/2019 08:55:51;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:51;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:53;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:55;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:57;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:55:59;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:56:01;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:56:03;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:56:05;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:56:07;0001;Server@centos7-1;Svr;Server@centos7-1;Operation now in progress (115) in contact_sched, Could not contact Scheduler
03/18/2019 08:56:09;0040;Server@centos7-1;Svr;centos7-1;Scheduler sent command 10
03/18/2019 08:56:09;0040;Server@centos7-1;Svr;centos7-1;Scheduler sent command 0
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 81 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 9 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0004;Server@centos7-1;Sched;Server@centos7-1;attributes set:  at request of Scheduler@centos7-1.home
03/18/2019 08:56:09;0004;Server@centos7-1;Sched;Server@centos7-1;attributes set: sched_host = centos7-1.home
03/18/2019 08:56:09;0004;Server@centos7-1;Sched;Server@centos7-1;attributes set: sched_port = 15004
03/18/2019 08:56:09;0004;Server@centos7-1;Sched;Server@centos7-1;attributes set: pbs_version = 19.0.0
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 82 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 21 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 81 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 71 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 58 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 20 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 51 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 51 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 51 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0100;Server@centos7-1;Req;;Type 11 request received from Scheduler@centos7-1.home, sock=14
03/18/2019 08:56:09;0008;Server@centos7-1;Job;77.centos7-1;Job Modified at request of Scheduler@centos7-1.home
03/18/2019 08:56:15;0c06;Server@centos7-1;TPP;Server@centos7-1(Thread 0);Registering address 192.168.1.49:15001 to pbs_comm
03/18/2019 08:56:15;0c06;Server@centos7-1;TPP;Server@centos7-1(Thread 0);Connected to pbs_comm centos7-1:17001
03/18/2019 08:56:15;0d80;Server@centos7-1;TPP;Server@centos7-1(Main Thread);net restore handler called
03/18/2019 08:56:15;0002;Server@centos7-1;Node;centos7-1.home;update2 state:0 ncpus:2
03/18/2019 08:56:15;0002;Server@centos7-1;Node;centos7-1.home;Mom reporting 1 vnodes as of Mon Mar 18 08:56:08 2019
03/18/2019 08:56:15;0002;Server@centos7-1;Node;centos7-1.home;node up
03/18/2019 08:56:15;0080;Server@centos7-1;Req;Server@centos7-1;successfully sent hook file /var/spool/pbs/server_priv/hooks/PBS_power.HK to centos7-1.home:15002
03/18/2019 08:56:23;0100;Server@centos7-1;Req;;Type 0 request received from root@centos7-1.home, sock=14
03/18/2019 08:56:23;0100;Server@centos7-1;Req;;Type 49 request received from root@centos7-1.home, sock=15
03/18/2019 08:56:23;0100;Server@centos7-1;Req;;Type 21 request received from root@centos7-1.home, sock=14
03/18/2019 08:56:24;0100;Server@centos7-1;Req;;Type 0 request received from root@centos7-1.home, sock=14
03/18/2019 08:56:24;0100;Server@centos7-1;Req;;Type 49 request received from root@centos7-1.home, sock=15
03/18/2019 08:56:24;0100;Server@centos7-1;Req;;Type 17 request received from root@centos7-1.home, sock=14
03/18/2019 08:56:24;0086;Server@centos7-1;Svr;Server@centos7-1;Shutdown request from root@centos7-1.home 
03/18/2019 08:56:24;0086;Server@centos7-1;Svr;Server@centos7-1;Starting to shutdown the server, type is Quick
03/18/2019 08:56:24;0002;Server@centos7-1;Svr;Server@centos7-1;Stopping PBS dataservice
03/18/2019 08:56:28;0100;Server@centos7-1;Svr;Server@centos7-1;--> Stopping Python interpreter <--
03/18/2019 08:56:28;0d80;Server@centos7-1;TPP;Server@centos7-1(Main Thread);Shutting down TPP transport Layer
03/18/2019 08:56:28;0d80;Server@centos7-1;TPP;Server@centos7-1(Thread 0);Thrd exiting, had 1 connections
03/18/2019 08:56:28;0002;Server@centos7-1;Svr;Server@centos7-1;Server shutdown completed
03/18/2019 08:56:28;0002;Server@centos7-1;Svr;Log;Log closed
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Log;Log opened
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_version=19.0.0
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_build=mach=N/A:security=N/A:configure_args=N/A
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;hostname=centos7-1.home;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;ipv4 interface lo: localhost4.localdomain4 
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;ipv4 interface enp9s0: centos7-1.home 
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;ipv6 interface lo: localhost6.localdomain6 
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 08:56:33;0006;Server@centos7-1;Fil;Server@centos7-1;Version 19.0.0, started, initialization type = 1
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:56:33;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:56:45;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:56:45;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:56:47;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:56:48;0006;Server@centos7-1;Svr;Server@centos7-1;Failed to start PBS dataservice
03/18/2019 08:56:48;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:56:52;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:56:52;0006;Server@centos7-1;Svr;Server@centos7-1;Failed to start PBS dataservice
03/18/2019 08:56:53;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:56:58;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:57:09;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:57:09;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:57:16;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:57:17;0006;Server@centos7-1;Svr;Server@centos7-1;Failed to start PBS dataservice
03/18/2019 08:57:17;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:57:25;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:57:37;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:57:37;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:57:47;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:57:58;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:57:59;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:58:09;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:58:20;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:58:21;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:58:31;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:58:42;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:58:42;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:58:52;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:59:04;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:59:04;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:59:14;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:59:26;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:59:26;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:59:36;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 08:59:48;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 08:59:48;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 08:59:58;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:00:09;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:00:10;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:00:20;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:00:31;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:00:31;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:00:41;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:00:53;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:00:53;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:01:03;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:01:15;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:01:15;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:01:25;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:01:37;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:01:37;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:01:47;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:01:58;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:01:59;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:02:09;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:02:20;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:02:21;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:02:31;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:02:42;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:02:42;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:02:52;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:03:04;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:03:04;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:03:14;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:03:26;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:03:26;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:03:36;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:03:48;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:03:48;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:03:58;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:04:09;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:04:10;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:04:20;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:04:31;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:04:32;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:04:42;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:04:53;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:04:53;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1
03/18/2019 09:05:03;0002;Server@centos7-1;Svr;Server@centos7-1;Starting PBS dataservice
03/18/2019 09:05:15;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection:  failed: could not connect to server: Connection refused
	Is the server running on host "192.168.1.49" and accepting
	TCP/IP connections on port 15007?]
03/18/2019 09:05:15;0002;Server@centos7-1;Svr;Server@centos7-1;pbs_status_db exit code 1

sched_logs

03/18/2019 08:56:08;0002;pbs_sched;Svr;Log;Log opened
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;pbs_version=19.0.0
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;pbs_build=mach=N/A:security=N/A:configure_args=N/A
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;hostname=centos7-1.home;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;ipv4 interface lo: localhost4.localdomain4 
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;ipv4 interface enp9s0: centos7-1.home 
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;ipv6 interface lo: localhost6.localdomain6 
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 08:56:08;0002;pbs_sched;Svr;pbs_sched;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 08:56:08;0002;pbs_sched;n/a;setup_env;read environment from /var/spool/pbs/pbs_environment
03/18/2019 08:56:08;0040;pbs_sched;Fil;sched_config;Obsolete config name sort_queues
03/18/2019 08:56:08;0004;pbs_sched;Fil;holidays;The holiday file is out of date; please update it.
03/18/2019 08:56:08;0040;pbs_sched;Fil;fairshare usage;Creating usage database for fairshare
03/18/2019 08:56:08;0006;pbs_sched;Fil;pbs_sched;Version 19.0.0, started, initialization type = 0
03/18/2019 08:56:08;0002;pbs_sched;Svr;main;/opt/pbs/sbin/pbs_sched startup pid 9634
03/18/2019 08:56:08;0d80;pbs_sched;TPP;pbs_sched(Main Thread);TPP set to use reserved port authentication
03/18/2019 08:56:08;0c06;pbs_sched;TPP;pbs_sched(Main Thread);TPP leaf node names = 192.168.1.49:15004,127.0.0.1:15004,192.168.1.49:15004
03/18/2019 08:56:08;0d80;pbs_sched;TPP;pbs_sched(Main Thread);Initializing TPP transport Layer
03/18/2019 08:56:08;0d80;pbs_sched;TPP;pbs_sched(Main Thread);Max files allowed = 1024
03/18/2019 08:56:08;0c06;pbs_sched;TPP;pbs_sched(Main Thread);Max files too low - you may want to increase it.
03/18/2019 08:56:08;0d80;pbs_sched;TPP;pbs_sched(Main Thread);TPP initialization done
03/18/2019 08:56:08;0c06;pbs_sched;TPP;pbs_sched(Main Thread);Single pbs_comm configured, TPP Fault tolerant mode disabled
03/18/2019 08:56:08;0d80;pbs_sched;TPP;pbs_sched(Main Thread);Connecting to pbs_comm centos7-1:17001
03/18/2019 08:56:08;0c06;pbs_sched;TPP;pbs_sched(Thread 0);Thread ready
03/18/2019 08:56:08;0c06;pbs_sched;TPP;pbs_sched(Thread 0);Registering address 192.168.1.49:15004 to pbs_comm
03/18/2019 08:56:08;0c06;pbs_sched;TPP;pbs_sched(Thread 0);Connected to pbs_comm centos7-1:17001
03/18/2019 08:56:09;0080;pbs_sched;Req;;Starting Scheduling Cycle
03/18/2019 08:56:09;0004;pbs_sched;Fil;holidays;The holiday file is out of date; please update it.
03/18/2019 08:56:09;0080;pbs_sched;Job;77.centos7-1;Considering job to run
03/18/2019 08:56:09;0040;pbs_sched;Job;77.centos7-1;Not enough free nodes available
03/18/2019 08:56:09;0080;pbs_sched;Req;;Leaving Scheduling Cycle
03/18/2019 08:56:28;0002;pbs_sched;Svr;die;caught signal 15
03/18/2019 08:56:28;0002;pbs_sched;Svr;Log;Log closed
03/18/2019 08:56:33;0002;pbs_sched;Svr;Log;Log opened
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;pbs_version=19.0.0
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;pbs_build=mach=N/A:security=N/A:configure_args=N/A
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;hostname=centos7-1.home;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;ipv4 interface lo: localhost4.localdomain4 
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;ipv4 interface enp9s0: centos7-1.home 
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;ipv6 interface lo: localhost6.localdomain6 
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 08:56:33;0002;pbs_sched;Svr;pbs_sched;ipv6 interface enp9s0: centos7-1.home 
03/18/2019 08:56:33;0002;pbs_sched;n/a;setup_env;read environment from /var/spool/pbs/pbs_environment
03/18/2019 08:56:33;0040;pbs_sched;Fil;sched_config;Obsolete config name sort_queues
03/18/2019 08:56:33;0004;pbs_sched;Fil;holidays;The holiday file is out of date; please update it.
03/18/2019 08:56:33;0040;pbs_sched;Fil;fairshare usage;Creating usage database for fairshare
03/18/2019 08:56:33;0006;pbs_sched;Fil;pbs_sched;Version 19.0.0, started, initialization type = 0
03/18/2019 08:56:33;0002;pbs_sched;Svr;main;/opt/pbs/sbin/pbs_sched startup pid 10675
03/18/2019 08:56:33;0d80;pbs_sched;TPP;pbs_sched(Main Thread);TPP set to use reserved port authentication
03/18/2019 08:56:33;0c06;pbs_sched;TPP;pbs_sched(Main Thread);TPP leaf node names = 192.168.1.49:15004,127.0.0.1:15004,192.168.1.49:15004
03/18/2019 08:56:33;0d80;pbs_sched;TPP;pbs_sched(Main Thread);Initializing TPP transport Layer
03/18/2019 08:56:33;0d80;pbs_sched;TPP;pbs_sched(Main Thread);Max files allowed = 1024
03/18/2019 08:56:33;0c06;pbs_sched;TPP;pbs_sched(Main Thread);Max files too low - you may want to increase it.
03/18/2019 08:56:33;0d80;pbs_sched;TPP;pbs_sched(Main Thread);TPP initialization done
03/18/2019 08:56:33;0c06;pbs_sched;TPP;pbs_sched(Main Thread);Single pbs_comm configured, TPP Fault tolerant mode disabled
03/18/2019 08:56:33;0d80;pbs_sched;TPP;pbs_sched(Main Thread);Connecting to pbs_comm centos7-1:17001
03/18/2019 08:56:33;0c06;pbs_sched;TPP;pbs_sched(Thread 0);Thread ready
03/18/2019 08:56:33;0c06;pbs_sched;TPP;pbs_sched(Thread 0);Registering address 192.168.1.49:15004 to pbs_comm
03/18/2019 08:56:33;0c06;pbs_sched;TPP;pbs_sched(Thread 0);Connected to pbs_comm centos7-1:17001
03/18/2019 09:05:16;0002;pbs_sched;Svr;die;caught signal 15
03/18/2019 09:05:16;0002;pbs_sched;Svr;Log;Log closed
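
Logs like the ones above are dominated by a few repeated messages. A small filter can count how often each message occurs. It is sketched here on the assumption that every record has the six semicolon-separated fields visible in the paste, with the message last. The function name summarize_pbs_log is hypothetical.

```shell
#!/bin/sh
# summarize_pbs_log (hypothetical name) reads PBS log text on stdin and prints
# each distinct message with its count, most frequent first.
summarize_pbs_log() {
    # Field 6 onward is the message; -s drops continuation lines that have no ';'.
    cut -s -d';' -f6- | sort | uniq -c | sort -rn
}
```

For example, summarize_pbs_log < /var/spool/pbs/server_logs/20190318 would reduce the server log of that day to a handful of lines, assuming PBS_HOME is /var/spool/pbs as the accounting path in the log suggests.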

Thank you for sharing the logs.

  1. Make sure ports 15001 to 15007 and 17001 are not blocked, that SELinux is disabled, and that the system has been rebooted after disabling SELinux.
  2. Make sure /etc/hosts is populated correctly (DNS is properly configured for forward and reverse address resolution) on the head node and across all the compute nodes.

03/18/2019 08:56:45;0002;Server@centos7-1;Svr;Server@centos7-1;PBS dataservice not running:[Connection: failed: could not connect to server: Connection refused
Is the server running on host "192.168.1.49" and accepting
TCP/IP connections on port 15007?]
FYI: https://blog.bigbinary.com/2016/01/23/configure-postgresql-to-allow-remote-connection.html
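
The refused connection on port 15007 quoted above can be checked against the list of listening sockets. This sketch assumes the ss utility from iproute2 is available; the helper name missing_ports is hypothetical, and the ports in the usage line are the ones that appear in this thread's logs.

```shell
#!/bin/sh
# missing_ports (hypothetical name) reads `ss -ltn` style output on stdin and
# prints every port given as an argument that has no listening socket.
missing_ports() {
    listing=$(cat)
    for port in "$@"; do
        # A local address ends in ":<port>" followed by whitespace.
        printf '%s\n' "$listing" | grep -Eq ":$port[[:space:]]" || echo "$port"
    done
}
```

Usage: ss -ltn | missing_ports 15001 15002 15004 15007 17001. An empty result means every listed port has a listener; a line reading 15007 would match the dataservice failure in the log.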

  3. If you check the scheduler logs, you will see messages about the open-files limit ("Max files too low"). Increase the ulimits at the system level, or set the limits in /opt/pbs/lib/init.d/limits.pbs_mom and /opt/pbs/lib/init.d/limits.pbs_server.

  4. Check the server logs (the scheduler is down, and the server is trying to contact it here).
    URL: https://pbspro.atlassian.net/browse/PP-1083
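
For the /etc/hosts item in the list above, one way to confirm that the file maps the server address to the server name is sketched here. The helper name hosts_maps is hypothetical; the address and name in the usage line are taken from the logs in this thread.

```shell
#!/bin/sh
# hosts_maps (hypothetical name) succeeds if hosts-format text on stdin
# maps the IP address $1 to the host name $2. Comments are ignored.
hosts_maps() {
    awk -v ip="$1" -v name="$2" '
        { sub(/#.*/, "") }    # strip comments before looking at the fields
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage: hosts_maps 192.168.1.49 centos7-1 < /etc/hosts && echo "mapping present". This only inspects the file; getent hosts centos7-1 would exercise the full resolver, including DNS.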

Thanks a lot for your help

Hello,

I see that this is an old post, but I am wondering if anybody can help. I installed OpenPBS and, very strangely, when I run "sudo /etc/init.d/pbs start" it says:

Starting PBS
PBS comm already running.
PBS scheduler already running.
PBS Server already running.

but when I run "/etc/init.d/pbs status" it shows:

pbs_server is not running
pbs_sched is not running
pbs_comm is not running
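
When the start and status reports disagree like this, the process table can serve as a third opinion. The sketch below assumes pgrep from procps is installed; pbs_daemon_status is a hypothetical name. The pattern is matched as a substring of the process name, so a server process named pbs_server.bin would also be counted.

```shell
#!/bin/sh
# pbs_daemon_status (hypothetical name) reports, for each PBS daemon named in
# this thread, whether a process with a matching name exists.
pbs_daemon_status() {
    for daemon in pbs_server pbs_sched pbs_comm pbs_mom; do
        if pgrep "$daemon" >/dev/null 2>&1; then
            echo "$daemon: process found"
        else
            echo "$daemon: no process found"
        fi
    done
}

pbs_daemon_status
```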

I am confused. I ran a few jobs; they remain queued, and the error file shows:

/var/spool/pbs/mom_priv/jobs/2001.PiMaster.SC: line 28: /home/pi/demo/fpi-serial: Permission denied

and the output file is:

SSH is enabled and the default password for the 'pi' user has not been changed.
This is a security risk - please login as the 'pi' user and type 'passwd' to set a new password.
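
The Permission denied message comes from the shell running the job script, which suggests that job 2001 did start and that line 28 could not execute /home/pi/demo/fpi-serial. A missing execute bit is one common cause, though not the only one; the sketch below checks for it. The helper name check_exec is hypothetical.

```shell
#!/bin/sh
# check_exec (hypothetical name) classifies a path as missing, executable,
# or not executable for the current user.
check_exec() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ -f "$1" ] && [ -x "$1" ]; then
        echo "executable"
    else
        echo "not executable"
    fi
}
```

Usage, run as the user who submits the job: check_exec /home/pi/demo/fpi-serial. If it prints "not executable", chmod u+x on that file by its owner would be the usual remedy.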

Thanks for helping
Best

I also ran into the "Max files allowed = 1024, Max files too low - you may want to increase it" message.
All of the following operations were performed as root.

My steps were as follows:
(1) The default ulimit -n of my OS is 1024.
So when the PBS service is started, pbs_mom, pbs_sched, and pbs_server all log the following (pbs_comm does not):

			01/07/2022 18:01:16;0d80;pbs_mom;TPP;pbs_mom(Main Thread);Max files allowed = 1024
			01/07/2022 18:01:16;0c06;pbs_mom;TPP;pbs_mom(Main Thread);Max files too low - you may want to increase it.
			01/07/2022 18:01:36;0d80;pbs_sched;TPP;pbs_sched(Main Thread);Max files allowed = 1024
			01/07/2022 18:01:36;0c06;pbs_sched;TPP;pbs_sched(Main Thread);Max files too low - you may want to increase it.
			01/07/2022 18:02:02;0d80;Server@cf-poc-master;TPP;Server@cf-poc-master(Main Thread);Max files allowed = 1024
			01/07/2022 18:02:02;0c06;Server@cf-poc-master;TPP;Server@cf-poc-master(Main Thread);Max files too low - you may want to increase it.

(2) Then I raised the limit to 102400 in /etc/security/limits.conf:
root soft nofile 102400
root hard nofile 102400

(3) Then I restarted the machine and confirmed that the change took effect:
the command "ulimit -n" outputs 102400.

But when the PBS service starts again, pbs_mom, pbs_sched, and pbs_server (but not pbs_comm) still log the warning above that 1024 is too low.
I then tried the command "ulimit -n 102400" in the current login shell, but it still had no effect.

Why?

Should I modify the ulimit in the PBS systemd service configuration file, or via systemctl, or is there some configuration in PBS that overrides the OS configuration?

Looking forward to your reply, thank you!

PS:
[root@cf-poc-master ~]# cat /opt/pbs/lib/init.d/limits.pbs_mom
if [ -f /etc/sgi-release -o -f /etc/sgi-compute-node-release ] ; then
MEMLOCKLIM=`ulimit -l`
NOFILESLIM=`ulimit -n`
STACKLIM=`ulimit -s`
ulimit -l unlimited
ulimit -n 16384
ulimit -s unlimited
fi

The contents of the file limits.pbs_mom should be as below (remove the if conditional):

MEMLOCKLIM=`ulimit -l`
NOFILESLIM=`ulimit -n`
STACKLIM=`ulimit -s`
ulimit -l unlimited
ulimit -n 16384
ulimit -s unlimited

and restart the pbs services on the compute node.
Similarly, you can create limits.pbs_sched and limits.pbs_server in the same location with the same contents and restart the respective services.
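
After the restart, the limit a daemon actually received can be read from /proc/<pid>/limits on Linux, which is independent of what ulimit -n reports in a login shell. That distinction may explain the earlier observation: limits.conf is applied through PAM at login, so it does not necessarily reach daemons started by the init system. The sketch below assumes Linux and pgrep; soft_nofile is a hypothetical name.

```shell
#!/bin/sh
# soft_nofile (hypothetical name) prints the soft "Max open files" value from
# /proc/<pid>/limits style text on stdin.
soft_nofile() {
    awk '/^Max open files/ { print $4 }'
}

# Report the limit of the oldest pbs_mom process, if one is running.
pid=$(pgrep -o pbs_mom 2>/dev/null) || pid=""
if [ -n "$pid" ]; then
    echo "pbs_mom (pid $pid) soft nofile: $(soft_nofile < "/proc/$pid/limits")"
fi
```

If the printed value is still 1024 after the limits file has been edited and the service restarted, the new limit did not reach the daemon.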

Make sure /etc/security/limits.conf has the following entries
(restart the system, or at least log in again, for them to take effect; sysctl -p only reloads sysctl settings and does not re-read this file):

* soft memlock unlimited
* hard memlock unlimited

Hope this helps