Hello,
I have installed OpenPBS v22.05.11 on a fresh install of Ubuntu 20.04.
I am currently testing PBS on a single computer that acts as both the compute node and the server.
I can’t qsub jobs or run commands like qstat or pbsnodes, as they return this error:
Connection refused
qstat: cannot connect to server coulomb (errno=15010)
The issue seems to be with pbs_server, specifically the "pbs_status_db exit code 1" message in /var/spool/pbs/server_logs/20221120:
11/20/2022 14:53:53;0002;Server@coulomb;Svr;Log;Log opened
11/20/2022 14:53:53;0002;Server@coulomb;Svr;Server@coulomb;pbs_version=20.0.0
11/20/2022 14:53:53;0002;Server@coulomb;Svr;Server@coulomb;pbs_build=mach=N/A:security=N/A:configure_args=N/A
11/20/2022 14:53:53;0002;Server@coulomb;Svr;Server@coulomb;hostname=coulomb;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
11/20/2022 14:53:53;0002;Server@coulomb;Svr;Server@coulomb;ipv4 interface lo: localhost
11/20/2022 14:53:53;0002;Server@coulomb;Svr;Server@coulomb;ipv4 interface enp6s0: coulomb
11/20/2022 14:53:53;0002;Server@coulomb;Svr;Server@coulomb;ipv6 interface lo: ip6-loopback
11/20/2022 14:53:53;0002;Server@coulomb;Svr;Server@coulomb;ipv6 interface enp6s0: coulomb
11/20/2022 14:53:53;0006;Server@coulomb;Fil;Server@coulomb;Version 20.0.0, started, initialization type = 1
11/20/2022 14:53:53;0002;Server@coulomb;Svr;Server@coulomb;pbs_status_db exit code 1
11/20/2022 14:53:53;0002;Server@coulomb;Svr;Server@coulomb;Starting PBS dataservice
11/20/2022 14:53:56;0002;Server@coulomb;Svr;Server@coulomb;connected to PBS dataservice@coulomb
11/20/2022 14:53:56;0d80;Server@coulomb;TPP;Server@coulomb(Main Thread);TPP authentication method = resvport
11/20/2022 14:53:56;0c06;Server@coulomb;TPP;Server@coulomb(Main Thread);TPP leaf node names = 10.65.XX.XXX:15001,127.0.0.1:15001,10.65.XX.XXX:15001 # numbers replaced with X
11/20/2022 14:53:56;0d80;Server@coulomb;TPP;Server@coulomb(Main Thread);Initializing TPP transport Layer
11/20/2022 14:53:56;0d80;Server@coulomb;TPP;Server@coulomb(Main Thread);Max files allowed = 16384
11/20/2022 14:53:56;0d80;Server@coulomb;TPP;Server@coulomb(Main Thread);TPP initialization done
11/20/2022 14:53:56;0d80;Server@coulomb;TPP;Server@coulomb(Main Thread);Connecting to pbs_comm coulomb:17001
11/20/2022 14:53:56;0c06;Server@coulomb;TPP;Server@coulomb(Thread 0);Thread ready
11/20/2022 14:53:56;0c06;Server@coulomb;TPP;Server@coulomb(Thread 0);Registering address 10.65.XX.XXX:15001 to pbs_comm coulomb:17001 # numbers replaced with X
11/20/2022 14:53:56;0c06;Server@coulomb;TPP;Server@coulomb(Thread 0);Connected to pbs_comm coulomb:17001
11/20/2022 14:53:56;0002;Server@coulomb;n/a;setup_env;read environment from /var/spool/pbs/pbs_environment
11/20/2022 14:53:56;0000;Server@coulomb;Svr;Server@coulomb;Supported authentication method: resvport
11/20/2022 14:53:56;0002;Server@coulomb;Svr;Server@coulomb;Stopping PBS dataservice
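In case it helps anyone reading these logs, here is a minimal sketch of how the last few server log entries could be pulled out with a script. It assumes each log line has six semicolon-separated fields, as in the excerpt above; the field names (`timestamp`, `event_code`, and so on) and the helper names `parse_line` and `last_records` are my own labels, not anything defined by PBS.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogRecord(NamedTuple):
    # Field names are illustrative labels for the six semicolon-separated fields.
    timestamp: str
    event_code: str
    daemon: str
    object_type: str
    object_name: str
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one log line into six fields; return None if it does not fit."""
    # maxsplit=5 keeps semicolons that appear inside the message itself.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return LogRecord(*parts)


def last_records(lines: Iterable[str], count: int = 3) -> List[LogRecord]:
    """Return the last `count` parseable records, oldest first."""
    records = [rec for rec in map(parse_line, lines) if rec is not None]
    return records[-count:]


# Three lines copied from the server log above.
SAMPLE = [
    "11/20/2022 14:53:53;0002;Server@coulomb;Svr;Server@coulomb;pbs_status_db exit code 1",
    "11/20/2022 14:53:56;0000;Server@coulomb;Svr;Server@coulomb;Supported authentication method: resvport",
    "11/20/2022 14:53:56;0002;Server@coulomb;Svr;Server@coulomb;Stopping PBS dataservice",
]

if __name__ == "__main__":
    for rec in last_records(SAMPLE):
        print(rec.timestamp, rec.event_code, rec.message)
```

To run it against the real file, the lines of /var/spool/pbs/server_logs/20221120 could be passed to `last_records` instead of `SAMPLE`.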
Regarding ports:
I have firewalld stopped and disabled.
I don’t have SELinux installed.
I have also disabled ufw (the Ubuntu firewall).
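For reference, this is a rough sketch of how the ports could be checked from a script. The port numbers (15001, 15002, 15003, 17001) are the ones that appear in the logs in this post; the helper name `port_is_open` and the `PBS_PORTS` table are only illustrative, and other installations may use different ports.

```python
import socket

# Ports as they appear in the logs in this post; adjust for other setups.
PBS_PORTS = {
    "pbs_server": 15001,
    "pbs_mom": 15002,
    "pbs_mom (RM)": 15003,
    "pbs_comm": 17001,
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    host = "coulomb"  # the PBS_SERVER value from /etc/pbs.conf
    for name, port in PBS_PORTS.items():
        state = "open" if port_is_open(host, port) else "closed or refused"
        print(f"{name:14s} {host}:{port} {state}")
```

A refused connection on 15001 while 17001 is open would be consistent with pbs_comm running and pbs_server not running, rather than with a firewall problem.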
Some relevant outputs:
/etc/pbs.conf
PBS_SERVER=coulomb
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=1
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/usr/bin/scp
/etc/hosts
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
10.65.XX.XXX coulomb # numbers have been replaced with X
hostname / hostname -A
coulomb
hostname -i
10.65.XX.XXX # numbers have been replaced with X
sudo /etc/init.d/pbs start
Starting PBS
/opt/pbs/sbin/pbs_comm ready (pid=12984), Proxy Name:coulomb:17001, Threads:4
PBS comm
PBS mom
PBS sched
Connecting to PBS dataservice...connected to PBS dataservice@coulomb
PBS server
sudo /etc/init.d/pbs status
pbs_server is not running
pbs_mom is pid 12994
pbs_sched is pid 13005
pbs_comm is 12984
ps -ef | grep pbs_
root 12984 1 0 16:24 ? 00:00:00 /opt/pbs/sbin/pbs_comm
root 12994 1 0 16:24 ? 00:00:00 /opt/pbs/sbin/pbs_mom
root 13005 1 0 16:24 ? 00:00:00 /opt/pbs/sbin/pbs_sched
ali 13233 5207 0 16:27 pts/3 00:00:00 grep --color=auto pbs_
pbs_hostn -v $PBS_SERVER
aliases: -none-
address length: 4 bytes
address: 10.65.XX.XXX (2200060170 dec) name: coulomb # numbers replaced by X
Other OpenPBS logs:
Some digits in the IP addresses have been replaced with X’s.
/var/spool/pbs/mom_logs/20221120
11/20/2022 16:02:20;0002;pbs_mom;Svr;Log;Log opened
11/20/2022 16:02:20;0002;pbs_mom;Svr;pbs_mom;pbs_version=20.0.0
11/20/2022 16:02:20;0002;pbs_mom;Svr;pbs_mom;pbs_build=mach=N/A:security=N/A:configure_args=N/A
11/20/2022 16:02:20;0002;pbs_mom;Svr;pbs_mom;hostname=coulomb;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
11/20/2022 16:02:20;0002;pbs_mom;Svr;pbs_mom;ipv4 interface lo: localhost
11/20/2022 16:02:20;0002;pbs_mom;Svr;pbs_mom;ipv4 interface enp6s0: coulomb
11/20/2022 16:02:20;0002;pbs_mom;Svr;pbs_mom;ipv6 interface lo: ip6-loopback
11/20/2022 16:02:20;0002;pbs_mom;Svr;pbs_mom;ipv6 interface enp6s0: coulomb
11/20/2022 16:02:20;0100;pbs_mom;Svr;parse_config;file config
11/20/2022 16:02:20;0002;pbs_mom;Svr;pbs_mom;Adding IP address 10.65.XX.XXX as authorized
11/20/2022 16:02:20;0002;pbs_mom;n/a;set_restrict_user_maxsys;setting 999
11/20/2022 16:02:20;0002;pbs_mom;n/a;read_config;max_check_poll = 120, min_check_poll = 10
11/20/2022 16:02:20;0002;pbs_mom;Svr;pbs_mom;Adding IP address 127.0.0.1 as authorized
11/20/2022 16:02:20;0002;pbs_mom;Svr;set_checkpoint_path;Using default checkpoint path.
11/20/2022 16:02:20;0002;pbs_mom;Svr;set_checkpoint_path;Setting checkpoint path to /var/spool/pbs/checkpoint/
11/20/2022 16:02:20;0002;pbs_mom;n/a;ncpus;hyperthreading enabled
11/20/2022 16:02:20;0002;pbs_mom;n/a;initialize;pcpus=12, OS reports 12 cpu(s)
11/20/2022 16:02:20;0d80;pbs_mom;TPP;pbs_mom(Main Thread);TPP authentication method = resvport
11/20/2022 16:02:20;0c06;pbs_mom;TPP;pbs_mom(Main Thread);TPP leaf node names = 10.65.XX.XXX:15003,127.0.0.1:15003,10.65.XX.XXX:15003
11/20/2022 16:02:20;0d80;pbs_mom;TPP;pbs_mom(Main Thread);Initializing TPP transport Layer
11/20/2022 16:02:20;0d80;pbs_mom;TPP;pbs_mom(Main Thread);Max files allowed = 16384
11/20/2022 16:02:20;0d80;pbs_mom;TPP;pbs_mom(Main Thread);TPP initialization done
11/20/2022 16:02:20;0d80;pbs_mom;TPP;pbs_mom(Main Thread);Connecting to pbs_comm coulomb:17001
11/20/2022 16:02:20;0c06;pbs_mom;TPP;pbs_mom(Thread 0);Thread ready
11/20/2022 16:02:20;0006;pbs_mom;Fil;pbs_mom;Version 20.0.0, started, initialization type = 0
11/20/2022 16:02:20;0002;pbs_mom;Svr;pbs_mom;Mom pid = 11693 ready, using ports Server:15001 MOM:15002 RM:15003
11/20/2022 16:02:20;0c06;pbs_mom;TPP;pbs_mom(Thread 0);Registering address 10.65.XX.XXX:15003 to pbs_comm coulomb:17001
11/20/2022 16:02:20;0c06;pbs_mom;TPP;pbs_mom(Thread 0);Connected to pbs_comm coulomb:17001
11/20/2022 16:02:20;0001;pbs_mom;Svr;net_restore_handler;net restore handler called
11/20/2022 16:02:22;0002;pbs_mom;Svr;pbs_mom;HELLO sent to server at coulomb:15001, stream:0
11/20/2022 16:02:22;0001;pbs_mom;Svr;pbs_mom;im_eof, Premature end of message from addr 10.65.XX.XXX:15001 on stream 0
11/20/2022 16:02:22;0002;pbs_mom;Svr;im_eof;Server closed connection.
11/20/2022 16:02:24;0002;pbs_mom;Svr;pbs_mom;HELLO sent to server at coulomb:15001, stream:1
11/20/2022 16:02:25;0001;pbs_mom;Svr;pbs_mom;im_eof, Premature end of message from addr 10.65.XX.XXX:15001 on stream 1
11/20/2022 16:02:25;0002;pbs_mom;Svr;im_eof;Server closed connection.
/var/spool/pbs/comm_logs/20221120
11/20/2022 16:02:20;0002;Comm@coulomb;Svr;Log;Log opened
11/20/2022 16:02:20;0002;Comm@coulomb;Svr;Comm@coulomb;pbs_version=20.0.0
11/20/2022 16:02:20;0002;Comm@coulomb;Svr;Comm@coulomb;pbs_build=mach=N/A:security=N/A:configure_args=N/A
11/20/2022 16:02:20;0002;Comm@coulomb;Svr;Comm@coulomb;hostname=coulomb;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
11/20/2022 16:02:20;0002;Comm@coulomb;Svr;Comm@coulomb;ipv4 interface lo: localhost
11/20/2022 16:02:20;0002;Comm@coulomb;Svr;Comm@coulomb;ipv4 interface enp6s0: coulomb
11/20/2022 16:02:20;0002;Comm@coulomb;Svr;Comm@coulomb;ipv6 interface lo: ip6-loopback
11/20/2022 16:02:20;0002;Comm@coulomb;Svr;Comm@coulomb;ipv6 interface enp6s0: coulomb
11/20/2022 16:02:20;0002;Comm@coulomb;Svr;Comm@coulomb;/opt/pbs/sbin/pbs_comm ready (pid=11683), Proxy Name:coulomb:17001, Threads:4
11/20/2022 16:02:20;0000;Comm@coulomb;Svr;Comm@coulomb;Supported authentication method: resvport
11/20/2022 16:02:20;0c06;Comm@coulomb;TPP;Comm@coulomb(Thread 0);Thread ready
11/20/2022 16:02:20;0c06;Comm@coulomb;TPP;Comm@coulomb(Thread 1);Thread ready
11/20/2022 16:02:20;0c06;Comm@coulomb;TPP;Comm@coulomb(Thread 3);Thread ready
11/20/2022 16:02:20;0c06;Comm@coulomb;TPP;Comm@coulomb(Thread 2);Thread ready
11/20/2022 16:02:20;0c06;Comm@coulomb;TPP;Comm@coulomb(Thread 1);tfd=14, Leaf registered address 10.65.XX.XXX:15003
11/20/2022 16:02:23;0c06;Comm@coulomb;TPP;Comm@coulomb(Thread 2);tfd=16, Leaf registered address 10.65.XX.XXX:15001
11/20/2022 16:02:25;0c06;Comm@coulomb;TPP;Comm@coulomb(Thread 2);tfd=16, Connection from leaf 10.65.XX.XXX:15001 down
/var/spool/pbs/sched_logs/20221120
11/20/2022 16:02:20;0002;pbs_sched;Svr;Log;Log opened
11/20/2022 16:02:20;0002;pbs_sched;Svr;pbs_sched;pbs_version=20.0.0
11/20/2022 16:02:20;0002;pbs_sched;Svr;pbs_sched;pbs_build=mach=N/A:security=N/A:configure_args=N/A
11/20/2022 16:02:20;0002;pbs_sched;Svr;pbs_sched;hostname=coulomb;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
11/20/2022 16:02:20;0002;pbs_sched;Svr;pbs_sched;ipv4 interface lo: localhost
11/20/2022 16:02:20;0002;pbs_sched;Svr;pbs_sched;ipv4 interface enp6s0: coulomb
11/20/2022 16:02:20;0002;pbs_sched;Svr;pbs_sched;ipv6 interface lo: ip6-loopback
11/20/2022 16:02:20;0002;pbs_sched;Svr;pbs_sched;ipv6 interface enp6s0: coulomb
11/20/2022 16:02:20;0002;pbs_sched;n/a;setup_env;read environment from /var/spool/pbs/pbs_environment
11/20/2022 16:02:20;0006;pbs_sched;Fil;pbs_sched;Version 20.0.0, started, initialization type = 0
11/20/2022 16:02:20;0002;pbs_sched;Svr;sched_main;/opt/pbs/sbin/pbs_sched startup pid 11704
11/20/2022 16:02:20;0040;pbs_sched;Fil;fairshare usage;Creating usage database for fairshare
11/20/2022 16:02:20;0080;pbs_sched;Req;;Launching 6 worker threads
Your help would be greatly appreciated!