Problems installing and running PBS on Ubuntu 18

Hi all,

I’m new to OpenPBS. Eventually I want to install and run it on a small cluster, but for now I’m trying to get it running on a single node, and it fails every time. I installed OpenPBS 20 (openpbs-master) following the instructions in https://github.com/openpbs/openpbs/blob/master/INSTALL .

This is the current state:

dy@master:~$ sudo /etc/init.d/pbs status
[sudo] password for dy:
pbs_server is pid 10123
pbs_mom is pid 2003
pbs_sched is pid 2019
pbs_comm is 1955

dy@master:~$ sudo /etc/init.d/pbs restart
Restarting PBS
Stopping PBS
PBS mom - was pid: 2003
PBS sched - was pid: 2019
PBS comm - was pid: 1955
Waiting for shutdown to complete
Starting PBS
/opt/pbs/sbin/pbs_comm ready (pid=10972), Proxy Name:master.local:17001, Threads:4
PBS comm
PBS mom
PBS sched
PBS Server already running.

dy@master:~$ ps -ef | grep pbs
postgres 4536 1477 0 07:33 ? 00:00:00 postgres: 12/main: postgres pbs 127.0.0.1(40892) idle
root 10752 1 0 08:33 ? 00:00:00 /opt/pbs/sbin/pbs_server.bin
root 10972 1676 0 08:33 ? 00:00:00 /opt/pbs/sbin/pbs_comm
root 10991 1676 0 08:33 ? 00:00:00 /opt/pbs/sbin/pbs_mom
root 11004 1676 0 08:33 ? 00:00:00 /opt/pbs/sbin/pbs_sched

The files:

cat /etc/pbs.conf
PBS_SERVER=master
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=1
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/usr/bin/scp

cat /etc/hosts
127.0.0.1 localhost
#127.0.1.1 dy-HP-350-G2
100.200.10.1 master.local master
100.200.10.2 master1.local master1

# The following lines are desirable for IPv6 capable hosts

::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
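
A hosts file like this can be cross-checked against what the resolver actually returns. The following is a minimal sketch, not part of PBS: `hosts_lookup` is a hypothetical helper that prints the first address a hosts-format file assigns to a given name, ignoring commented-out entries.

```shell
#!/bin/sh
# hosts_lookup FILE NAME
# Print the first address that a hosts-format FILE assigns to NAME.
# Hypothetical helper for illustration; it is not shipped with PBS.
hosts_lookup() {
    awk -v name="$2" '
        { sub(/#.*/, "") }          # strip comments (whole-line or trailing)
        NF >= 2 {
            # field 1 is the address, fields 2..NF are names and aliases
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    ' "$1"
}
```

With the file shown above, `hosts_lookup /etc/hosts master` prints 100.200.10.1. Comparing that with `getent hosts master` and `hostname -f` on the same machine may show whether the name the PBS daemons use resolves to the address you expect.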

The server log (server_logs) reports:

08/28/2020 00:00:39;0002;Server@master;Svr;Server@master;Failed to connect to PBS dataservice
08/28/2020 07:20:48;0002;Server@master;Svr;Log;Log opened
08/28/2020 07:20:48;0002;Server@master;Svr;Server@master;pbs_version=20.0.0
08/28/2020 07:20:48;0002;Server@master;Svr;Server@master;pbs_build=mach=N/A:security=N/A:configure_args=N/A
08/28/2020 07:20:48;0002;Server@master;Svr;Server@master;hostname=master.local;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
08/28/2020 07:20:48;0002;Server@master;Svr;Server@master;ipv4 interface lo: localhost
08/28/2020 07:20:48;0002;Server@master;Svr;Server@master;ipv4 interface wlp2s0: localhost
08/28/2020 07:20:48;0002;Server@master;Svr;Server@master;ipv6 interface lo: ip6-loopback
08/28/2020 07:20:48;0002;Server@master;Svr;Server@master;ipv6 interface wlp2s0: master
08/28/2020 07:20:48;0006;Server@master;Fil;Server@master;Version 20.0.0, started, initialization type = 1
08/28/2020 07:20:50;0002;Server@master;Svr;Server@master;pbs_status_db exit code 1
08/28/2020 07:20:50;0002;Server@master;Svr;Server@master;Starting PBS dataservice
08/28/2020 07:23:03;0002;Server@master;Svr;Server@master;Failed to connect to PBS dataservice
08/28/2020 07:23:04;0002;Server@master;Svr;Server@master;Starting PBS dataservice
08/28/2020 07:23:06;0002;Server@master;Svr;Server@master;pbs_status_db exit code 0

and mom_logs:

08/28/2020 07:20:47;0002;pbs_mom;Svr;Log;Log opened
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;pbs_version=20.0.0
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;pbs_build=mach=N/A:security=N/A:configure_args=N/A
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;hostname=master.local;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;ipv4 interface lo: localhost
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;ipv4 interface wlp2s0: localhost
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;ipv6 interface lo: ip6-loopback
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;ipv6 interface wlp2s0: master
08/28/2020 07:20:47;0100;pbs_mom;Svr;parse_config;file config
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;Adding IP address 100.200.10.1 as authorized
08/28/2020 07:20:47;0002;pbs_mom;n/a;set_restrict_user_maxsys;setting 999
08/28/2020 07:20:47;0002;pbs_mom;n/a;read_config;max_check_poll = 120, min_check_poll = 10
08/28/2020 07:20:47;0d80;pbs_mom;TPP;pbs_mom(Main Thread);TPP authentication method = resvport
08/28/2020 07:20:47;0c06;pbs_mom;TPP;pbs_mom(Main Thread);TPP leaf node names = 100.200.10.1:15003,127.0.0.1:15003,192.168.2.106:15003
08/28/2020 07:20:47;0d80;pbs_mom;TPP;pbs_mom(Main Thread);Initializing TPP transport Layer
08/28/2020 07:20:47;0d80;pbs_mom;TPP;pbs_mom(Main Thread);Max files allowed = 16384
08/28/2020 07:20:47;0d80;pbs_mom;TPP;pbs_mom(Main Thread);TPP initialization done
08/28/2020 07:20:47;0c06;pbs_mom;TPP;pbs_mom(Main Thread);Single pbs_comm configured, TPP Fault tolerant mode disabled
08/28/2020 07:20:47;0d80;pbs_mom;TPP;pbs_mom(Main Thread);Connecting to pbs_comm master:17001
08/28/2020 07:20:47;0c06;pbs_mom;TPP;pbs_mom(Thread 0);Thread ready
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;Adding IP address 127.0.0.1 as authorized
08/28/2020 07:20:47;0002;pbs_mom;Svr;set_checkpoint_path;Using default checkpoint path.
08/28/2020 07:20:47;0002;pbs_mom;Svr;set_checkpoint_path;Setting checkpoint path to /var/spool/pbs/checkpoint/
08/28/2020 07:20:49;0002;pbs_mom;n/a;ncpus;hyperthreading enabled
08/28/2020 07:20:49;0002;pbs_mom;n/a;initialize;pcpus=4, OS reports 4 cpu(s)
08/28/2020 07:20:49;0006;pbs_mom;Fil;pbs_mom;Version 20.0.0, started, initialization type = 0
08/28/2020 07:20:49;0002;pbs_mom;Svr;pbs_mom;Mom pid = 2003 ready, using ports Server:15001 MOM:15002 RM:15003
08/28/2020 07:20:49;0001;pbs_mom;Svr;pbs_mom;Success (0) in pbs_mom, Failed to send HELLO at master:15001
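
Logs like the two above are long, and only a few lines point at a problem. The sketch below pulls those out. It assumes the semicolon-separated layout visible in the excerpts (timestamp, event code, daemon name, object type, object name, message) and treats event code 0001 as the error class, which is an assumption based on how the last mom_logs line is tagged. `pbs_log_problems` is a hypothetical name.

```shell
#!/bin/sh
# pbs_log_problems: read PBS log lines on stdin and print only those
# that look like problems: event code 0001 (assumed to be the error
# class) or a message containing "fail" or "error", case-insensitive.
pbs_log_problems() {
    awk -F';' '
        NF >= 6 {
            msg = $6
            # the message itself may contain ";", so rejoin the rest
            for (i = 7; i <= NF; i++) msg = msg ";" $i
            if ($2 == "0001" || tolower(msg) ~ /fail|error/)
                print $1 " [" $5 "] " msg
        }
    '
}
```

It could be used as `cat /var/spool/pbs/mom_logs/* | pbs_log_problems`, and likewise for the server_logs directory.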

I checked /var/spool/pbs/spool/ and found db error files saying that the server is not running. But it is. The database pbs_datastore exists, including the schema pbs and its tables.

Commands like qstat and qmgr fail: they seem to hang, and I get no response from the system.

The firewall is disabled. It is disabled by default, but I checked anyway. SELinux is not present on the system; it seems the OS never installed it, so I assume it is not interfering.

The OS I use is Linux Mint 19.2, which is based on Ubuntu 18.04. The database is PostgreSQL 12.

I searched the forum for someone with the same problem and have already tried installing with the data service user pbsdata as well, but it didn’t work. I also tried ./configure with --with-database-dir and --with-database-user, which fails. Sometimes I got the response “cannot connect to server (errno=111)” or (errno=113) when running a command, but that happens when some of the services are not running. At present it seems that the services are not communicating with each other, and I don’t know why.
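
The two error numbers mentioned here are Linux errno values and point in different directions. A small lookup sketch follows; `errno_hint` is a hypothetical helper, and the explanations are general Linux semantics, not anything specific to PBS.

```shell
#!/bin/sh
# errno_hint N: print the Linux meaning of the errno values quoted above.
# Hypothetical helper; only the two values from this thread are covered.
errno_hint() {
    case "$1" in
        111) echo "ECONNREFUSED: host reached, but nothing accepted the connection on that port" ;;
        113) echo "EHOSTUNREACH: no route to the host (address, routing, or firewall)" ;;
        *)   echo "errno $1: not covered by this sketch" ;;
    esac
}
```

So 111 usually suggests the target daemon is not listening where the client looks for it, while 113 suggests the client is trying an address it cannot reach at all, which makes the name-to-address mapping worth a second look.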

I hope someone can help.
Many thanks

Sebastian

Can you enable the postgres logs and check whether anything in particular is logged? You can also enable postgres client-side logging to get more detail. How did you install postgres? Check that the postgresql-contrib package is actually installed and that the hstore module was installed as well (PBS uses it). You might also want to disable firewalls, flush iptables, and check that the postgres user exists and can be su’ed into.
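
Some of these checks can be scripted. The following is a rough sketch under a few assumptions: a Debian-style system with dpkg, and hstore shipping a file named hstore.control in the extension directory reported by `pg_config --sharedir`. The function names are hypothetical, and the package test only tells you whether dpkg knows the package, not whether it is fully configured.

```shell
#!/bin/sh
have_cmd()  { command -v "$1" >/dev/null 2>&1; }   # is a command on PATH?
have_user() { id "$1" >/dev/null 2>&1; }           # does a user account exist?

# report LABEL COMMAND...: run COMMAND quietly, print ok or MISSING for LABEL
report() {
    label=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "ok:      $label"
    else
        echo "MISSING: $label"
    fi
}

check_pg_prereqs() {
    report "user postgres" have_user postgres
    # Package check, skipped on systems without dpkg
    if have_cmd dpkg; then
        report "package postgresql-contrib" dpkg -s postgresql-contrib
    fi
    # hstore check, skipped when pg_config is not installed
    if have_cmd pg_config; then
        report "hstore extension files" \
            test -f "$(pg_config --sharedir)/extension/hstore.control"
    fi
}
```

Calling `check_pg_prereqs` prints one line per check. Whether the postgres user can actually be switched to still has to be tried by hand as root, for example with `su - postgres -c true`.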

Another thing you could do is run the postinstall by hand, in debug mode, as follows:
rm -rf /opt/pbs /var/spool/pbs
make install
sh -x /opt/pbs/libexec/pbs_postinstall

If you find pbs_habitat failing inside this, you can add a sh -x to that script as well to see what is going on.