Hi all,
I’m new to OpenPBS. Eventually I want to install and run it on a small cluster, but for now I’m trying to get it running on a single node, and it fails every time. I installed OpenPBS 20 (openpbs-master) following the guidelines in https://github.com/openpbs/openpbs/blob/master/INSTALL.
This is the current state:
dy@master:~$ sudo /etc/init.d/pbs status
[sudo] password for dy:
pbs_server is pid 10123
pbs_mom is pid 2003
pbs_sched is pid 2019
pbs_comm is 1955
dy@master:~$ sudo /etc/init.d/pbs restart
Restarting PBS
Stopping PBS
PBS mom - was pid: 2003
PBS sched - was pid: 2019
PBS comm - was pid: 1955
Waiting for shutdown to complete
Starting PBS
/opt/pbs/sbin/pbs_comm ready (pid=10972), Proxy Name:master.local:17001, Threads:4
PBS comm
PBS mom
PBS sched
PBS Server already running.
dy@master:~$ ps -ef | grep pbs
postgres 4536 1477 0 07:33 ? 00:00:00 postgres: 12/main: postgres pbs 127.0.0.1(40892) idle
root 10752 1 0 08:33 ? 00:00:00 /opt/pbs/sbin/pbs_server.bin
root 10972 1676 0 08:33 ? 00:00:00 /opt/pbs/sbin/pbs_comm
root 10991 1676 0 08:33 ? 00:00:00 /opt/pbs/sbin/pbs_mom
root 11004 1676 0 08:33 ? 00:00:00 /opt/pbs/sbin/pbs_sched
The files:
cat /etc/pbs.conf
PBS_SERVER=master
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=1
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/usr/bin/scp
cat /etc/hosts
127.0.0.1 localhost
#127.0.1.1 dy-HP-350-G2
100.200.10.1 master.local master
100.200.10.2 master1.local master1
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
The log file in server_logs reports:
08/28/2020 00:00:39;0002;Server@master;Svr;Server@master;Failed to connect to PBS dataservice
08/28/2020 07:20:48;0002;Server@master;Svr;Log;Log opened
08/28/2020 07:20:48;0002;Server@master;Svr;Server@master;pbs_version=20.0.0
08/28/2020 07:20:48;0002;Server@master;Svr;Server@master;pbs_build=mach=N/A:security=N/A:configure_args=N/A
08/28/2020 07:20:48;0002;Server@master;Svr;Server@master;hostname=master.local;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
08/28/2020 07:20:48;0002;Server@master;Svr;Server@master;ipv4 interface lo: localhost
08/28/2020 07:20:48;0002;Server@master;Svr;Server@master;ipv4 interface wlp2s0: localhost
08/28/2020 07:20:48;0002;Server@master;Svr;Server@master;ipv6 interface lo: ip6-loopback
08/28/2020 07:20:48;0002;Server@master;Svr;Server@master;ipv6 interface wlp2s0: master
08/28/2020 07:20:48;0006;Server@master;Fil;Server@master;Version 20.0.0, started, initialization type = 1
08/28/2020 07:20:50;0002;Server@master;Svr;Server@master;pbs_status_db exit code 1
08/28/2020 07:20:50;0002;Server@master;Svr;Server@master;Starting PBS dataservice
08/28/2020 07:23:03;0002;Server@master;Svr;Server@master;Failed to connect to PBS dataservice
08/28/2020 07:23:04;0002;Server@master;Svr;Server@master;Starting PBS dataservice
08/28/2020 07:23:06;0002;Server@master;Svr;Server@master;pbs_status_db exit code 0
and mom_logs:
08/28/2020 07:20:47;0002;pbs_mom;Svr;Log;Log opened
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;pbs_version=20.0.0
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;pbs_build=mach=N/A:security=N/A:configure_args=N/A
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;hostname=master.local;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;ipv4 interface lo: localhost
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;ipv4 interface wlp2s0: localhost
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;ipv6 interface lo: ip6-loopback
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;ipv6 interface wlp2s0: master
08/28/2020 07:20:47;0100;pbs_mom;Svr;parse_config;file config
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;Adding IP address 100.200.10.1 as authorized
08/28/2020 07:20:47;0002;pbs_mom;n/a;set_restrict_user_maxsys;setting 999
08/28/2020 07:20:47;0002;pbs_mom;n/a;read_config;max_check_poll = 120, min_check_poll = 10
08/28/2020 07:20:47;0d80;pbs_mom;TPP;pbs_mom(Main Thread);TPP authentication method = resvport
08/28/2020 07:20:47;0c06;pbs_mom;TPP;pbs_mom(Main Thread);TPP leaf node names = 100.200.10.1:15003,127.0.0.1:15003,192.168.2.106:15003
08/28/2020 07:20:47;0d80;pbs_mom;TPP;pbs_mom(Main Thread);Initializing TPP transport Layer
08/28/2020 07:20:47;0d80;pbs_mom;TPP;pbs_mom(Main Thread);Max files allowed = 16384
08/28/2020 07:20:47;0d80;pbs_mom;TPP;pbs_mom(Main Thread);TPP initialization done
08/28/2020 07:20:47;0c06;pbs_mom;TPP;pbs_mom(Main Thread);Single pbs_comm configured, TPP Fault tolerant mode disabled
08/28/2020 07:20:47;0d80;pbs_mom;TPP;pbs_mom(Main Thread);Connecting to pbs_comm master:17001
08/28/2020 07:20:47;0c06;pbs_mom;TPP;pbs_mom(Thread 0);Thread ready
08/28/2020 07:20:47;0002;pbs_mom;Svr;pbs_mom;Adding IP address 127.0.0.1 as authorized
08/28/2020 07:20:47;0002;pbs_mom;Svr;set_checkpoint_path;Using default checkpoint path.
08/28/2020 07:20:47;0002;pbs_mom;Svr;set_checkpoint_path;Setting checkpoint path to /var/spool/pbs/checkpoint/
08/28/2020 07:20:49;0002;pbs_mom;n/a;ncpus;hyperthreading enabled
08/28/2020 07:20:49;0002;pbs_mom;n/a;initialize;pcpus=4, OS reports 4 cpu(s)
08/28/2020 07:20:49;0006;pbs_mom;Fil;pbs_mom;Version 20.0.0, started, initialization type = 0
08/28/2020 07:20:49;0002;pbs_mom;Svr;pbs_mom;Mom pid = 2003 ready, using ports Server:15001 MOM:15002 RM:15003
08/28/2020 07:20:49;0001;pbs_mom;Svr;pbs_mom;Success (0) in pbs_mom, Failed to send HELLO at master:15001
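To pull the failures out of logs like the two above, a small filter can help. The following is a minimal sketch and not part of PBS: it assumes every log line has six semicolon-separated fields (timestamp, event code, daemon, object type, object name, message), which matches the lines shown here. The names `parse_pbs_log_line` and `find_failures` are hypothetical helpers made up for this illustration.

```python
from typing import Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    event_code: str
    daemon: str
    obj_type: str
    obj_name: str
    message: str


def parse_pbs_log_line(line: str) -> Optional[LogEntry]:
    """Split one log line into its six fields; return None if it does not fit."""
    # maxsplit=5 keeps any ';' inside the message itself intact.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return LogEntry(*parts)


def find_failures(lines: Iterable[str]) -> List[LogEntry]:
    """Return the entries whose message mentions a failure (case-insensitive)."""
    hits = []
    for line in lines:
        entry = parse_pbs_log_line(line)
        if entry is not None and "fail" in entry.message.lower():
            hits.append(entry)
    return hits
```

Applied to the lines above, this should return the two "Failed to connect to PBS dataservice" entries from the server log and the "Failed to send HELLO" entry from the MoM log.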
I checked /var/spool/pbs/spool/ and found db error files saying that the server is not running. But it is. The database pbs_datastore exists, including the schema pbs and its tables.
Commands like qstat and qmgr fail: they seem to hang, and I get no response from the system.
The firewall is disabled. It is disabled by default, but I checked anyway. SELinux is not present on the system; as far as I can tell the OS does not install it by default, so I assume it is not active.
The OS is Linux Mint 19.2, which is based on Ubuntu 18.04. The database is PostgreSQL 12.
I searched the forum for others with the same problem and also tried installing with the data service user pbsdata, but that didn’t work either. I also tried ./configure --with-database-dir / database-user, which fails. Sometimes a command responds with “cannot connect to server (errno=111)” or (errno=113), but that happens when some of the services are not running. At present the services all seem to be running but are not communicating with each other, and I don’t know why.
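The two error numbers can be told apart with a direct TCP probe. On Linux, errno 111 is ECONNREFUSED (the host answered, but nothing is listening on that port) and errno 113 is EHOSTUNREACH (no route to the host). The following is a minimal sketch and not a PBS tool: `probe` is a hypothetical helper, and the ports worth trying (15001 for the server, 17001 for pbs_comm) are taken from the logs above.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection and describe the outcome in one line."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror as exc:
        # Name resolution failed before any connection was attempted.
        return f"cannot resolve {host!r}: {exc}"
    except socket.timeout:
        return "timed out"
    except OSError as exc:
        if exc.errno == errno.ECONNREFUSED:
            return "refused: host reachable, nothing listening on this port"
        if exc.errno == errno.EHOSTUNREACH:
            return "unreachable: no route to host"
        return f"failed: {exc}"
```

Calling `probe("master", 15001)` and `probe("master", 17001)` on the node itself would show whether the server and pbs_comm ports accept connections under the name set in PBS_SERVER.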
I hope someone can help.
Many thanks
Sebastian