Problem starting PBS on a clean virtual machine

Hello,
I have a problem starting PBS.
I installed pbspro_19.1.1.centos7.zip on a clean virtual machine running CentOS Linux release 7.5.1804.

service pbs start

Starting PBS
/opt/pbs/sbin/pbs_comm ready (pid=4593), Proxy Name:localhost:17001, Threads:4
PBS comm
PBS mom
Creating usage database for fairshare.
PBS sched
Connecting to PBS dataservice…Failed to start PBS dataservice
.Failed to start PBS dataservice
…Failed to start PBS dataservice
continuing in background.
PBS server

service pbs status

pbs_server is pid 6045
pbs_mom is not running
pbs_sched is not running
pbs_comm is not running

/opt/pbs/sbin/pbs_dataservice start

Starting PBS Data Service…
grep: /var/spool/pbs/spool/pg_start.log: No such file or directory

comm_logs
05/22/2019 05:34:41;0002;Comm@localhost;Svr;Comm@localhost;/opt/pbs/sbin/pbs_comm ready (pid=4593), Proxy Name:localhost:17001, Threads:4
05/22/2019 05:34:41;0c06;Comm@localhost;TPP;alloc_router(Main Thread);Failed to resolve address, pbs_comm=localhost:17001
05/22/2019 05:34:41;0001;Comm@localhost;Svr;Comm@localhost;main, tpp init failed

mom_logs
05/22/2019 05:34:41;0002;pbs_mom;n/a;set_restrict_user_maxsys;setting 999
05/22/2019 05:34:41;0002;pbs_mom;n/a;read_config;max_check_poll = 120, min_check_poll = 10
05/22/2019 05:34:41;0001;pbs_mom;Svr;pbs_mom;pbsd_main, Could not find any usable IP address for host localhost

sched_logs
05/22/2019 05:34:41;0006;pbs_sched;Fil;pbs_sched;Version 19.1.1, started, initialization type = 0
05/22/2019 05:34:41;0002;pbs_sched;Svr;main;/opt/pbs/sbin/pbs_sched startup pid 4612
05/22/2019 05:34:41;0001;pbs_sched;Svr;pbs_sched;pbsd_main, Could not find any usable IP address for host localhost.localdomain

server_logs
05/22/2019 05:50:39;0002;Server@localhost;Svr;Server@localhost;pbs_status_db exit code 1
05/22/2019 05:50:49;0002;Server@localhost;Svr;Server@localhost;Starting PBS dataservice
05/22/2019 05:51:00;0002;Server@localhost;Svr;Server@localhost;PBS dataservice not running:[Connection: failed: could not connect to server: Connection refused
Is the server running on host “127.0.0.1” and accepting
TCP/IP connections on port 15007?]

Thanks

What I’ve done

yum install -y expat libedit postgresql-server postgresql-contrib python sendmail sudo tcl tk libical

yum install -y gcc make rpm-build libtool hwloc-devel \
  libX11-devel libXt-devel libedit-devel libical-devel \
  ncurses-devel perl postgresql-devel postgresql-contrib python-devel tcl-devel \
  tk-devel swig expat-devel openssl-devel libXext libXft \
  autoconf automake

yum install -y perl-Env

yum install -y perl-Switch

PBS_HOME=/var/spool/pbs

PBS_EXEC=/opt/pbs

PBS_SERVER=hostname

PBS_DATA_SERVICE_USER=postgres

rpm -i pbspro-server-19.1.1-0.x86_64.rpm

*** PBS Installation Summary


*** Postinstall script called as follows:
*** /opt/pbs/libexec/pbs_postinstall server 19.1.1 /opt/pbs /var/spool/pbs postgres


*** No configuration file found.
*** Creating new configuration file: /etc/pbs.conf
*** Replacing /etc/pbs.conf with /etc/pbs.conf.19.1.1
*** /etc/pbs.conf has been created.


*** Registering PBS Pro as a service.
Created symlink from /etc/systemd/system/multi-user.target.wants/pbs.service to /usr/lib/systemd/system/pbs.service.


*** PBS_HOME is /var/spool/pbs
*** Creating new file /var/spool/pbs/pbs_environment
*** WARNING: TZ not set in /var/spool/pbs/pbs_environment


*** The PBS Pro server has been installed in /opt/pbs/sbin.
*** The PBS Pro scheduler has been installed in /opt/pbs/sbin.


*** The PBS Pro communication agent has been installed in /opt/pbs/sbin.


*** The PBS Pro MOM has been installed in /opt/pbs/sbin.


*** The PBS commands have been installed in /opt/pbs/bin.


*** End of /opt/pbs/libexec/pbs_postinstall

vi /etc/pbs.conf

service pbs start

Starting PBS
PBS Home directory /var/spool/pbs needs updating.
Running /opt/pbs/libexec/pbs_habitat to update it.


*** Error initializing the PBS dataservice
Error details:
Creating the PBS Data Service…
The files belonging to this database system will be owned by user “postgres”.
This user must also own the server process.

The database cluster will be initialized with locale “C”.
The default text search configuration will be set to “english”.

initdb: could not access directory “/var/spool/pbs/datastore”: Permission denied
Error creating PBS datastore

chown -R postgres:postgres /var/spool/pbs/datastore

chown: cannot access ‘/var/spool/pbs/datastore’: No such file or directory

sudo chmod 4755 /opt/pbs/sbin/pbs_iff /opt/pbs/sbin/pbs_rcp

mkdir /var/spool/pbs/datastore

chown -R postgres:postgres /var/spool/pbs/datastore

/opt/pbs/libexec/pbs_habitat


*** Setting default queue and resource limits.


Connecting to PBS dataservice…connected to PBS dataservice@localhost
pbs_iff: cannot connect to host
pbs_iff: cannot connect to host
No Permission.
qmgr: cannot connect to server
Connection refused
qmgr: cannot connect to server
Connection refused
qterm: could not connect to server (111)
*** End of /opt/pbs/libexec/pbs_habitat

Hello,

I see the issue is with network initialization. The error message in the daemon logs, “Could not find any usable IP address for host localhost.localdomain”, shows that the hostname does not resolve to a usable IP address on your test VM. Please configure /etc/hosts with the proper IP address and try again.
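
One way to check this before restarting anything is to resolve the name and see whether it lands on a real interface address. This is only a minimal sketch: the function name check_host_resolves and the RESOLVER variable are made up for illustration, and treating loopback addresses as unusable is an assumption based on the log message above.

```shell
#!/bin/bash
# Hypothetical helper (not part of PBS Pro): report whether a host name
# resolves to a non-loopback address.
# RESOLVER is overridable so the check can be exercised without a real lookup.
RESOLVER=${RESOLVER:-getent hosts}

check_host_resolves() {
    local name=$1 addr
    # The first field of the first line of "getent hosts NAME" is the address.
    addr=$($RESOLVER "$name" 2>/dev/null | awk 'NR==1 {print $1}')
    if [ -z "$addr" ]; then
        echo "$name: not resolvable"
        return 1
    fi
    case $addr in
        127.*|::1)
            # Assumption: the daemons do not accept a loopback address.
            echo "$name: resolves only to loopback ($addr)"
            return 1
            ;;
    esac
    echo "$name: $addr"
}
```

After sourcing it, run check_host_resolves "$(hostname)" on the VM. A loopback or empty result would match the symptom in the logs.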

-Brem

Thanks, yes, it helped.
I changed the hostname to pbshost and added this line to /etc/hosts: 10.10.23.79 pbshost

But I still had to run:
postgresql-setup initdb
/opt/pbs/sbin/pbs_dataservice start

But I have two more questions:
service pbs start does not start pbs_dataservice (I have to start it by hand)
and
service pbs stop/restart cannot stop /opt/pbs/sbin/pbs_server.bin (I have to use kill -9)

Thanks

Please update the PBS_SERVER value in the /etc/pbs.conf
PBS_SERVER=pbshost

I suppose your /etc/hosts file looks like this:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
10.10.23.79 pbshost

I suppose you have set the hostname of your server like this:
hostnamectl set-hostname pbshost

Check it by running
hostnamectl

  1. Stop all the PBS Pro services (if some services are stuck, kill them with kill -INT)
  2. Make the above changes
  3. Start the PBS services ( /etc/init.d/pbs start )
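
Before restarting, the PBS_SERVER entry can be compared with the machine's hostname. The following is a rough sketch, not an official tool: pbs_conf_value and check_pbs_server are hypothetical names, and it assumes /etc/pbs.conf contains plain KEY=value lines as in the postinstall output.

```shell
#!/bin/bash
# Hypothetical helper: print the value of KEY from a pbs.conf-style file
# (plain KEY=value lines). Prints nothing if the key is absent.
pbs_conf_value() {
    local file=$1 key=$2
    # If the key appears more than once, report the last occurrence.
    sed -n "s/^${key}=//p" "$file" | tail -n 1
}

# Compare PBS_SERVER in the given file with the expected host name.
check_pbs_server() {
    local file=$1 expected=$2 actual
    actual=$(pbs_conf_value "$file" PBS_SERVER)
    if [ "$actual" = "$expected" ]; then
        echo "OK: PBS_SERVER=$actual"
    else
        echo "MISMATCH: PBS_SERVER=$actual, hostname is $expected"
        return 1
    fi
}
```

Typical use would be check_pbs_server /etc/pbs.conf "$(hostname)", run as a user who can read the file.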

Hope this helps

Thanks

Yes,

[root@pbshost server_logs]# cat /etc/pbs.conf
PBS_EXEC=/opt/pbs
PBS_SERVER=pbshost
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=1
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp

[root@pbshost server_logs]# hostname
pbshost

[root@pbshost server_logs]# hostnamectl
Static hostname: pbshost
Icon name: computer-vm
Chassis: vm
Machine ID: b1e884f00cf84ee3863d6a6c39e7b8bb
Boot ID: cf793d6c71ec475ba11e58174c979c56
Virtualization: xen
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-862.el7.x86_64
Architecture: x86-64

[root@pbshost server_logs]# cat /etc/hosts
10.10.23.79 pbshost
10.10.23.77 pbsnode1
10.10.23.78 pbsnode2
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

It works if I first start the data service manually with /opt/pbs/sbin/pbs_dataservice start,
but then pbs_dataservice stop/restart does not work.

Conversely, if I do not start pbs_dataservice manually, PBS does not start correctly (although all four processes are running), but pbs_dataservice stop begins to work.

The two problems seem to be connected.

server_logs without pbs_dataservice start

05/27/2019 03:55:25;0002;Server@pbshost;Svr;Server@pbshost;pbs_status_db exit code 1
05/27/2019 03:57:43;0002;Server@pbshost;Svr;Server@pbshost;Starting PBS dataservice
05/27/2019 03:57:54;0002;Server@pbshost;Svr;Server@pbshost;PBS dataservice not running:[Connection: failed: could not connect to server: Connection refused
Is the server running on host “10.10.23.79” and accepting
TCP/IP connections on port 15007?]
05/27/2019 03:57:55;0002;Server@pbshost;Svr;Server@pbshost;pbs_status_db exit code 1

server_logs with pbs_dataservice start

05/27/2019 03:58:43;0002;Server@pbshost;Svr;Server@pbshost;pbs_status_db exit code 0
05/27/2019 03:58:43;0002;Server@pbshost;Svr;Server@pbshost;Starting PBS dataservice
05/27/2019 03:58:47;0002;Server@pbshost;Svr;Server@pbshost;connected to PBS dataservice@pbshost

Please make sure that:

  1. The firewall is turned off
  2. SELinux is disabled and the system has been rebooted
  3. Ports 15001 to 15007 and 17001 are open between the head node and the compute nodes (in both directions)
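
The port check in item 3 can be scripted roughly as follows. This is a sketch under assumptions: it relies on bash's built-in /dev/tcp redirection and the coreutils timeout command, and the helper names are invented for illustration. A "closed" result can also simply mean that no daemon is listening on that port yet.

```shell
#!/bin/bash
# Ports named in the checklist above: 15001 to 15007, plus 17001.
pbs_ports() {
    seq 15001 15007
    echo 17001
}

# Hypothetical helper: attempt a TCP connect to HOST:PORT using bash's
# /dev/tcp redirection, giving up after 2 seconds.
port_open() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Print one status line per port for the given host.
probe_pbs_ports() {
    local host=$1 port
    for port in $(pbs_ports); do
        if port_open "$host" "$port"; then
            echo "$port open"
        else
            echo "$port closed or filtered"
        fi
    done
}
```

For example, probe_pbs_ports pbsnode1 from the head node, and probe_pbs_ports pbshost from each compute node.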

1-3) Yes.
My mini cluster is fully functional:

[master1@pbshost ~]$ qsub -I -l nodes=2:ppn=4
qsub: waiting for job 29.pbshost to start
qsub: job 29.pbshost ready

[master1@pbsnode2 ~]$ source /opt/intel/impi/5.0.2.044/bin64/mpivars.sh
[master1@pbsnode2 ~]$ cd /mnt/nfs/
[master1@pbsnode2 nfs]$ mpirun ./a.out
Hello world: rank 0 of 8 running on pbsnode2
Hello world: rank 1 of 8 running on pbsnode2
Hello world: rank 2 of 8 running on pbsnode2
Hello world: rank 3 of 8 running on pbsnode2
Hello world: rank 4 of 8 running on pbsnode1
Hello world: rank 5 of 8 running on pbsnode1
Hello world: rank 6 of 8 running on pbsnode1
Hello world: rank 7 of 8 running on pbsnode1

The only remaining question is why pbs_dataservice does not start together with the PBS server (service pbs start) and has to be started manually.

Thanks

Nice one!

  1. Please check your /etc/init.d/pbs script or the PBS Pro systemd service, and add a line to start pbs_dataservice.
  2. Back up, re-install, and check again.

Why does the /etc/init.d/pbs script have ${PBS_EXEC}/sbin/pbs_dataservice stop but no pbs_dataservice start?
Is that how it should be?
Is it called somewhere else, or only manually?

Starting the PBS server service should bring up pbs_dataservice automatically. If that is not happening in your case, you can add logic to the start_pbs() function that checks whether pbs_dataservice is up and running and, if it is not, starts it.
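
A minimal sketch of such a check is shown here. It is not the actual contents of start_pbs(), which differ between versions. The function name ensure_dataservice and the PBS_DATASERVICE variable are hypothetical, and the sketch assumes that pbs_dataservice status exits with 0 when the data service is running.

```shell
#!/bin/bash
# Sketch only: shows just the extra check, not the whole start_pbs() function.
PBS_EXEC=${PBS_EXEC:-/opt/pbs}
# Overridable for testing; defaults to the real control script.
PBS_DATASERVICE=${PBS_DATASERVICE:-$PBS_EXEC/sbin/pbs_dataservice}

# Hypothetical helper: start the data service unless "status" already
# reports success (assumed to mean the service is running).
ensure_dataservice() {
    if "$PBS_DATASERVICE" status >/dev/null 2>&1; then
        echo "PBS dataservice already running"
        return 0
    fi
    echo "Starting PBS dataservice"
    "$PBS_DATASERVICE" start
}
```

A call to ensure_dataservice could then be placed in start_pbs() before the line that launches pbs_server. Keep a backup of the original script, since a package upgrade may overwrite the change.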