PBS dataservice not running

Hi, does anyone have an idea what is causing this?
[root@kmaster1 datastore]# tail /var/spool/pbs/server_logs/20190416
04/16/2019 11:59:21;0002;Server@kmaster1;Svr;Server@kmaster1;PBS dataservice not running:[Connection: failed: could not connect to server: Connection refused
Is the server running on host "192.168.1.224" and accepting
TCP/IP connections on port 15007?]
04/16/2019 11:59:22;0002;Server@kmaster1;Svr;Server@kmaster1;pbs_status_db exit code 1
04/16/2019 11:59:32;0002;Server@kmaster1;Svr;Server@kmaster1;Starting PBS dataservice
04/16/2019 11:59:43;0002;Server@kmaster1;Svr;Server@kmaster1;PBS dataservice not running:[Connection: failed: could not connect to server: Connection refused
Is the server running on host "192.168.1.224" and accepting
TCP/IP connections on port 15007?]
04/16/2019 11:59:43;0002;Server@kmaster1;Svr;Server@kmaster1;pbs_status_db exit code 1
04/16/2019 11:59:53;0002;Server@kmaster1;Svr;Server@kmaster1;Starting PBS dataservice
[root@kmaster1 datastore]#

I am using PBS Pro Open Source, downloaded with:
wget -c http://wpc.23a7.iotacdn.net/8023A7/origin2/rl/PBS-Open/pbspro_19.1.1.centos7.zip

[root@kmaster1 ~]# service pbs restart
Restarting PBS
Stopping PBS
Killing Server.
PBS server - was pid: 4956
PBS sched - was pid: 3583
PBS comm - was pid: 3322
Waiting for shutdown to complete
Starting PBS
PBS comm
/opt/pbs/sbin/pbs_comm ready (pid=5310), Proxy Name:kmaster1.calligotech.com:17001, Threads:4
Creating usage database for fairshare.
PBS sched
Connecting to PBS dataservice…Failed to start PBS dataservice
.Failed to start PBS dataservice
…Failed to start PBS dataservice
continuing in background.
PBS server
[root@kmaster1 ~]#

I also tested the psql connection, as shown below:
[root@kmaster1 datastore]# psql -h 192.168.1.224 -U postgres
psql: could not connect to server: Connection refused
Is the server running on host "192.168.1.224" and accepting
TCP/IP connections on port 5432?
[root@kmaster1 datastore]#

The data service was not started along with PBS.
Could you provide more information, such as the OS version and the PBS Pro version?

Also check that the firewall is not blocking ports 15001 to 15007 and that SELinux is disabled (with the system rebooted afterwards). The data service user account (“pbsdata”) should have a home directory.
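A quick way to see which of those ports answer is to probe them from the shell. This is only a sketch: port_open and report_ports are hypothetical helper names, and it assumes bash and the coreutils timeout command are available.

```shell
#!/usr/bin/env bash
# Sketch only: probe TCP ports with bash's /dev/tcp pseudo-device.
# port_open and report_ports are hypothetical helpers, not PBS tools.

# Succeeds if something accepts a TCP connection on host:port within 2 seconds.
port_open() {
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Print "<port> open" or "<port> closed" for every port from first to last.
report_ports() {
    host=$1
    for port in $(seq "$2" "$3"); do
        if port_open "$host" "$port"; then
            echo "$port open"
        else
            echo "$port closed"
        fi
    done
}

# Example (not run here): report_ports 192.168.1.224 15001 15007
```

Note that "closed" only means nothing answered: the service may simply not be running, which is a different problem from a firewall rule. On the SELinux side, getenforce prints the current mode.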

Dear Adarsh,

I have disabled the firewall and SELinux.
I am still getting the same error message in pbs/server_logs, i.e. PBS dataservice not running.

Can you please let me know how to create the “pbsdata” user account and run the PBS dataservice?

Regards,
Zain

  1. Uninstall the existing PBS deployment:
  •     rpm -qa | grep pbs | xargs rpm -e
    
  •     ps -ef | grep pbs_  # make sure there are no leftover PBS processes
    
  •     rm -rf /var/spool/pbs /opt/pbs /etc/pbs.conf /etc/init.d/pbs /etc/profile.d/pbs.sh
    
  2. Disable SELinux and reboot the system.
  3. Disable the firewalld service.
  4. Create the pbsdata user account:
    useradd -m -d /home/pbsdata -s /bin/bash -c "PBS datastore service user" -U pbsdata
  5. wget -c http://wpc.23a7.iotacdn.net/8023A7/origin2/rl/PBS-Open/pbspro_19.1.1.centos7.zip
  6. unzip *.zip ; cd pbspro-server-19.1.1-0.x86_64; yum install pbspro-server-19.1.1-0.x86_64.rpm
  7. /etc/init.d/pbs start or systemctl start pbs
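Before starting PBS, it may be worth confirming that the service account from the steps above really has a home directory. The following is a minimal sketch; home_of and check_service_user are hypothetical helper names, and it assumes the account is a local one listed in /etc/passwd (as useradd creates it), not one served from LDAP or NIS.

```shell
# Sketch only: verify that a service account exists and has a home directory.
# home_of and check_service_user are hypothetical helpers, not PBS tools.

# Print the home directory recorded for a user in a passwd-format file.
# Returns non-zero if the user is not listed.
home_of() {
    awk -F: -v u="$1" '$1 == u { print $6; found = 1 } END { exit !found }' \
        "${2:-/etc/passwd}"
}

# Report whether the account (default: pbsdata) exists and its home is a directory.
check_service_user() {
    user=${1:-pbsdata}
    if ! home=$(home_of "$user" "${2:-/etc/passwd}"); then
        echo "$user: no such account"
        return 1
    fi
    if [ ! -d "$home" ]; then
        echo "$user: home directory $home is missing"
        return 1
    fi
    echo "$user: ok ($home)"
}

# Example (not run here): check_service_user pbsdata
```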

hope this helps

Thanks for your help. I am now able to run the PBS server, and I have added a compute node as well.

Can you please tell me how to change the PBS_EXEC directory path when installing the PBS server (yum install pbspro-server-19.1.1-0.x86_64.rpm)? By default the PBS_EXEC path is /opt/pbs.

I am installing PBS Pro CE in a cluster environment, so I want to set PBS_EXEC to /share/apps/pbs.

Also, when I submit a test job as the ‘pbsdata’ user, the job goes on hold.
[pbsdata@kmaster1 ~]$ echo sleep 7 | qsub

1.kmaster1

[pbsdata@kmaster1 ~]$ qstat -ans

kmaster1:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
0.kmaster1      pbsdata  workq    STDIN         --    1   1    --    --  H   --
   job held, too many failed attempts to run
1.kmaster1      pbsdata  workq    STDIN         --    1   1    --    --  H   --
   job held, too many failed attempts to run

I have tried removing the job and resubmitting it, but the job still goes into the HOLD state. Can you please help me with this?

PBS_SERVER=<server name> PBS_HOME=<new home location> rpm -i --prefix <new exec location> pbspro-<sub-package>-<version>-0.<platform-specific-dist-tag>.<hardware>.rpm

Please check the following section of this document (High-performance Computing (HPC) and Cloud Solutions | Altair):
3.4.2.2 Setting Installation Parameters
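As an illustration, the relocation command with the values mentioned in this thread filled in could look like the output of the sketch below. pbs_install_cmd is a hypothetical helper that only prints the command for review; the PBS_HOME value /var/spool/pbs is an assumption taken from the log path shown earlier, so adjust it to your site.

```shell
# Sketch only: print the relocated-install command instead of running it.
# pbs_install_cmd is a hypothetical helper; arguments are
# <server name> <PBS_HOME> <new exec location> <rpm file>.
pbs_install_cmd() {
    printf 'PBS_SERVER=%s PBS_HOME=%s rpm -i --prefix %s %s\n' "$1" "$2" "$3" "$4"
}

# Values taken from this thread; /var/spool/pbs is assumed for PBS_HOME.
pbs_install_cmd kmaster1 /var/spool/pbs /share/apps/pbs pbspro-server-19.1.1-0.x86_64.rpm
```

Running rpm -qpi on the package file first shows a Relocations field, which tells you whether the package accepts --prefix at all.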

Reasons for a job being in the “H” state:

  • There is an issue with authenticating the user on the compute node

  • The user's home directory is not mounted or not available on the compute nodes, or the user has no password entry on that compute node

  • When PBS cannot confirm the user's authentication, it keeps the job in the held state

  • The job was manually placed on hold using the qhold command

  • The job is a dependent job (in a dependency chain of jobs)

Check the MoM logs on the node where the job was scheduled to run (you can find the node by running tracejob).
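To go from the qstat listing to the logs, something like the following sketch can pull out the held job IDs. held_jobs is a hypothetical helper; it assumes the 11-column layout of qstat -a shown earlier in the thread and job names without spaces.

```shell
# Sketch only: list the IDs of jobs in state H from `qstat -a` style output.
# held_jobs is a hypothetical helper; it expects the 11-column job lines
# (Job ID ... S Time), with the state in column 10.
held_jobs() {
    awk 'NF == 11 && $10 == "H" { print $1 }'
}

# Example (not run here), tracing every held job:
#   qstat -a | held_jobs | while read -r id; do tracejob "$id"; done
```

With the default PBS_HOME, the MoM logs themselves are under /var/spool/pbs/mom_logs on the compute node, one file per day.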

Thank you