PBS Pro (public version) Installation on Ubuntu 18.04.2 LTS (Bionic Beaver)

Hi All,

I am currently installing PBS Pro on my workstation. The installation appears to have completed, but there is still an issue with permissions and with connecting to the server. The relevant logs are as follows:

The installation log:


*** PBS Installation Summary


*** Postinstall script called as follows:
*** /opt/pbs/libexec/pbs_postinstall ''


*** Existing configuration file found: /etc/pbs.conf


*** Saving /etc/pbs.conf as /etc/pbs.conf.pre.19.0.0.20190308094945
*** Replacing /etc/pbs.conf with /etc/pbs.conf.19.0.0
*** /etc/pbs.conf has been modified.
*** The original contents have been saved to /etc/pbs.conf.pre.19.0.0.20190308094945


*** Registering PBS Pro as a service.
Synchronizing state of pbs.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable pbs


*** PBS_HOME is /var/spool/pbs
*** Creating new file /var/spool/pbs/pbs_environment
*** WARNING: TZ not set in /var/spool/pbs/pbs_environment


*** The PBS Pro server has been installed in /opt/pbs/sbin.
*** The PBS Pro scheduler has been installed in /opt/pbs/sbin.


*** The PBS Pro communication agent has been installed in /opt/pbs/sbin.


*** The PBS Pro MOM has been installed in /opt/pbs/sbin.


*** The PBS commands have been installed in /opt/pbs/bin.


*** End of /opt/pbs/libexec/pbs_postinstall


The Start PBS log:

Starting PBS
PBS Home directory /var/spool/pbs needs updating.
Running /opt/pbs/libexec/pbs_habitat to update it.


*** Setting default queue and resource limits.


Connecting to PBS dataservice…connected to PBS dataservice@Ehsan
pbs_iff: cannot connect to host
pbs_iff: cannot connect to host
No Permission.
qmgr: cannot connect to server
Connection refused
qmgr: cannot connect to server
Connection refused
qterm: could not connect to server (111)
*** End of /opt/pbs/libexec/pbs_habitat
Home directory /var/spool/pbs updated.
/opt/pbs/sbin/pbs_comm ready (pid=22252), Proxy Name:ehsan:17001, Threads:4
PBS comm
PBS mom
Creating usage database for fairshare.
PBS sched
Connecting to PBS dataservice…connected to PBS dataservice@Ehsan
Licenses valid for 10000000 Floating hosts
PBS server

It would be great if you could share your thoughts on the issue.

Thanks in advance for your time and attention.

Try the following steps

  • Some file permissions must be modified to add SUID privilege.
    sudo chmod 4755 /opt/pbs/sbin/pbs_iff /opt/pbs/sbin/pbs_rcp
  • Start the PBS Pro services.
    sudo /etc/init.d/pbs start
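
To confirm that the chmod took effect, the setuid bit can be checked from the shell. This is only a minimal sketch: it assumes the /opt/pbs prefix shown in the install log, and the helper names has_suid and check_pbs_suid are made up for this example.

```bash
# has_suid FILE: succeed if FILE exists and has the set-user-ID bit set.
# (Hypothetical helper, not part of PBS Pro.)
has_suid() {
    [ -u "$1" ]
}

# check_pbs_suid: report on the two binaries named in the step above.
# The paths assume PBS is installed under /opt/pbs.
check_pbs_suid() {
    for f in /opt/pbs/sbin/pbs_iff /opt/pbs/sbin/pbs_rcp; do
        if has_suid "$f"; then
            echo "OK:      $f is setuid"
        else
            echo "MISSING: $f is not setuid (or does not exist)"
        fi
    done
}
```

Calling check_pbs_suid does the same check that reading the rws permission string in ls -l output does by eye.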

Dear kjakkali,
Thanks for your response. I have applied your suggestions, but it still did not work.

-rwsr-xr-x 1 root root 77704 Mar 7 20:53 pbs_rcp
-rwsr-xr-x 1 root root 1100760 Mar 7 20:53 pbs_iff

bash$ sudo /etc/init.d/pbs start
Starting PBS
/opt/pbs/sbin/pbs_comm ready (pid=871), Proxy Name:ehsanam:17001, Threads:4
PBS comm
PBS mom
Creating usage database for fairshare.
PBS sched
Connecting to PBS dataservice…connected to PBS dataservice@ehsanam
Licenses valid for 10000000 Floating hosts
PBS server

Hi,
Are you still getting the following error while starting PBS?

Or does it occur while executing PBS commands such as 'pbsnodes -av'?

Dear kjakkali,

At the moment, I don't see any error when I start PBS, and it even says it connected to dataservice@hostname. However, when I use a PBS command like qstat -B, I get an error. The output is as follows:

bash$ . /etc/profile.d/pbs.sh
bash$ sudo /etc/init.d/pbs start
Starting PBS
/opt/pbs/sbin/pbs_comm ready (pid=32097), Proxy Name:ehsan:17001, Threads:4
PBS comm
Creating usage database for fairshare.
PBS sched
Connecting to PBS dataservice…connected to PBS dataservice@Ehsan
Licenses valid for 10000000 Floating hosts
PBS server
bash$ qstat -B
Connection refused
qstat: cannot connect to server Ehsan (errno=111)

@ehsanam : One possible cause is the loopback IP address.
Can you try adding an entry to /etc/hosts that maps the hostname to its non-loopback IP address, and then restart PBS?
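
To see whether the hostname is currently mapped to a loopback address, the hosts file can be scanned for matching lines. The sketch below is illustrative only: loopback_entries is a made-up helper, it matches names exactly (including case), and it does not strip trailing comments on a line.

```bash
# loopback_entries HOSTS_FILE NAME
# Print every non-comment line of HOSTS_FILE that maps NAME to a
# 127.x.x.x or ::1 address. (Hypothetical helper for illustration.)
loopback_entries() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                      # skip comment lines
        $1 ~ /^127\./ || $1 == "::1" {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print; break }
        }
    ' "$1"
}

# Usage sketch: any output means the name still points at loopback.
# loopback_entries /etc/hosts "$(hostname)"
```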

Dear kjakkali,

Thanks again for your help. With the non-loopback IP address in place, I can now run the PBS commands. However, I face a problem regarding the default queue:

bash$ echo "sleep 60" | qsub
qsub: No default queue specified.

Should the default queue be specified during installation or after it?

@ehsanam : Good to know that.
Please run the pbs_postinstall script (e.g. libexec/pbs_postinstall), which will create the default queue.

@kjakkali As you suggested, I used the pbs_postinstall script and it solved the issue. Thanks a lot!
However, the job submitted with qsub is not executed; its status stays at Q (queued). For example:

bash$ echo "sleep 60" | qsub
bash$ qstat -a

S = Q

Should I configure something else in PBS Pro?
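
When a job stays in state Q, the scheduler usually records its reason in the job's comment attribute, which qstat -f displays. A small sketch for pulling that one field out of the output; job_comment is a made-up name, and a comment that qstat wraps onto continuation lines is cut at the first line here.

```bash
# job_comment: read `qstat -f JOBID` output on stdin and print the value
# of the "comment" attribute, if one is present.
# (Hypothetical helper; only the first line of a wrapped comment is kept.)
job_comment() {
    sed -n 's/^[[:space:]]*comment = //p'
}

# Usage sketch, with the job id that qsub printed:
# qstat -f <job id> | job_comment
```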

@kjakkali

I guess it might be due to a lack of resources. When I run pbsnodes -a, it prints:

pbsnodes: Server has no node list

P.S: I am trying to install PBS Pro on my workstation which has 8 cores.

@kjakkali It got solved by:

root# source /etc/profile.d/pbs.sh
root# qmgr -c "create node hostname"
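
In the command above, hostname stands for the machine's own name. A minimal sketch that fills it in automatically, assuming the name printed by the hostname command is the one PBS should use for the node; QMGR and create_local_node are names invented for this example.

```bash
# QMGR can be overridden (for example QMGR=echo) to preview the command
# without contacting a server; by default the real qmgr from PATH is used.
QMGR=${QMGR:-qmgr}

# create_local_node [NAME]: register NAME as an execution node.
# NAME defaults to the output of `hostname`, which is an assumption;
# pass the name explicitly if the server knows this host by another one.
create_local_node() {
    node=${1:-$(hostname)}
    $QMGR -c "create node $node"
}

# Usage sketch, from a root shell as in the post above:
# create_local_node
# pbsnodes -a
```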

Dear ehsanam,
I also have the same problem. I don't understand how to add a non-loopback IP address. Can you show me in more detail what exactly this means?
Here is my /etc/hosts
127.0.0.1 localhost
127.0.1.1 amax-server
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Please find this example:
cat /etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.101 pbsosserver.pbspro.org pbsosserver

If this does not help, please share your issues with /etc/hosts and PBS Pro OSS.
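
For a concrete starting point, the line to add can be generated from the machine's own address and name. This is a sketch under assumptions: hosts_entry and first_addr are made-up helpers, hostname -I is Linux-specific, and the first address it reports is only a guess that should be checked against ifconfig before use.

```bash
# hosts_entry IP NAME [ALIAS]: print one /etc/hosts line in the
# "address name alias" layout used in the example above.
hosts_entry() {
    if [ -n "${3:-}" ]; then
        printf '%s %s %s\n' "$1" "$2" "$3"
    else
        printf '%s %s\n' "$1" "$2"
    fi
}

# first_addr: first address reported by `hostname -I` (Linux only).
first_addr() {
    hostname -I 2>/dev/null | awk '{ print $1 }'
}

# Usage sketch (prints the line; it does not edit /etc/hosts):
# hosts_entry "$(first_addr)" "$(hostname)"
```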

Dear adarsh,
Thank you very much for your help. After doing so, I ran "qstat -B" and got the following result:

Server             Max   Tot   Que   Run   Hld   Wat   Trn   Ext Status
amax-server          0     0     0     0     0     0     0     0 Active

(1) But I get a new error when I run qmgr:

ybwang@amax-server : ~ $ qmgr

Max open servers: 49

Qmgr: set queue batch queue_type = Execution

qmgr obj=batch svr=default: Unauthorized Request

qmgr: Error (15007) returned from server

Qmgr: set queue batch enabled = True

qmgr obj=batch svr=default: Unauthorized Request

qmgr: Error (15007) returned from server

Qmgr: set queue batch started = True

qmgr obj=batch svr=default: Unauthorized Request

qmgr: Error (15007) returned from server

Qmgr: set server default_queue = batch

qmgr obj= svr=default: Unauthorized Request

qmgr: Error (15007) returned from server

Qmgr:

(2) The error when I run qsub:
ybwang@amax-server : ~ $ echo "slepp 222" | qsub

qsub: No default queue specified

Before running this, I also got the following output:

ybwang@amax-server:~$ sudo /opt/pbs/libexec/pbs_postinstall
[sudo] password for ybwang:
*** PBS Installation Summary


*** Postinstall script called as follows:
*** /opt/pbs/libexec/pbs_postinstall ''


*** Existing configuration file found: /etc/pbs.conf


*** Saving /etc/pbs.conf as /etc/pbs.conf.pre.19.0.0.20190926155700
*** Replacing /etc/pbs.conf with /etc/pbs.conf.19.0.0
*** /etc/pbs.conf has been modified.
*** The original contents have been saved to /etc/pbs.conf.pre.19.0.0.20190926155700


*** Registering PBS Pro as a service.
Synchronizing state of pbs.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable pbs


*** PBS_HOME is /var/spool/pbs
*** Existing environment file left unmodified: /var/spool/pbs/pbs_environment


*** The PBS Pro server has been installed in /opt/pbs/sbin.
*** The PBS Pro scheduler has been installed in /opt/pbs/sbin.


*** The PBS Pro communication agent has been installed in /opt/pbs/sbin.


*** The PBS Pro MOM has been installed in /opt/pbs/sbin.


*** The PBS commands have been installed in /opt/pbs/bin.


*** End of /opt/pbs/libexec/pbs_postinstall
ybwang@amax-server:~$ pbsnodes -a
amax-server
Mom = amax-server
ntype = PBS
state = free
pcpus = 24
resources_available.arch = linux
resources_available.host = amax-server
resources_available.mem = 32912876kb
resources_available.ncpus = 24
resources_available.vnode = amax-server
resources_assigned.accelerator_memory = 0kb
resources_assigned.hbmem = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
last_state_change_time = Thu Sep 26 15:47:23 2019

Sincerely looking forward to your reply, thanks a lot.

Please try these commands:

qmgr -c "create queue workq queue_type=e,enabled=t,started=t"

Then run the qmgr command and paste in the contents below:

set server scheduling = True
set server default_queue = workq
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.ncpus = 1
set server default_chunk.ncpus = 1
set server scheduler_iteration = 60
set server flatuid = True
set server resv_enable = True
set server node_fail_requeue = 310
set server max_array_size = 10000
set server eligible_time_enable = False
set server job_history_enable = True
set server max_concurrent_provision = 5
set server max_job_sequence_id = 9999999
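
qmgr also reads directives from standard input, so the list above does not have to be pasted at the Qmgr: prompt. A sketch that sends only the first few settings as an example; QMGR and apply_server_settings are names invented here, and the remaining lines from the list can be added to the here-document in the same way.

```bash
# QMGR can be overridden (for example QMGR=cat) to preview what would be
# sent; by default the real qmgr from PATH is used.
QMGR=${QMGR:-qmgr}

# apply_server_settings: feed a batch of directives to qmgr on stdin,
# one per line. Only a subset of the list above is shown.
apply_server_settings() {
    $QMGR <<'EOF'
set server scheduling = True
set server default_queue = workq
set server resources_default.ncpus = 1
set server default_chunk.ncpus = 1
EOF
}
```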

Share the output of the commands below:

qstat -Bf
pbsnodes -aSj
qmgr -c 'p s'
echo "sleep 222" | qsub
qstat -answ1

Hope this helps, thank you

Thank you very much for your kindness, adarsh.

When I run the commands, the error is still there. It seems qmgr doesn't work.

ybwang@amax-server : ~ $ qmgr -c "create queue workq queue_type=e,enabled=t,started=t"

qmgr obj=workq svr=default: Unauthorized Request

qmgr: Error (15007) returned from server
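
The "Unauthorized Request" message points at a permission problem, so one thing worth ruling out is whether qmgr is being run by an account that is allowed to change the server. This is an assumption about the cause and is not confirmed anywhere in this thread: the session above runs qmgr as ybwang, while the earlier create node command was run from a root shell. The sketch below only reports which case applies; running_as_root and qmgr_hint are made-up names.

```bash
# running_as_root [UID]: succeed if UID is 0. UID defaults to the
# current user's numeric id as printed by `id -u`.
running_as_root() {
    [ "${1:-$(id -u)}" -eq 0 ]
}

# qmgr_hint [UID]: print a suggestion. It does not run qmgr itself.
qmgr_hint() {
    if running_as_root "$@"; then
        echo "root: qmgr changes are normally permitted on the server host"
    else
        echo "not root: try a root shell, source /etc/profile.d/pbs.sh, then rerun qmgr"
    fi
}
```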

You are welcome, and thank you.
Please share the output of the commands below:

ps -ef | grep pbs_
cat /etc/hosts
ifconfig
ping <pbs-server-hostname>
nslookup <pbs-server-hostname>
nslookup <pbs-server-ip-address>
cat /etc/pbs.conf
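
The checks above can be gathered in one pass, with the real server name substituted for the placeholders. A sketch under assumptions: section and collect_pbs_diag are made-up helpers, and the tools are the ones named in the list, so a missing one is reported rather than stopping the run.

```bash
# section TITLE CMD [ARGS...]: print a header, run CMD, and note a
# failure instead of stopping, so one missing tool does not hide the rest.
section() {
    title=$1
    shift
    printf '===== %s =====\n' "$title"
    "$@" 2>&1 || printf '(command failed or not installed: %s)\n' "$1"
}

# collect_pbs_diag HOST: run the requested checks against HOST.
collect_pbs_diag() {
    host=$1
    # "[p]bs_" keeps grep from matching its own process entry
    section "PBS processes" sh -c 'ps -ef | grep "[p]bs_"'
    section "/etc/hosts"    cat /etc/hosts
    section "interfaces"    ifconfig
    section "ping"          ping -c 4 "$host"
    section "nslookup"      nslookup "$host"
    section "/etc/pbs.conf" cat /etc/pbs.conf
}

# Usage sketch:
# collect_pbs_diag amax-server > pbs_diag.txt
```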

Dear adarsh, here is the output:

(1) ybwang@amax-server : ~ $ ps -ef | grep pbs_

root 7857 1 0 17:18 ? 00:00:00 /opt/pbs/sbin/pbs_comm
root 7892 1 0 17:18 ? 00:00:00 /opt/pbs/sbin/pbs_mom
root 7904 1 0 17:18 ? 00:00:00 /opt/pbs/sbin/pbs_sched
root 8039 1 0 17:18 ? 00:00:00 /opt/pbs/sbin/pbs_ds_monitor monitor
postgres 8075 8055 0 17:18 ? 00:00:00 postgres: postgres pbs_datastore 192.168.0.8(36996) idle
root 8076 1 0 17:18 ? 00:00:00 /opt/pbs/sbin/pbs_server.bin
ybwang 8349 8286 0 17:28 pts/1 00:00:00 grep --color=auto pbs_

(2) ybwang@amax-server : ~ $ cat /etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.0.8 amax-server

#127.0.0.1 localhost

#localhost

#127.0.1.1 amax-server

# The following lines are desirable for IPv6 capable hosts

#::1 ip6-localhost ip6-loopback

#fe00::0 ip6-localnet

#ff00::0 ip6-mcastprefix

#ff02::1 ip6-allnodes

#ff02::2 ip6-allrouters

(3) ybwang@amax-server : ~ $ ifconfig

enp6s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.0.8 netmask 255.255.255.0 broadcast 192.168.0.255
inet6 fe80::225:90ff:fe98:be68 prefixlen 64 scopeid 0x20
ether 00:25:90:98:be:68 txqueuelen 1000 (Ethernet)
RX packets 17528 bytes 1773726 (1.7 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2917 bytes 361518 (361.5 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0xdfd20000-dfd3ffff

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10
loop txqueuelen 1000 (Local Loopback)
RX packets 8307 bytes 1290865 (1.2 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 8307 bytes 1290865 (1.2 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

wlx14cf92d8c354: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether 14:cf:92:d8:c3:54 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

(4) ybwang@amax-server : ~ $ ping amax-server

PING amax-server (192.168.0.8) 56(84) bytes of data.

64 bytes from amax-server (192.168.0.8): icmp_seq=1 ttl=64 time=0.033 ms

64 bytes from amax-server (192.168.0.8): icmp_seq=2 ttl=64 time=0.032 ms

64 bytes from amax-server (192.168.0.8): icmp_seq=3 ttl=64 time=0.032 ms

64 bytes from amax-server (192.168.0.8): icmp_seq=4 ttl=64 time=0.032 ms

(5) ybwang@amax-server : ~ $ nslookup pbs-server-hostname

Server: 127.0.0.53

Address: 127.0.0.53#53

** server can’t find pbs-server-hostname: SERVFAIL

(6) ybwang@amax-server : ~ $ nslookup pbs-server-ipaddress

Server: 127.0.0.53

Address: 127.0.0.53#53

** server can’t find pbs-server-ipaddress: SERVFAIL

(7) ybwang@amax-server : ~ $ cat /etc/pbs.conf

PBS_SERVER=amax-server

PBS_START_SERVER=1

PBS_START_SCHED=1

PBS_START_COMM=1

PBS_START_MOM=1

PBS_EXEC=/opt/pbs

PBS_HOME=/var/spool/pbs

PBS_CORE_LIMIT=unlimited

PBS_SCP=/usr/bin/scp
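
Since PBS_SERVER is the name the client commands connect to, it can help to check what that exact name resolves to on this machine. A minimal sketch, assuming a Linux system with getent available; pbs_conf_get and server_address are made-up helper names.

```bash
# pbs_conf_get KEY [FILE]: print the value of KEY from a file of
# KEY=value lines. FILE defaults to /etc/pbs.conf.
pbs_conf_get() {
    sed -n "s/^$1=//p" "${2:-/etc/pbs.conf}" | head -n 1
}

# server_address [FILE]: print the address that the configured
# PBS_SERVER name resolves to. getent consults /etc/hosts as well as DNS.
server_address() {
    getent hosts "$(pbs_conf_get PBS_SERVER "${1:-}")" | awk '{ print $1 }'
}

# Usage sketch: a 127.x.x.x result would point back at the loopback issue.
# server_address
```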

Thank you for this information.

Please share the output of the commands below:
nslookup amax-server
nslookup 192.168.0.8

Please check the following:

  • SELinux is disabled, and the system has been rebooted if it was only disabled just now
  • The firewall is turned off and disabled (ports 15001 to 15009 and 17001 must not be blocked and must be open for communication)
  • 15007 is the pbs_datastore port
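
A quick way to see which of these ports accept connections is to try opening each one. This is a sketch that relies on bash's /dev/tcp redirection, so it needs bash rather than plain sh; the function names are made up. A "closed" result only means nothing accepted the connection, which can be a firewall or simply no daemon listening on that port.

```bash
# pbs_ports: the port numbers from the checklist above, one per line.
pbs_ports() {
    seq 15001 15009
    echo 17001
}

# port_open HOST PORT: succeed if a TCP connection can be opened.
# Uses bash's /dev/tcp feature; in other shells it simply fails.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# report_ports HOST: print "PORT open" or "PORT closed" for each port.
report_ports() {
    for p in $(pbs_ports); do
        if port_open "$1" "$p"; then
            echo "$p open"
        else
            echo "$p closed"
        fi
    done
}

# Usage sketch:
# report_ports amax-server
```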

Thank you

Whoops, sorry for that.
(1) ybwang@amax-server : ~ $ nslookup amax-server

Server: 127.0.0.53

Address: 127.0.0.53#53

Non-authoritative answer:

Name: amax-server

Address: 192.168.0.8

ybwang@amax-server : ~ $ nslookup 192.168.0.8

8.0.168.192.in-addr.arpa name = amax-server.

Authoritative answers can be found from:

ybwang@amax-server : ~ $

(2.1) The system I am using is Ubuntu 18.04, which should not have SELinux installed.

root@amax-server:~# /usr/sbin/sestatus -v

-su: /usr/sbin/sestatus: No such file or directory

(2.2) ybwang@amax-server : ~ $ sudo ufw status

Status: inactive

ybwang@amax-server : ~ $ sudo ufw allow 15001

Rules updated

Rules updated (v6)

Similarly, I allowed ports 15001-15009 and 17001.
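
The per-port rules can also be written as one range rule plus one single-port rule. A sketch, assuming TCP is the protocol wanted here; UFW and allow_pbs_ports are names invented for this example. With ufw reporting "Status: inactive" as above, rules are stored but not enforced until ufw is enabled.

```bash
# UFW can be overridden (for example UFW=echo) to preview the rules
# instead of applying them.
UFW=${UFW:-"sudo ufw"}

# allow_pbs_ports: one rule for the 15001-15009 range, one for 17001.
# ufw requires a protocol when a port range is given, hence "/tcp".
allow_pbs_ports() {
    $UFW allow 15001:15009/tcp
    $UFW allow 17001/tcp
}
```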