Qstat: cannot connect to server (Single node cluster)

Dear All,

We tried to install PBS 19.0.0 on our single-node GPU Ubuntu cluster.
The installation did not cause any problems; we followed the steps in this description exactly (https://github.com/PBSPro/pbspro/blob/master/INSTALL#L99).

However, when we execute the qstat command at the end, we get the following error message:

Connection refused
qstat: cannot connect to server DL-Box (errno=111)

Our pbs.conf looks like this:

PBS_SERVER=DL-Box
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=1
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/usr/bin/scp

When we execute the command pbs_hostn -v dl-box, we get:

primary name: DL-Box (from gethostbyname())
aliases:            -none-
     address length:  4 bytes
     address:            127.0.1.1   (16842879 dec)  name:  DL-Box

Our /etc/hosts file looks like this:

127.0.0.1       localhost
127.0.1.1       DL-Box

Stopping and restarting does not seem to have any effect; the PBS services don’t seem to start up correctly:

/etc/init.d/pbs status

pbs_server is not running
pbs_mom is not running
pbs_sched is not running
pbs_comm is not running

When I print the log file I get the following output:

Comm@dl-box;Svr;Log;Log opened
Comm@dl-box;Svr;Comm@dl-box;pbs_version=19.0.0
Comm@dl-box;Svr;Comm@dl-box;pbs_build=mach=N/A:security=N/A:configure_args=N/A
Comm@dl-box;Svr;Comm@dl-box;hostname=dl-box;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
Comm@dl-box;Svr;Comm@dl-box;ipv4 interface lo: localhost
Comm@dl-box;Svr;Comm@dl-box;ipv4 interface eth0: GeForceGTX1080Ti.edo.test.go.jp
Comm@dl-box;Svr;Comm@dl-box;ipv6 interface lo: ip6-loopback
Comm@dl-box;Svr;Comm@dl-box;/opt/pbs/sbin/pbs_comm ready (pid=9076), Proxy Name:dl-box:17001, Threads:4
Comm@dl-box;TPP;alloc_router(Main Thread);Failed to resolve address, pbs_comm=dl-box:17001
03/11/2019 11:17:23;0001;Comm@dl-box;Svr;Comm@dl-box;main, tpp init failed

How can we resolve this problem?

Thank you for your help,
Yussuf

@yali Can you try adding an IP-to-hostname entry for this host to the /etc/hosts file and then restarting PBS? The IP should be a non-loopback address.
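
To confirm that this is the cause, it may help to check which address the hostname currently resolves to. The following is a minimal sketch using only Python's standard library; it is not part of PBS, and the helper names `is_loopback` and `resolves_to_loopback` are made up for illustration.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the address lies in a loopback range (127.0.0.0/8 or ::1)."""
    return ipaddress.ip_address(address).is_loopback


def resolves_to_loopback(hostname: str) -> bool:
    """Resolve the hostname through the system resolver (which normally
    consults /etc/hosts) and report whether the result is a loopback address.
    Raises socket.gaierror if the name cannot be resolved at all."""
    return is_loopback(socket.gethostbyname(hostname))
```

Calling `resolves_to_loopback("DL-Box")` on the affected machine would be expected to return True with the /etc/hosts shown in the question, since pbs_hostn reports 127.0.1.1 for that name.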

Dear hirenvadalia,

thank you for your help! This seems to solve the problem!

One more question regarding /etc/hosts: is it OK to have both entries, loopback and non-loopback, or should I delete the loopback entry completely?

127.0.0.1       localhost
127.0.1.1       DL-Box
134.54.140.95   DL-Box

@yali AFAIK, if you add a non-loopback address for a hostname, a loopback entry for that same hostname no longer serves a purpose. As far as PBS is concerned, though, I think it should still work if you keep the loopback entry alongside the non-loopback one.
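
To see at a glance which addresses a hosts file assigns to one name, something like the sketch below could be used. It is only an illustration: `addresses_for` and `non_loopback` are hypothetical helpers, `SAMPLE_HOSTS` assumes the hostname is spelled DL-Box on every line, and the code inspects the file text only. Which entry the system resolver actually returns is not determined here.

```python
import ipaddress
from typing import List

# Assumed sample: the entries from this thread, with the hostname
# spelled "DL-Box" on both of its lines.
SAMPLE_HOSTS = """\
127.0.0.1       localhost
127.0.1.1       DL-Box
134.54.140.95   DL-Box
"""


def addresses_for(hosts_text: str, hostname: str) -> List[str]:
    """Return, in file order, every address whose line lists the hostname
    (compared case-insensitively) as a name or alias."""
    found = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) < 2:
            continue  # blank line or address without any name
        address, names = fields[0], fields[1:]
        if hostname.lower() in (name.lower() for name in names):
            found.append(address)
    return found


def non_loopback(addresses: List[str]) -> List[str]:
    """Keep only the addresses outside the loopback ranges."""
    return [a for a in addresses if not ipaddress.ip_address(a).is_loopback]
```

On a real machine the text would come from reading /etc/hosts, for example `addresses_for(open("/etc/hosts").read(), "DL-Box")`. If `non_loopback` returns an empty list for the server's hostname, the file has no routable entry for it.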

You rescued me after a two-day endeavour!

Hello. Our HPC server had been working correctly. I just restarted the system, and now when I run pbsnodes -l, it says:
pbsnodes: cannot connect to server master1.local, error=111

Please advise.