Can not resolve name for server node01. (rc = -1 - Unknown error -1)

Hi,
I have set up the PBS server on gpu1 (pbsmaster) and the PBS MoM on gpu5 (pbsslave), but when I try to start the MoM on gpu5, I get the error message below. I am confused because I never set the server name “node01”.
Both installations followed the INSTALL guide (openpbs/INSTALL at master · openpbs/openpbs · GitHub).

(gpu5)#qstat -B
Can not resolve name for server node01. (rc = -1 - Unknown error -1)
Cannot resolve specified server host ‘node01’.
qstat: cannot connect to server node01 (errno=15010) Access from host not allowed, or unknown host

Other information:
#cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
xxx gpu5
xxx gpu1

#cat /etc/pbs.conf
PBS_SERVER=gpu1
PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_START_MOM=1
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_CORE_LIMIT=unlimited
PBS_SCP=/bin/scp

(gpu1)#pbsnodes -v gpu5
gpu5
Mom = gpu5
ntype = PBS
state = free
pcpus = 20
resources_available.arch = linux
resources_available.host = gpu5
resources_available.mem = 65850888kb
resources_available.ncpus = 20
resources_available.vnode = gpu5
resources_assigned.accelerator_memory = 0kb
resources_assigned.hbmem = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
license = l
last_state_change_time = Tue Feb 9 01:00:32 2021
last_used_time = Tue Feb 9 00:39:46 2021

Any suggestions are appreciated.

Please make sure

  • nodes have static IP addresses mapped to their hostnames, and the hostnames are resolvable from every machine in the PBS cluster (both forward and reverse address resolution should work)
  • SELinux is disabled (if you disable it now, reboot the system) and the firewall is disabled
  • /etc/hosts contains entries for all hosts and aliases in the PBS cluster (so that name resolution still works if you lose the DNS server), and this /etc/hosts file is the same on all participating systems in the PBS cluster
  • hostnames of the compute nodes are not mapped to dynamic IP addresses or loopback addresses
  • ports 15001 to 15009 and 17001 are open for communication between the server and the nodes, in both directions
  • you can use this command to check the name resolution
    pbs_hostn -v <hostname> # run once from the server and once from the compute node
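
The forward and reverse resolution check from the list above could also be scripted. This is only a minimal sketch, assuming getent is available (glibc-based Linux); the function names has_word and check_forward_reverse are made up for illustration and are not part of PBS.

```sh
#!/bin/sh
# Sketch only: has_word and check_forward_reverse are hypothetical helper names.
# Assumes getent(1) is available, as on glibc-based Linux systems.

# Succeeds if word $1 appears in the space-separated list $2.
has_word() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Resolve a hostname to an address, resolve that address back to names,
# and report whether the original hostname is among them.
check_forward_reverse() {
    host=$1
    addr=$(getent hosts "$host" | awk '{ print $1; exit }')
    if [ -z "$addr" ]; then
        echo "$host: no forward resolution"
        return 1
    fi
    # Drop the address column; keep the canonical name and aliases.
    names=$(getent hosts "$addr" | awk '{ $1 = ""; print; exit }')
    if has_word "$host" "$names"; then
        echo "$host: $addr (forward and reverse agree)"
    else
        echo "$host: $addr reverses to:$names"
        return 1
    fi
}
```

For example, run check_forward_reverse gpu1 and check_forward_reverse gpu5 on both machines and compare the results.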

Please share the output of the following commands, run as the root user:
qmgr: print server
qmgr: print nodes @default

Thank you adarsh for your reply.

Qmgr: print nodes @default

# Create nodes and set their properties.

# Create and define node gpu5

create node gpu5
set node gpu5 state = free
set node gpu5 resources_available.arch = linux
set node gpu5 resources_available.host = gpu5
set node gpu5 resources_available.mem = 65850888kb
set node gpu5 resources_available.ncpus = 20
set node gpu5 resources_available.ngpus = 4
set node gpu5 resources_available.vnode = gpu5
set node gpu5 resv_enable = True

# Create and define node gpu1

create node gpu1
set node gpu1 state = free
set node gpu1 resources_available.arch = linux
set node gpu1 resources_available.host = gpu1
set node gpu1 resources_available.mem = 65705720kb
set node gpu1 resources_available.ncpus = 32
set node gpu1 resources_available.ngpus = 4
set node gpu1 resources_available.vnode = gpu1
set node gpu1 resv_enable = True

Qmgr: print server

# Create resources and set their properties.

# Create and define resource hpmem

create resource hpmem
set resource hpmem type = size
set resource hpmem flag = hm

# Create and define resource ngpus

create resource ngpus
set resource ngpus type = long
set resource ngpus flag = hn

# Create queues and set their attributes.

# Create and define queue workq

create queue workq
set queue workq queue_type = Execution
set queue workq resources_default.ncpus = 1
set queue workq resources_default.nodect = 1
set queue workq resources_default.nodes = gpu5
set queue workq enabled = True
set queue workq started = True

# Set server attributes.

set server scheduling = True
set server default_queue = workq
set server log_events = 511
set server mailer = /usr/sbin/sendmail
set server mail_from = adm
set server query_other_jobs = True
set server resources_available.ncpus = 16
set server resources_default.ncpus = 1
set server default_chunk.ncpus = 1
set server scheduler_iteration = 600
set server resv_enable = True
set server node_fail_requeue = 310
set server max_array_size = 10000
set server pbs_license_min = 0
set server pbs_license_max = 2147483647
set server pbs_license_linger_time = 31536000
set server eligible_time_enable = False
set server max_concurrent_provision = 5
set server max_job_sequence_id = 9999999

And is this output OK?

#pbs_hostn -v gpu1
primary name: gpu1 (from gethostbyname())
aliases: localhost.localdomain
aliases: localhost4
aliases: localhost4.localdomain4
address length: 4 bytes
address: 127.0.0.1 (16777343 dec) name: gpu1
address: xxx.xxx.xxx.xxx (2390671834 dec) name: gpu1

Thanks,
Jeff

Thank you @jeff14 for sharing these details

  1. Please make sure you do not assign a loopback address to gpu1, gpu5, or node01 in /etc/hosts
  2. Make sure these nodes have static IP addresses mapped to their hostnames
  3. Run the command “hostname” on each compute node and use its output as the node name when creating the node in PBS: qmgr: create node hostname-of-the-node
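
To spot point 1 quickly, a hosts file can be scanned for non-localhost names that sit on a loopback line. This is a rough sketch, not a PBS tool; loopback_names is a hypothetical helper name, and it assumes the usual hosts-file layout of an address followed by names.

```sh
#!/bin/sh
# Sketch only: loopback_names is a hypothetical helper, not part of PBS.
# Prints every name that a hosts file maps to 127.x.x.x or ::1,
# ignoring the usual localhost/loopback aliases.
loopback_names() {
    hosts_file=${1:-/etc/hosts}
    awk '
        $1 ~ /^#/ { next }                      # skip comment lines
        $1 ~ /^127\./ || $1 == "::1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break            # stop at a trailing comment
                if ($i !~ /localhost|loopback/) print $i
            }
        }
    ' "$hosts_file"
}
```

Running loopback_names with no argument reads /etc/hosts; any output (for example gpu1) points at an entry to move to the static IP line.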

#qstat -B
Can not resolve name for server node01. (rc = -1 - Unknown error -1)
Cannot resolve specified server host ‘node01’.

If you don’t have “node01” in /etc/pbs.conf, chances are you’re running Torque’s “qstat”, not the OpenPBS one. Run “which qstat” and confirm that it is indeed the binary you think it should be.
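
That check can be sketched as a small script that compares the qstat found on PATH with PBS_EXEC from /etc/pbs.conf (PBS_EXEC=/opt/pbs in the file posted above). The helper names pbs_exec_dir, is_under, and check_qstat are hypothetical, and the sketch assumes the OpenPBS client commands live somewhere under PBS_EXEC.

```sh
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of PBS.

# Print the PBS_EXEC value from a pbs.conf-style file.
pbs_exec_dir() {
    sed -n 's/^PBS_EXEC=//p' "${1:-/etc/pbs.conf}" | head -n 1
}

# Succeeds if path $1 lies under directory $2.
is_under() {
    case "$1" in
        "$2"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether the qstat on PATH belongs to the PBS install in pbs.conf.
check_qstat() {
    qstat_path=$(command -v qstat) || { echo "qstat not found on PATH"; return 1; }
    exec_dir=$(pbs_exec_dir "${1:-/etc/pbs.conf}")
    if [ -z "$exec_dir" ]; then
        echo "PBS_EXEC is not set in ${1:-/etc/pbs.conf}"
        return 1
    fi
    if is_under "$qstat_path" "$exec_dir"; then
        echo "OK: $qstat_path is under $exec_dir"
    else
        echo "WARNING: $qstat_path is not under $exec_dir; it may belong to another batch system"
        return 1
    fi
}
```

Calling check_qstat on gpu5 would print a warning if the first qstat on PATH comes from a different installation.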

Thank you, Alexis!
It actually is Torque’s “qstat”. :joy: