Hi,
I have set up the PBS server on gpu1 (pbsmaster) and the PBS MoM on gpu5 (pbsslave), but when I try to start the MoM on gpu5 I get the error message below. I am confused because I never set the server name to "node01".
Both installations followed the INSTALL guide (openpbs/INSTALL at master · openpbs/openpbs · GitHub).
(gpu5)#qstat -B
Can not resolve name for server node01. (rc = -1 - Unknown error -1)
Cannot resolve specified server host ‘node01’.
qstat: cannot connect to server node01 (errno=15010) Access from host not allowed, or unknown host
Other info:
# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
xxx gpu5
xxx gpu1
Nodes should have static IP addresses mapped to their hostnames, resolvable from every machine in the PBS cluster (forward and reverse address resolution should both work).
SELinux should be disabled (if you disable it now, reboot the system), and the firewall should be disabled.
/etc/hosts should contain entries for all hosts and aliases in the PBS cluster (so that name resolution still works if you lose the DNS server), and this /etc/hosts file should be the same on all systems participating in the PBS cluster.
Do not map the hostnames of the compute nodes to dynamic IP addresses or loopback addresses.
Ports 15001 to 15009 and 17001 should be open for communication between the server and the nodes, in both directions.
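As a rough way to check the port requirement above, the sketch below probes each listed port on a remote host. It assumes a netcat (nc) that supports -z and -w is installed; the helper names pbs_ports and check_pbs_ports are made up for illustration and are not part of PBS.

```shell
# Print the ports named in the checklist above, one per line.
pbs_ports() {
    p=15001
    while [ "$p" -le 15009 ]; do
        echo "$p"
        p=$((p + 1))
    done
    echo 17001
}

# Probe every port on the given host.
# Assumption: nc supports -z (connect without sending data) and -w (timeout in seconds).
check_pbs_ports() {
    host=$1
    for port in $(pbs_ports); do
        if nc -z -w 3 "$host" "$port" 2>/dev/null; then
            echo "$host:$port reachable"
        else
            echo "$host:$port NOT reachable"
        fi
    done
}
```

For example, run "check_pbs_ports gpu1" on gpu5 and "check_pbs_ports gpu5" on gpu1. A port also shows as not reachable when no daemon is listening on it, so only the ports with a running PBS daemon behind them are meaningful.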
You can use this command to check the name resolution:
pbs_hostn -v # from server once and from compute node as well
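Alongside pbs_hostn, a quick resolver check can be sketched with getent, which consults /etc/hosts and DNS in the order the system is configured for. This assumes getent and awk are available; is_loopback, resolve_host and check_host are hypothetical helper names, not PBS commands.

```shell
# Succeed if the given address is a loopback address (127.x.x.x or ::1).
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the first address the system resolver returns for a hostname.
resolve_host() {
    getent hosts "$1" | awk 'NR == 1 { print $1 }'
}

# Forward-resolve a hostname, reject loopback results, then reverse-resolve the address.
check_host() {
    addr=$(resolve_host "$1")
    if [ -z "$addr" ]; then
        echo "$1: does not resolve"
        return 1
    fi
    if is_loopback "$addr"; then
        echo "$1: resolves to loopback address $addr"
        return 1
    fi
    # getent hosts also accepts an address and prints the name mapped to it.
    name=$(getent hosts "$addr" | awk 'NR == 1 { print $2 }')
    echo "$1 -> $addr -> ${name:-no reverse entry}"
}
```

Running "check_host gpu1" and "check_host gpu5" on both machines should print the same address and name on each side.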
Please share the output of the following commands, run as the root user:
qmgr: print server
qmgr: print nodes @default
create node gpu5
set node gpu5 state = free
set node gpu5 resources_available.arch = linux
set node gpu5 resources_available.host = gpu5
set node gpu5 resources_available.mem = 65850888kb
set node gpu5 resources_available.ncpus = 20
set node gpu5 resources_available.ngpus = 4
set node gpu5 resources_available.vnode = gpu5
set node gpu5 resv_enable = True
# Create and define node gpu1
create node gpu1
set node gpu1 state = free
set node gpu1 resources_available.arch = linux
set node gpu1 resources_available.host = gpu1
set node gpu1 resources_available.mem = 65705720kb
set node gpu1 resources_available.ncpus = 32
set node gpu1 resources_available.ngpus = 4
set node gpu1 resources_available.vnode = gpu1
set node gpu1 resv_enable = True
Qmgr: print server
# Create resources and set their properties.
# Create and define resource hpmem
create resource hpmem
set resource hpmem type = size
set resource hpmem flag = hm
# Create and define resource ngpus
create resource ngpus
set resource ngpus type = long
set resource ngpus flag = hn
# Create queues and set their attributes.
# Create and define queue workq
create queue workq
set queue workq queue_type = Execution
set queue workq resources_default.ncpus = 1
set queue workq resources_default.nodect = 1
set queue workq resources_default.nodes = gpu5
set queue workq enabled = True
set queue workq started = True
# Set server attributes.
set server scheduling = True
set server default_queue = workq
set server log_events = 511
set server mailer = /usr/sbin/sendmail
set server mail_from = adm
set server query_other_jobs = True
set server resources_available.ncpus = 16
set server resources_default.ncpus = 1
set server default_chunk.ncpus = 1
set server scheduler_iteration = 600
set server resv_enable = True
set server node_fail_requeue = 310
set server max_array_size = 10000
set server pbs_license_min = 0
set server pbs_license_max = 2147483647
set server pbs_license_linger_time = 31536000
set server eligible_time_enable = False
set server max_concurrent_provision = 5
set server max_job_sequence_id = 9999999
Please make sure you do not assign a loopback address to gpu1, gpu5 or node01 in /etc/hosts.
Make sure these nodes have static IP addresses mapped to their hostnames.
Run the command “hostname” on the compute node and use its output as the node name when you create the node in PBS. qmgr: create node hostname-of-the-node
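The advice above can be sketched as a one-liner helper that builds the qmgr directive from a node name. node_create_cmd is a hypothetical name used only for illustration; with no argument it falls back to the local hostname.

```shell
# Print the qmgr directive that creates a node with the given name.
# With no argument, use the output of hostname on the current machine.
node_create_cmd() {
    printf 'create node %s\n' "${1:-$(hostname)}"
}
```

On the server, as root, the directive can then be passed to qmgr non-interactively, for example: qmgr -c "$(node_create_cmd gpu5)", where gpu5 is exactly what hostname printed on the compute node.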
#qstat -B
Can not resolve name for server node01. (rc = -1 - Unknown error -1)
Cannot resolve specified server host ‘node01’.
If you don’t have “node01” in /etc/pbs.conf, chances are you’re running Torque’s “qstat”, not the OpenPBS one. Run “which qstat” and confirm that it is indeed the binary you think it is.
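A minimal sketch of that check: print which qstat is first in the PATH, then print the server name configured in pbs.conf. It assumes the file uses plain PBS_SERVER=name lines; pbs_server_from_conf is a hypothetical helper name.

```shell
# Print the PBS_SERVER value from a pbs.conf-style file (default: /etc/pbs.conf).
# If the key appears more than once, the last occurrence is printed.
pbs_server_from_conf() {
    conf=${1:-/etc/pbs.conf}
    sed -n 's/^PBS_SERVER=//p' "$conf" | tail -n 1
}
```

On gpu5, "command -v qstat" followed by "pbs_server_from_conf" shows the binary in use and the configured server. If the second command does not print node01, the name is coming from somewhere other than /etc/pbs.conf, which is consistent with a different qstat being picked up.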