Hello PBS Pro Community,
I have a pretty simple network, with FQDNs pinging and resolving fine.
However, when I try adding a node to PBS Pro, it is marked state-unknown and looks like this:
Mom = node0075.x.y
Port = 15002
pbs_version = unavailable
ntype = PBS
state = state-unknown,down
resources_available.host = node0075
resources_available.vnode = node0075.x.y
resources_assigned.accelerator_memory = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.netwins = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
Any thoughts on how to resolve this?
Thanks for the pointers, but I still see the same issue after simplifying my setup to use just the /etc/hosts file. There is no firewall or SELinux, and the mom is up:
03/26/2018 18:17:09;0002;pbs_mom;Svr;pbs_mom;Mom pid = 33736 ready, using ports Server:15001 MOM:15002 RM:15003
/etc/pbs.conf on the mom node looks like this:
And /etc/pbs.conf on the server node looks like this:
Do you see something I might be missing? Or does PBS have any probing tools that might help identify the issue?
Please change /etc/pbs.conf on the compute node to the following and restart the PBS mom service:
Thanks for the tip - made the modification but it didn’t resolve the issue.
Here’s what my mom config looks like:
[root@node0075 ~]# cat /var/spool/pbs/mom_priv/config
There isn’t any server_priv/config…should there be one?
The server hostname is present in /etc/pbs.conf and mom_priv/config, and nowhere else.
Sorry to see that it is not working for you; the deployment is quite straightforward.
Could you please check (via telnet) that ports 15001 to 15009 and 17001 are open between the head node and the compute node?
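If telnet isn't installed, a small bash loop along these lines should do roughly the same check. This is only a sketch: `check_port` is a made-up helper name, and it assumes bash and the coreutils `timeout` command are available on the machine you run it from. Keep in mind that a "closed" result can also mean that no daemon is listening on that port on the target host, not necessarily that something is blocking it.

```shell
#!/usr/bin/env bash
# check_port is a hypothetical helper, not a PBS tool.
# It reports whether a TCP connection to host:port can be opened.
check_port() {
    local host=$1 port=$2
    # bash's /dev/tcp pseudo-device opens a TCP connection;
    # timeout caps the wait at 3 seconds per port.
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "${host}:${port} open"
    else
        echo "${host}:${port} closed or unreachable"
        return 1
    fi
}

# Only scan when a target hostname is given on the command line.
if [ "$#" -ge 1 ]; then
    # Port range taken from the request above: 15001 to 15009, plus 17001.
    for port in $(seq 15001 15009) 17001; do
        check_port "$1" "$port"
    done
fi
```

Saved as, say, check_ports.sh (the file name is arbitrary), it could be run from the head node as `bash check_ports.sh node0075.x.y`, and then from the compute node with the server hostname as the argument.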
Please share the output of the commands below (run them on the server node and the compute node separately):
- cat /etc/hosts
- pbs_hostn -v < server hostname >
- pbs_hostn -v < compute node hostname >
- ping server-hostname
- ping computenode-hostname
- netstat -tunap | grep pbs
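To make it easier to post everything in one go, the commands above could be wrapped in a short script like this. It is a rough sketch rather than anything official: `run`, `collect`, and the output file name are hypothetical, the two hostnames are passed as arguments, and `ping -c 3` assumes a Linux-style ping.

```shell
#!/usr/bin/env bash
# run: print a header, then execute one command and keep going on failure,
# so the combined log shows every step even when one of them errors out.
run() {
    echo "### $*"
    "$@" 2>&1 || echo "(exit status $?)"
    echo
}

# collect: the diagnostics requested above, in the same order.
collect() {
    local server=$1 node=$2
    run cat /etc/hosts
    run pbs_hostn -v "$server"
    run pbs_hostn -v "$node"
    run ping -c 3 "$server"
    run ping -c 3 "$node"
    # The netstat pipeline needs a shell of its own.
    run sh -c 'netstat -tunap | grep pbs'
}

# Usage: bash pbs_diag.sh <server hostname> <compute node hostname>
if [ "$#" -eq 2 ]; then
    # Hypothetical output file, one per host the script is run on.
    collect "$1" "$2" | tee "pbs_diag_$(hostname -s).txt"
fi
```

Run it once on the server node and once on the compute node (as root, so netstat can show process names), then attach both output files.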