I just installed a fresh copy of PBS Pro 14.1.2. I want to use LBNL NHC, which runs on compute nodes, checks system sanity, and runs pbsnodes -o XXX to mark a node offline if a check fails. However, I got “Error marking node node2 - Unauthorized Request” and “Reject reply code=15007, aux=0, type=9, from root@node2.localdomain” when running pbsnodes -o node2 as root@node2. I’ve tried adding node2 and node2.localdomain to /etc/hosts.equiv, as well as setting up password-less ssh access from node2 to the main node, but neither helps. Any ideas?
Could you please set the following and retry:
qmgr -c "set server managers+=root@*" # use with caution: @* matches root on any host
qmgr -c "set server flatuid=true"
I see. Is there any way to limit the access to specified host groups (other than setting root@each_node as a manager)?
As I understand it, a setting like this will let any individual Linux user operate on the PBS server with (actually their own) root identity. A firewall won’t help, because a user can always use port forwarding to bypass it.
In the Administrator's Guide 14.2, section 7.3.13.1, I read this:
The value of flatuid also affects whether .rhosts and host.equiv are checked. If flatuid is True, .rhosts and host.equiv are not queried, and for any users at host2, only UserA is treated as UserA@host1. If flatuid is False, .rhosts and host.equiv are queried.
Is there any chance of achieving this by setting hosts.equiv?
It is better to set the manager(s) for each of the compute nodes:
for i in $(pbsnodes -av | grep '^[a-zA-Z]'); do qmgr -c "set server managers+=root@$i.localdomain"; done
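Before changing the server's managers list for many nodes, it may help to preview the commands first. The following is a minimal sketch, assuming node names arrive one per line on standard input and all share one domain suffix. gen_manager_cmds is a hypothetical helper, not part of PBS Pro, and it only prints the qmgr commands instead of running them.

```shell
# Hypothetical helper (not part of PBS Pro): print, but do not run, one
# qmgr command per node name read from standard input.
# $1 is the domain suffix appended to each short node name (an assumption
# about how the nodes are named).
gen_manager_cmds() {
    domain="$1"
    while IFS= read -r node; do
        # Skip blank lines.
        [ -n "$node" ] || continue
        printf 'qmgr -c "set server managers+=root@%s.%s"\n' "$node" "$domain"
    done
}
```

For example, pbsnodes -av | grep '^[a-zA-Z]' | gen_manager_cmds localdomain prints the commands for review; once they look right, the same output could be piped to sh on the server host.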
If you do not want to set flatuid to true, then .rhosts and host.equiv should work.
You can set the flatuid to true and check the server attributes acl_host_enable and acl_hosts
If you do not want to set flatuid to true, then .rhosts and host.equiv should work.
I have already written node2 and node2.localdomain into hosts.equiv, but I still get “Error marking node node2 - Unauthorized Request”.
You can set the flatuid to true and check the server attributes acl_host_enable and acl_hosts
Since I have around 80 compute nodes to manage, I would prefer not to add each node to acl_hosts. Instead, I’m looking for an ‘external’ list, such as the hosts.equiv file, where I can add all the nodes.
- Create the nodes on the PBS server based on the output of the ‘hostname’ command on the respective compute nodes:
qmgr -c "create node node1"
or
qmgr -c "create node node1 Mom=node1.localdomain"
- Are there multiple network adapters on the head node or compute nodes?
- Use pbs_hostn -v. At the server, run the pbs_hostn command with the name of each host (compute node) in the complex. It should complain if hostname resolution is not working correctly. See the PBS Pro admin guide, section 2.16, pbs_hostn.
- Please share your hosts.equiv file.
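The pbs_hostn check above can be repeated for every node with a small loop. This is only a sketch: check_hosts is a hypothetical wrapper name, and it simply labels and prints what pbs_hostn -v reports, without interpreting the result.

```shell
# Hypothetical wrapper: run "pbs_hostn -v" for each host name given as an
# argument and label the output so the results are easy to compare.
check_hosts() {
    for host in "$@"; do
        echo "== $host =="
        pbs_hostn -v "$host"
    done
}
```

On the server this could be called as check_hosts node0 node1 node2, and the primary name and aliases compared with what the compute nodes report for hostname.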
main:~ # pbs_hostn -v node0
primary name: node0 (from gethostbyname())
aliases: main
aliases: node-mgmt
address length: 4 bytes
address: 10.10.10.85 (1426721290 dec) name: node0
main:~ # pbs_hostn -v node1
primary name: node1.localdomain (from gethostbyname())
aliases: node1
aliases: node1-eth0.localdomain
aliases: node1-eth0
address length: 4 bytes
address: 10.10.10.1 (17435146 dec) name: node1.localdomain
main:~ # pbs_hostn -v node2
primary name: node2.localdomain (from gethostbyname())
aliases: node2
aliases: node2-eth0.localdomain
aliases: node2-eth0
address length: 4 bytes
address: 10.10.10.2 (34212362 dec) name: node2.localdomain
main:~ # cat /etc/hosts.equiv
node2.localdomain
node2
main:~ # ip -4 a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq portid 0894ef5e946c state UP group default qlen 1000
inet 10.10.10.85/16 brd 10.10.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet 10.0.0.254/24 brd 10.0.0.255 scope global eth0
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq portid 000000000317 state UP group default qlen 1000
inet XXX.XXX.XXX.XXX/23 brd XXX.XXX.XXX.XXX scope global eth1
valid_lft forever preferred_lft forever
4: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 100
inet 10.8.0.5/24 brd 10.8.0.255 scope global tap0
valid_lft forever preferred_lft forever
(eth1 is external interface, and tap0 is VPN interface)
main:~ # qmgr -c "p n node2"
#
# Create nodes and set their properties.
#
#
# Create and define node node2
#
create node node2 Mom=node2.localdomain
set node node2 state = free
set node node2 resources_available.arch = linux
set node node2 resources_available.host = node2
set node node2 resources_available.mem = 97607848kb
set node node2 resources_available.ncpus = 24
set node node2 resources_available.vnode = node2
set node node2 resv_enable = True
set node node2 sharing = default_shared
main:~ # grep node2 /etc/hosts
10.10.10.2 node2.localdomain node2 node2-eth0.localdomain node2-eth0
Thank you for this information.
- Try adding the aliases to the hosts.equiv file.
- Please check whether the node name resolves and reverse-resolves to the same name.
- Try opening up permissions to all in the hosts.equiv file, then slowly restrict them once you get it working.
Reference: http://man7.org/linux/man-pages/man5/hosts.equiv.5.html
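The forward and reverse resolution check suggested above can be scripted. This is a minimal sketch, assuming getent is available and that the first line it returns is the relevant one. check_roundtrip is a hypothetical helper name.

```shell
# Hypothetical helper: check that a host name resolves to an address and
# that the address resolves back to the same canonical name.
# getent follows the system resolver order (files, DNS, ...).
check_roundtrip() {
    name="$1"
    # Forward lookup: first field is the address, second the canonical name.
    fwd=$(getent hosts "$name" | head -n 1)
    [ -n "$fwd" ] || { echo "$name: no forward resolution"; return 1; }
    ip=$(echo "$fwd" | awk '{print $1}')
    canon=$(echo "$fwd" | awk '{print $2}')
    # Reverse lookup on the address returned above.
    back=$(getent hosts "$ip" | head -n 1 | awk '{print $2}')
    if [ "$canon" = "$back" ]; then
        echo "$name: ok ($ip <-> $canon)"
    else
        echo "$name: mismatch (forward $canon, reverse ${back:-none})"
        return 1
    fi
}
```

Running it for the short name and for each alias, on both the head node and the compute node, shows whether both sides agree on the canonical name.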
Thank you for your time. Here’s the result:
2. hosts.equiv is already set to 0644.
1. Set hosts.equiv to the following content (I’ve changed target from node2 to node22)
node22.localdomain
node22
node22-eth0.localdomain
node22-eth0
Then I ran, as root: ssh node22 pbsnodes -o node22, which returns: Error marking node node22 - Unauthorized Request
Thank you for your patience.
- Could you please check that /etc/hosts is fully populated with all the aliases and is the same on the head node and across all the compute nodes?
- Can we try to get it working by opening up the security first, to make sure it works with flatuid?
- If it works with flatuid set to true, we will then unset this server attribute.
- Use a hosts.equiv with the below content:
+
- Could you please let us know whether pbsnodes -o node22 works (when run from the server)? Also, please try qmgr -c "set node node22 state = offline" # this is better practice than using pbsnodes -o
- Worst case, would it be possible to sanitise /etc/hosts to contain only the canonical names and try it out?
Please check the $PBS_HOME/comm_logs for any issues.
I can reproduce your scenario. I can successfully execute the commands below if and only if I set server managers+=root@*, or else add root@FQDN for each of the nodes.
- pbsnodes -o
- qmgr -c "set node state=offline"
Even with hosts.equiv populated, I could not succeed.
Just to be sure: you mean you can’t succeed even with managers+=root@, flatuid=false, and hosts.equiv populated with the proper hostnames?
Shall I wait or submit a bug report? Or try something more?
It works when managers+=root@ is set and when flatuid=false (or unset)
It works when managers+=root@ is set and when flatuid=true
It does not work when managers+= is not set, with flatuid set to either true or false.
Note:
The server’s flatuid attribute affects both when users can operate on jobs and whether users without accounts on the server host can submit jobs.
So there seems to be no hope of keeping security while enabling compute nodes to run qmgr set?
Hello, it was mentioned toward the beginning of the thread that you must set the managers attribute to explicitly list the allowed account@host. This is an entirely separate mechanism from flatuid, which uses ruserok() (which consults hosts.equiv/rhosts) to deal with JOB authorization/submission/control capabilities. If you want accounts other than the root account on the server host to be able to control node/queue/etc. information, they must be added as managers (or operators); there is no other way to do it.
@runapp, is adding all of the nodes to the managers list and leaving flatuid as the default value false (secure, using ruserok() calls) not working for you?
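To confirm that every node really is in the managers list, the server configuration can be compared against the node names. This is a rough sketch, assuming the output of qmgr -c "print server" has been saved to a file and that managers appear on lines beginning with set server managers. missing_managers is a hypothetical helper, and it would not recognise a wildcard entry such as root@*.

```shell
# Hypothetical helper: list nodes whose root@<node> entry is absent from the
# "set server managers" lines of saved "qmgr -c 'print server'" output.
# $1: file holding that output; remaining args: fully qualified node names.
missing_managers() {
    conf="$1"; shift
    for node in "$@"; do
        # -F: fixed string, -w: whole-word match, -q: quiet.
        # Only the managers lines are searched.
        if ! grep 'set server managers' "$conf" | grep -qwF "root@$node"; then
            echo "$node"
        fi
    done
}
```

For example, after qmgr -c "print server" > /tmp/server.conf, calling missing_managers /tmp/server.conf node2.localdomain node22.localdomain prints only the nodes that still need to be added.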