Allow mom to join herself to the cluster without needing a qmgr command

Yeah, I see that acl_hosts with a wildcard could be useful. Still, in the cloud scenario, most nodes you would get from a region on AWS, say, might be named like “nodexxx.europe.aws.com”. That does not make for much security, since all Europe-region nodes would have that signature.

If we do subnet masks, it is better @mkaro - but that would mean nobody who plugs a laptop into the network should be able to get an IP in that subnet mask, and that again is not very restrictive in a cloud situation, right? In other words, on the cloud, other hosts on the same subnet could be allotted to other customers…?
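
To make that concrete, here is a minimal sketch (Python, purely illustrative; the subnet range and helper name are my assumptions, not anything from the PBS code) of what a subnet-mask ACL check amounts to, and why it cannot distinguish our mom from a co-tenant on the same subnet:

```python
import ipaddress

# Hypothetical subnet-mask ACL check; the CIDR range and function name
# are invented for illustration, not taken from PBS.
TRUSTED_SUBNET = ipaddress.ip_network("10.20.0.0/16")

def host_allowed(addr: str) -> bool:
    """Accept a connection if its source IP falls inside the trusted subnet."""
    return ipaddress.ip_address(addr) in TRUSTED_SUBNET

# The weakness discussed above: any tenant allotted an address in the
# same subnet passes the exact same check.
print(host_allowed("10.20.5.17"))   # True - our mom, or a stranger's VM
print(host_allowed("172.16.0.9"))   # False - outside the mask
```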

Yeah, at exascale levels, node sorting based on some creation sequence makes almost no practical sense!

One other angle I was hoping we could address is security in inter-mom communications. We currently require that the addresses of all the moms be transmitted to all other moms (IS_CLUSTER_ADDRS), and that hurts scalability. If possible, we would like to use the same means to enhance that aspect of communication security as well. As with mom-server communication, there are two possible ways:

  1. Transmit the acl_hosts-like mask (as discussed above) to the moms, so the moms know they are to trust other moms in that subnet. (In this case the benefit only materializes if we actually set a netmask rather than a list of hostnames, which is what IS_CLUSTER_ADDRS is.)

  2. Adopt a secret-key-based strategy (like the one proposed above), so if a receiving mom can decrypt a piece of text (say, the sending mom’s hostname) using her copy of a secret key, then she knows the sender mom can be trusted.

Which one do we like more?

Personally, I like the shared-secret approach. I think it scales better and is easier to administer.

However, for security, the piece of text should include both a random string and an identifier (e.g., the hostname). That makes it harder to crack the secret.
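
As a hedged sketch of what that could look like (the token layout, names, and use of HMAC-SHA256 are my assumptions, not a proposed wire format), each mom holds the same shared secret, and a sender proves knowledge of it by MACing a fresh nonce together with its hostname:

```python
import hmac
import hashlib
import secrets

# Illustrative only: the secret's distribution and the message layout
# are assumptions, not the actual PBS inter-mom protocol.
SHARED_SECRET = b"distributed-out-of-band-to-every-mom"

def make_token(hostname: str):
    """Sender side: tag = HMAC(secret, nonce || hostname)."""
    nonce = secrets.token_bytes(16)  # random string defeats replay/precomputation
    tag = hmac.new(SHARED_SECRET, nonce + hostname.encode(), hashlib.sha256).hexdigest()
    return nonce, hostname, tag

def verify_token(nonce: bytes, hostname: str, tag: str) -> bool:
    """Receiver side: recompute the MAC and compare in constant time."""
    expected = hmac.new(SHARED_SECRET, nonce + hostname.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

nonce, host, tag = make_token("mom17.cluster.example.com")
assert verify_token(nonce, host, tag)             # sender knows the secret
assert not verify_token(nonce, "evil.host", tag)  # tampered identifier is caught
```

Binding the nonce and the identifier under one MAC means an eavesdropper cannot replay a captured token as a different host, and the per-message randomness gives an attacker no fixed plaintext to grind against.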


NAS carries a local mod to the server and database so that nodes are returned from the server in an order set by the admins. This has two benefits. The first is cosmetic, which is what I suspect Bill is thinking of. Much of the time, nodes have a natural-to-humans order. It makes sense for pbs_statvnode() to list the nodes in their natural order, which makes pbsnodes display them in order without any special code.

Second, the scheduler also looks at nodes in the order provided by the server (after some rearranging based on node priority). If this order corresponds somewhat to hardware connectivity, then the scheduler will tend to select nodes for multi-node jobs near each other without any special effort by the scheduler. A cheap optimization.
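
As a toy illustration of that effect (all names invented; this is not the scheduler’s actual code): because a stable sort by priority preserves the incoming order within each priority level, nodes the server returns rack by rack stay adjacent even after the rearranging:

```python
# (name, priority) tuples in the order the server returns them, rack by rack.
nodes = [("r1n1", 0), ("r1n2", 0), ("r2n1", 10), ("r2n2", 10), ("r3n1", 0)]

# Python's sorted() is stable: ties keep their server-provided order,
# so rack-mates remain neighbors in the traversal.
by_priority = sorted(nodes, key=lambda n: -n[1])  # higher priority first
print([name for name, _ in by_priority])
# ['r2n1', 'r2n2', 'r1n1', 'r1n2', 'r3n1']
```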

I’m not sure how this plays out with sharding of nodes.

Yeah, with sharding of nodes, neither the scheduler nor the pbsnodes display would necessarily see them in the original order - the scheduler might need a node sort applied to see them in any particular order.