Allow mom to join herself to the cluster without needing a qmgr command

Yeah, I see that acl_hosts with a wildcard could be useful. Still, in the cloud scenario, most nodes you would get from a region on AWS, say, might be named like “nodexxx.europe.aws.com”. That does not make for much security, since all Europe-region nodes would have that signature.

If we do subnet masks, it is better @mkaro - but that would mean nobody who plugs a laptop into the network should be able to get an IP in that subnet mask, and that again is not very restrictive in a cloud situation, right? In other words, on the cloud, other hosts on the same subnet could be allotted to other customers…?
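
To make that concrete, here is a minimal sketch (Python, purely illustrative; the subnet range and helper name are my assumptions, not anything from the PBS code) of what a subnet-mask ACL check amounts to, and why it cannot distinguish our mom from a co-tenant on the same subnet:

```python
import ipaddress

# Hypothetical subnet-mask ACL check; the CIDR range and function name
# are invented for illustration, not taken from PBS.
TRUSTED_SUBNET = ipaddress.ip_network("10.20.0.0/16")

def host_allowed(addr: str) -> bool:
    """Accept a connection if its source IP falls inside the trusted subnet."""
    return ipaddress.ip_address(addr) in TRUSTED_SUBNET

# The weakness discussed above: any tenant allotted an address in the
# same subnet passes the exact same check.
print(host_allowed("10.20.5.17"))   # True - our mom, or a stranger's VM
print(host_allowed("172.16.0.9"))   # False - outside the mask
```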

Yeah, at exascale levels, node sorting based on some creation sequence makes almost no practical sense!

One other angle I was hoping we could address is security in inter-mom communications. We currently require that the addresses of all the moms be transmitted to all other moms (IS_CLUSTER_ADDRS), and that hurts scalability. If possible, we would like to use the same means to enhance that aspect of communication security as well. As with mom-server communication, there are two possible ways:

  1. Transmit the acl_hosts-like mask (as discussed above) to the moms, so the moms know they are to trust other moms in that subnet. (In this case the benefit only materializes if we actually set a netmask rather than a list of hostnames, which is what IS_CLUSTER_ADDRS is.)

  2. Adopt a secret-key-based strategy (like the one proposed above), so if a receiving mom can decrypt a piece of text (say, the sending mom’s hostname) using her copy of a secret key, then she knows the sender mom can be trusted.

Which one do we like more?

Personally, I like the shared-secret approach. I think it scales better and is easier to administer.

However, for security, the piece of text should include both a random string and an identifier (e.g., the hostname). That makes it harder to crack the secret.
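
As a hedged sketch of what that could look like (the token layout, names, and use of HMAC-SHA256 are my assumptions, not a proposed wire format), each mom holds the same shared secret, and a sender proves knowledge of it by MACing a fresh nonce together with its hostname:

```python
import hmac
import hashlib
import secrets

# Illustrative only: the secret's distribution and the message layout
# are assumptions, not the actual PBS inter-mom protocol.
SHARED_SECRET = b"distributed-out-of-band-to-every-mom"

def make_token(hostname: str):
    """Sender side: tag = HMAC(secret, nonce || hostname)."""
    nonce = secrets.token_bytes(16)  # random string defeats replay/precomputation
    tag = hmac.new(SHARED_SECRET, nonce + hostname.encode(), hashlib.sha256).hexdigest()
    return nonce, hostname, tag

def verify_token(nonce: bytes, hostname: str, tag: str) -> bool:
    """Receiver side: recompute the MAC and compare in constant time."""
    expected = hmac.new(SHARED_SECRET, nonce + hostname.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

nonce, host, tag = make_token("mom17.cluster.example.com")
assert verify_token(nonce, host, tag)             # sender knows the secret
assert not verify_token(nonce, "evil.host", tag)  # tampered identifier is caught
```

Binding the nonce and the identifier under one MAC means an eavesdropper cannot replay a captured token as a different host, and the per-message randomness gives an attacker no fixed plaintext to grind against.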


NAS carries a local mod to the server and database so that nodes are returned from the server in an order set by the admins. This has two benefits. The first is cosmetic, which is what I suspect Bill is thinking of. Much of the time, nodes have a natural-to-humans order. It makes sense for pbs_statvnode() to list the nodes in their natural order, which makes pbsnodes display them in order without any special code.

Second, the scheduler also looks at nodes in the order provided by the server (after some rearranging based on node priority). If this order corresponds somewhat to hardware connectivity, then the scheduler will tend to select nodes for multi-node jobs near each other without any special effort by the scheduler. A cheap optimization.
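
As a toy illustration of that effect (all names invented; this is not the scheduler’s actual code): because a stable sort by priority preserves the incoming order within each priority level, nodes the server returns rack by rack stay adjacent even after the rearranging:

```python
# (name, priority) tuples in the order the server returns them, rack by rack.
nodes = [("r1n1", 0), ("r1n2", 0), ("r2n1", 10), ("r2n2", 10), ("r3n1", 0)]

# Python's sorted() is stable: ties keep their server-provided order,
# so rack-mates remain neighbors in the traversal.
by_priority = sorted(nodes, key=lambda n: -n[1])  # higher priority first
print([name for name, _ in by_priority])
# ['r2n1', 'r2n2', 'r1n1', 'r1n2', 'r3n1']
```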

I’m not sure how this plays out with sharding of nodes.

Yeah, with sharding of nodes, neither the scheduler nor the pbsnodes display would necessarily see them in the original order - the scheduler might need a node sort applied to see them in any particular order.