Allow mom to join herself to the cluster without needing a qmgr command

Here is the design for letting mom create the natural vnode by herself, without needing a qmgr command. Please share your thoughts.

Looks good to me, just one thought: maybe we should title this as “remove need to create natural vnode via qmgr”, up to you.

I think this is a great idea. It is one less step to bring up a cluster. The moms know who they talk to, so why not create the vnode when they talk to the server. The mom already kind of does this. When using qmgr to create the vnode, it is really only a shell of a vnode. The mom sends a natural vnode later when it comes up with the rest of the info. This is just removing the step to create it via qmgr.
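
For context, the two-step flow being collapsed looks roughly like this today (host name illustrative):

    # step 1: admin creates the shell of the natural vnode on the server host
    qmgr -c "create node host01"

    # step 2: start the mom on host01; she reports in and fills the natural
    # vnode with the real inventory (ncpus, mem, ...)
    /etc/init.d/pbs start    # or: systemctl start pbs

The proposal simply removes step 1.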

I have two comments

  1. I’d drop the ability to remove the vnode with pbs_mom -s. The natural vnode is required for a mom; you can’t have a mom without one. This is why we create it in PTL with 0 resources. We already have a way to delete a mom via qmgr (see the example after this list).
  2. How does this affect things when there is a vnodedef file or an exechost_startup hook that creates vnodes? Will the natural vnode be created alongside them? What if they create the natural vnode themselves?
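
For reference, that deletion is already a one-liner (host name illustrative):

    qmgr -c "delete node host01"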

Bhroam

I agree with Bhroam, and will add to his second comment. How will this impact the cgroups hook which is also in the business of defining vnodes?

Thanks for all the feedback.

@bhroam ,

  1. The idea was to let the mom delete herself, since the mom is now allowed to add herself to the cluster. I am removing this, as the user can achieve the same thing using qmgr, and the client commands ship alongside the mom package.
  2. The vnodedef file keeps the behaviour it has today: it takes precedence over the natural vnode created using qmgr, and it will continue to take precedence after the proposed changes (see the sketch below).
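
For anyone following along, a minimal sketch of the vnodedef path in question (names and values illustrative). A Version 2 config file defines the vnodes, it is installed with pbs_mom -s, and whatever it sets wins over what qmgr created; that stays true after this change:

    $configversion 2
    host01: resources_available.ncpus = 0
    host01[0]: resources_available.ncpus = 8
    host01[0]: resources_available.mem = 8gb

    # installed on the execution host with:
    #   pbs_mom -s insert myvnodes /path/to/myvnodes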

@mkaro
The cgroups hook will keep the same precedence it has today over the natural vnode created using qmgr.

Interesting. Can you explain more about what the admin will have to do? An example would be helpful.
Or are you saying PBS will automatically figure it out based on the PBS_SERVER= entry in /etc/pbs.conf? (If that is the case, then it’s a great idea!)

Nit: please add some text saying that PBS_MOM_NODE_NAME lives in the /etc/pbs.conf file.
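
For example, everything the mom needs would then live in /etc/pbs.conf on the execution host (values illustrative; PBS_MOM_NODE_NAME being the optional name she reports under):

    PBS_SERVER=server.example.com
    PBS_START_MOM=1
    PBS_MOM_NODE_NAME=host01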

Please expand and give specifics about this statement:

  Admin needs to ensure the server and mom are in a secure environment.

How does the admin do this, and further, how does one prevent unauthorized hosts from adding themselves as execution hosts?

By ensuring a reasonable authentication setup. For example, on a cluster where a laptop can be plugged in and the laptop user can become root, the cluster should not use “reserved port” authentication. This is nothing special, of course: if a user can become root on a cluster at will, that is a security problem anyway.

Agree. In fact the action of starting the mom and adding the node via qmgr are both done by the same actor typically, the admin, so eliminating an additional step is beneficial.

Most sites still use PBS reserved port authentication (it’s the default). Setting server acl_hosts has typically been used to address the situation you describe, where a laptop can be plugged in and the laptop user can become root. However, acl_hosts has always been only about limiting which hosts the server will respond to for client command requests, not about managing job execution hosts. That is, PBS today can run a job on an execution host from which a user cannot run qstat if that host is not listed in acl_hosts (and acl_host_moms_enable was added in 18.2 so that added exec hosts need not be explicitly listed in acl_hosts). Managing access this way for execution hosts has not been a problem, since the admin had to explicitly add them via qmgr.
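
For reference, the mechanics described above look like this today (host names illustrative):

    qmgr -c "set server acl_host_enable = true"
    qmgr -c "set server acl_hosts += login01.example.com"
    # 18.2+: spare newly added exec hosts from needing explicit acl_hosts entries
    qmgr -c "set server acl_host_moms_enable = true"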

With the proposed change in place, what mechanism would such a site use to prevent a rogue laptop from adding itself as a compute node and having jobs sent to it?

Hmm ok. I thought most sites that use reserved ports and are serious about security do not allow addition of hosts where somebody else can have root access (I thought some of these sites even limit client access by making users ssh to a login node to submit jobs).

So, do we know how many sites actually depend on acl_hosts/acl_host_moms_enable? And, can they not just use a different authentication protocol like munge? (munge is easy enough to set up)
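
(For what it’s worth, in recent versions switching a complex to munge should just be a pbs.conf change on every host, assuming munged is already running with a shared key; the exact variable names may differ by version:)

    # /etc/pbs.conf on the server and on each execution host
    PBS_AUTH_METHOD=MUNGE
    # on the server, possibly also:
    PBS_SUPPORTED_AUTH_METHODS=MUNGE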

Two other possible solutions:

  1. Require that the mom identify itself with a special “secret” that the admin sets, so the server knows whether a daemon was “admin blessed” or not. This could be a simple solution: a simple secret (encrypted on disk just like the db password) stored in mom_priv. (This could also allow us to remove cluster_addrs altogether.) Rogue hosts will not have this and so can be rejected. But then, really, other authentication protocols like munge/TLS do exactly this in a better way! Still, this is simple enough, such a key is easily added to images (for cloud setups), and it could be a requirement only in the case of reserved ports. (A sketch follows after this list.)

  2. Requiring that the admin add the mom host name to such an acl_xxx switch sounds like it will defeat the purpose of removing the need for the admin to create the node first. However, I feel it is still better than having to create the node first. We could have a switch that augments the reserved ports protocol to allow root communication only from a list of hosts. If a daemon from one of those hosts connects to the server and declares itself a scheduler or a mom, the server honors it; otherwise it rejects it. We could also have a switch to require this, disabled by default, so sites that do not need it don’t have to bother.
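
To make option (1) concrete, here is a purely hypothetical provisioning sketch; none of these file names or checks exist in PBS today:

    # hypothetical: generate a shared registration secret on the server host
    openssl rand -hex 32 > $PBS_HOME/server_priv/registration_key
    chmod 600 $PBS_HOME/server_priv/registration_key

    # hypothetical: bake the same key into every mom image (cloud or bare metal)
    install -m 600 registration_key $PBS_HOME/mom_priv/registration_key

    # a mom presenting a matching key is "admin blessed" and may autocreate
    # her node; a rogue laptop without the key is rejected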

But then again, we are just covering for a weakness in the reserved port authentication by doing these things, where other protocols have already solved this problem.

If we must do something, though, I would prefer option (1) over option (2).

Is it not the case that the server returns node lists in the order the nodes are created? So, when the admin creates nodes via qmgr, the admin gets to decide what order they will be presented and searched in. With this proposal, the nodes end up listed forever in the order they happen to boot up. Ugly.

I second scc’s security concern about nodes being able to add themselves.

Also, we sometimes had test nodes on the same networks as production nodes. It was convenient to have the production server ignore the test nodes until they were fully configured, and vice versa.

Hmm, yeah, if that is a need then I guess the only way would be to create the nodes via qmgr first. We could continue to support creation of nodes via qmgr, but not require it, and then sites would be free to choose how they want to use it. For certain sites, the ability for the moms to autocreate nodes could save an extra step for each node added!

But, if a site fixes the “security protocol” issue, then it is not a problem any more right? Either we “fix” the reserved port authentication by adding a “key” to the mom “image”, or we hope sites use better authentication methods supported in PBS (like munge) - no?

Could we not do this by offlining such nodes till they are ready?
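
i.e., something like (host name illustrative):

    pbsnodes -o testnode01    # keep the test node out of scheduling
    pbsnodes -r testnode01    # clear the offline bit once it is ready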

Agreed with the security concerns.
Additionally, there are some sites that move hosts between complexes, and I could see how the same shared secret (or acl_host entry or munge secret) could be used on two different complexes, with only the pbs.conf file determining the host to which she adds herself.
In cases like that, would we expect the mom to automatically remove herself from the old host as well?

Thinking about the current bar for establishing root-to-root trust when reserved port authentication is used for submission: PBS uses ruserok() to establish whether the account that root (via pbs_iff) at the qsub host says ran qsub is allowed to act on behalf of that same account on the pbs_server host. Put another way, something (like hosts.equiv) was configured by root on the server host to say that the user namespace on the host in question is sane/trusted and the reported username can be trusted as accurate.

We do not do the same sort of check today when PBS adds a compute host to the cluster because root (or a manager designated by root) on the server host took explicit action via qmgr to say “I trust the user namespace on the host is not fraudulent, so it is ok to run jobs there”.

If we amended the proposal and added an ruserok() test to ask the system if root at the proposed execution host is allowed to act on behalf of local root before the server will trust/add a pbs_mom as an execution host, I believe we maintain the same level of security as we have gating job submission, which is also a user namespace trust issue.

DISCLAIMER: I may be being naive here, though, since I am not sure exactly what the second superuser argument to ruserok() does…
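
If I read rcmd(3) correctly, a nonzero superuser flag makes ruserok() skip /etc/hosts.equiv and consult only ~/.rhosts, so for root the trust would have to be spelled out roughly like this on the server host (host name illustrative):

    # /root/.rhosts on the server host:
    # trust root at host01 to act as local root
    host01.example.com root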

The server has to accept a piece of data encrypted with that key, so the server should have that key as well. So if we move a compute node between two complexes, then the other server (which may not have that key unless intended) would reject the advances from this compute node. Of course, if we actually want the node to autocreate itself on the other cluster too, then we could just copy the key to that server as well.

We could add some kind of de-register from the mom to allow automatic deletion as well. For example, the mom detects a change in the server, and upon restart it could attempt to connect one last time to the old cluster and remove itself from that cluster before (or, actually, after) adding itself to the new cluster. But is deletion actually required? At present the server no longer “pings” moms; rather, the mom initiates the connection to its target server, so if a mom does not connect back, the node simply stays “down” for that server anyway…

This can all be done, of course, only if a secret key is actually present in mom_priv, else we can default to the older ways.

Well, perhaps ruserok() with wildcards might be useful. Otherwise, this would end up replacing the qmgr -c “c node” operation with a change to acl_hosts, which does not buy us any benefit, does it?

Right, going from qmgr -c “c n node12345” to qmgr -c “s s acl_hosts += node12345” is not worth changing anything for by itself.

Perhaps something like:

  • qmgr -c “s s acl_hosts += *.altair.com”

(Though we might want to create a new attribute if this syntax is new.) I’m not sure that would work for default Cloud naming of nodes, so perhaps we’d need to also allow network addresses/masks too.

Regarding the existing semantics where the order nodes are added is the order they are output/displayed (part of Dale’s comments): this reduces implementation freedom and creates a potential scalability issue (with thousands of nodes coming and going). I feel “sorts” should be handled outside the server.


Perhaps the subnet and netmask would also help?

qmgr -c "s s acl_hosts += 10.10.25.0/24"