Kerberos support

Actually, I think I need to go inside the tcp_read(). DIS_* routines call tcp_read() and tcp_read() reads as much data as possible (as much as tp buffer allowes). And this is the right place where the problem is. Once the data are read they are buffered incorrectly.

Vasek

Sounds good @vchlum. Let us go ahead with this and we can go over further details in the code review rounds. Thanks!

Hello @vchlum,

Just curious… do you think a PAM module for Kerberos might be an alternative to your approach?
https://www.eyrie.org/~eagle/software/pam-krb5/

Thanks,

Mike

Hey @mkaro,

Opposed to GSS-API, the PAM module ensures only the authentication. It would be much more work to provide similar features with PAM. GSS-API provides exactly what we need. The main reason for using GSS-API is the provided encryption. Since the user credentials are centralized (sent by the server to moms), we need some encryption between nodes. It would be very silly to send credentials through an unsecured channel. GSS-API provides both authentications and encryption. With PAM you would need to implement encryption on your own.

Vasek

Thank you for your response @vchlum. I now understand why PAM is not a viable approach.

Just short info. The chapter ‘How to set up PBS Pro with Kerberos support for testing purpose’ has been added to the EDD.

Vasek

Thanks Vasek,

I will take a look at the proposed design and will circle back with any questions/clarifications asap.

HI Vasek,

Here are few initial thoughts/feedback/clarifications about the EDD you had posted.

  1. For the cred renew tool, you have provided a link to a simple (sample) tool to use for testing. Do you think it is a good simple tool people might find useful if it was itself added to the PBS codebase (like somewhere in unsupported) - that way, users could use and update it if required? (this could be added to pbs with a README stating that this is a simpler tool and more elaborate functionality is available elsewhere)?

  2. Some of the new attributes are called: acl_krb_realm_enable, acl_krb_realm and that is in line with how current acls are named in PBS, so that’s fine. In that same way, should krb_realm_submit_acl instead be something like acl_krb_submit_realms? Also, do we really need two different realm lists, one to access the server and another a list of who can submit jobs? And if so, why is job submission any more special (say compared to job stats)?

  3. If cred_renew_enable is not set, what error message/behavior happens when the credentials expire: a) for a job about to run? b) for a running job?

  4. I am confused about cred_renew_period and cred_renew_cache. Why do we need cred_renew_cache? Also, since it is a time period, the name should have something like period, time etc to it, otherwise it sounds like it is the path to a cache.

  5. Should the attribute Job_Host be called “Submit_Host” instead? Job_host can confuse people as to whether it means submit or execution host.

  6. More of a question. Are there any security implications in placing the service and host principal caches in /tmp? As mentioned in sections about KRB5CCNAME and /tmp/krb5cc_pbs_client? In other words, are they better off in respective user’s home directories (or similar user access controlled area)?

Finally: Do you have updated code to look at? Or else, i can go over your existing code url. Thanks!

Thank you for your comments @subhasisb!

  1. I think ‘unsupported’ is probably the right place for this tool. We just need to warn admins against using it in production. I think the main issue of this tool is that every change of krb user password means that you need to update the keytab too. That is the problem of obtaining credentials from keytab in general and it is very unpleasant for using it in production.

  2. The reason for the name is historical. It makes sense to change it. I renamed krb_realm_submit_acl to acl_krb_submit_realms. The reason for using acl_krb_realms and also acl_krb_submit_realms is that the set of users/services that access the pbs and the set of users who are allowed submit may not be the same. With this feature, some special users/services that are not suitable for submitting jobs can access pbs using kerberos. Two examples:

    • You could have nodes in a different realm and you can demand to access pbsnodes or qstat from the nodes (using host credentials) but you probably don’t want to submit a job with host principal like ‘host/<fqdn>@PBSPRO.NODES’
    • You could have monitoring service using the pbsnodes or qstat in a special realm and the monitoring is probably also not suitable for submitting jobs.
  3. Now, cred_renew_enable works like this. If it is not set then:
    a1) If the cred_renew_tool is set: the job will start with valid credentials and only renewing of running jobs is not available.
    a2) If also the cred_renew_tool is not set: an error message is logged on the job run attempt, and the job remains in Q. The job will try to run in the next scheduler iteration again…
    b) Running job will not receive any new credentials. Once the credentials will expire, the credentials will not be renewed. e.g.: You could expect the job output fails with error while copying into its location and will remain on the computational node.
    Are you OK with this behavior?

  4. OK, I renamed cred_renew_period to cred_renew_cache_period. The cred_renew_cache_period is very important. It significantly relieves KDC load for bunch of jobs with the same user starting shortly one after another. Only the first job will ask for credentials and all the subsequent jobs of the same user will use the same credentials until cred_renew_cache_period allows it. (This is important for a renew tool that accesses the kdc - e.g.: our krb525_renew tool accesses the KDC database directly)

  5. Historical reasons. I agree and I changed it.

  6. Since the /tmp/ location is the default location for ccache in Kerberos, I do not see any trouble here, and I suggest to keep it this way. Of course, the ccache has the correct permission (600). See DEFCCNAME in https://web.mit.edu/kerberos/krb5-1.12/doc/mitK5defaults.html

Yes, please, use the same link for reading the code as before. Changes mentioned in this post are already included (all is squashed and rebased). The link is still the same: https://github.com/CESNET/pbspro/tree/kerberos_support_2 Just ignore code behind KRB525_FALLBACK macro (it is not active). It will be removed before raising PR.

Vasek

1 Like

Hi @vchlum

I looked at the updated design and it looks great to me. Thanks!
I will continue reviewing your code a bit more, since i am a bit concerned about all the changes that are made to tpp_dis.c. I am trying to think of how to keep tpp_dis.c largely unchanged and handle it at a higher level (like do_rpp() in the rpp/tpp communication side). The client<–>Server communication is all great, so I have no worries there.

I will try to think more and circle back. Thanks!

Sorry for taking so long to get back to this thread @vchlum. And thanks for being patient.

Finally, i was able to take another peek at the code.

I guess the primary reservation i have with the implementation is that, it seems to not ride on top of the communication layer, but it goes inside it. There are two problems that I see with that:

  1. In the client <–> Server communication, after the connection happens, the server code and client code are in a tight loop with each other establishing context. While it is fine for the client to in a loop without doing anything else, that sounds bad for the server. The pbsgss_server_establish_context() function is a tight loop that involves reading data from the client and send data back (as part of the handshake). This function loops within itself, without going through the event-based (wait_request()) loop of the server. This means that the server is not doing anything else when doing this. If the client is slow or hangs, then it has a potential to block the server from doing anything else.

  2. Functions like pbsgss_send_token() and recv_token() write data directly to the socket descriptor. Instead, if we sent a “message” via the DIS layer, then we would not need to make changes to the transport layer (TPP or TCP both). And the same code would work for TCP as well as TPP. Of course, this would mean inventing new message identifiers for the “batch requests” for sending each data structure, but those can all be under gss ifdefs.

Let me know what you think about this proposal. If i was not very clear, i apologize. Perhaps, we can get on the phone on a specified date/time?

Regards,
Subhasis

Thank you for your comments @subhasisb. I think I understand.

  1. The TCP implementation is this way because of historical reasons, and it makes sense to move on. I will fix it.

  2. I thought about your suggestion and I must admit it makes sense. If I understand you correctly, you have suggested setting functions dis_getc, dis_gets, dis_puts, disr_skip, disr_commit, and disw_commit to some new: gss_dis*. The gss_dis* functions will have their own buffer and will call the transport dis*. It should work for both TPP and TCP as you wrote. OK, I will change the implementation according to your suggestion.

I agree with you. Just a note that we still will not be able to encrypt the multicast messages.

Thanks,
Vasek

About 2, what you suggest might also work, but my suggestion was different, probably simpler. Let me try to explain.

So, the gss client need to do this exchange “dance” with the server. Similar to the hello based “handshaking” between the mom and the server, we could implement this at the application layer, instead of changing the transport layer.

We could add a batch identifier, say PBS_BATCH_GSS_HANDSHAKE

The message could look like this:

Header: PBS_BATCH_AUTH_HANDSHAKE

Protocol: GSS/TLS

state: Some_identifier to know the state of the handshake (BEGIN, STAGE1, STAGE2, ESTABLISHED…etc)

Encrypted? flag: 0/1

Gss_data (in future TLS data)

So, now the client side in the pbsgss_client_establish_context(), in the loop, when you call pbsgss_send_token(), basically this function can be modified to send a batch request to the server with the above structure. The server can then receive the batch request in its usual event mechanism in"wait_request()", dispatch to a “req_auth_handshake()” handler, which checks what state the handshake is in, and respond accordingly. Till the entire exchange is over, we can keep the client connection marked as “not authenticated” and so any other data is not acceptable from the client.

Since the above exchanges will be over batch protocols, the code will be the same between TCP or TPP. And we can even have encryption on the TPP messages also!

We can probably use such a thing for TLS in the future as well.

I have two comments.

  1. I think that you did not consider that we need to use the encryption also between moms, and there is no batch protocol between superior mom and sister moms. (we have IM protocol here) It is needed for multi-noded jobs. That is why I added GSS_HANDSHAKE (see net_connect.h).

  2. In my last post, I talked about doing encryption but you talked about the handshake. Establishing GSS context - the handshake is one thing and doing encryption with the GSS context like gss_wrap() or gss_unwrap() is another. Where do you suggest to do the encryption? I think the dis* layer (my last suggestion) is the highest possible layer.

Let’s meet on the skype and clarify. I already sent you an e-mail in order to arrange the meeting.

Vasek

You are right. I have mixed up two things the encryption and the handshake in my earlier message.

  1. Yes, i was mostly talking about the client to server messages. I think i need to go over the mom – mom implementation changes you have made once more to recall what’s there.

  2. I realize that the encryption has to sit as an intercept between existing messages and the transport layer, so, i agree, that the dis layer might be the highest possible one.

I will check the code more out before our meeting. Thanks!

Vaclav and myself had a small skype call, but it was very useful. Here is what we agreed:

  1. The GSS_HANDSHAKE code between moms (IM) and mom-server (IS) is already asynchronous, so it is all good. Changing the packet header from GSS_HANDSHAKE to some “generic HANDSHAKE” id, and adding GSS_HANDSHAKE as the next field might make it easier for our future TLS implementation. (but that’s almost a nitpick, so don’t need to fix absolutely)

  2. The client-server handshake could be like GSS_HANDSHAKE, instead of the loop in server. vaclav had already liked this idea, earlier itself.

  3. Instead of modifying the TPP routines, we could implement the dis_getc etc routines by switching the fucntion pointers. Thus, the data from application will get encrypted first and then they will pass it to the original functions (sort of the adaptor pattern).

Many thanks Vaclav.

All three points have been implemented. The work is on a new branch https://github.com/CESNET/pbspro/tree/kerberos_support_3 There is still some work to do… clean up, commenting and other improvements.

We also discussed on the skype following: Since the tpp handshake is asynchronous, some other data are sent immediately after the handshake begins and the data are not encrypted until the handshake is finished. This issue is resolved by adding an additional outcoming buffer now. This buffer withholds the stream of data until the handshake is finished. Once the handshake is finished, the withheld data are encrypted and sent.

The good news is that the withholding is done on the new dis_gss* layer and it is not necessary to modify the tpp layer.

The bad news is that I am still not able to disable the receiving of cleartext data because of the multicast. The multicast internally uses the same streams as an ordinary connection. I am obviously not able to encrypt the multicast, and since the multicast uses the ordinary connections, the multicast data are received in cleartext on a stream that should be expected as encrypted. AFAIK, I am not able to recognize that received data are from multicast and claim that other data must be encrypted. So all data are accepted (encrypted or cleartext). This is not good for security.

The solution could be in disabling the multicast. There is an option PBS_USE_MCAST for this, but this option affects only part of the multicast. It does not affect the ping or hook synchronization (and maybe more). @subhasisb already mentioned that ping will be removed. Are there any plans to improve the multicast in order to fully respect the PBS_USE_MCAST option? If yes, we could simply claim that using Kerberos without disabling multicast is highly unrecommended, and tie the disabling of receiving cleartext with the PBS_USE_MCAST option.

Vaclav

1 Like

Hi Vaclav,

Thanks for making all the changes we discussed about - they are looking great.

About the multicast, unfortunately, we need it to scale well for large sites, without that, the server/mom will send message serially one by one to each of the recipients. Hooks distribution, large multi-node job launches are going to be slower and taxing on the server/mom.

Looks like these following are our options:

  1. Honor PBS_USE_MCAST to not multicast when this is set to 0. But really, most large sites will want to turn on multicast. (but we can make this change in the code)

  2. Add a bit to the multi-cast packets (even when they become ordinary packets at the comm), so that recipients can know whether a packet is multicast or not. But then somebody could send a packet with this field as “0” and make the recipient accept it - assuming they can get past authentication.

  3. TPP multicast packets are broken into normal packets at the “last” comm - where the data then goes as “ordinary” packets to recipients; We could decrypt the mcast packet at this comm, break it down to individual packets (this already happens), and then encrypt these individual packets before sending them out to the recipients (moms). This would require that the comm has knowledge of the kerberos credentials for the respective user etc. and implementation can be complicated as well.

What do you think?

Subhasis

I see the pros and cons.

  1. It would be quite a pity if this feature wouldn’t be suitable for large sites. Yes, we can say that security is not for free.

  2. I think this is the same as ignoring this problem. Adding the bit will not improve security. Ignoring is actually also valid… The core reason why we need gss over tpp is that we send credentials through the channel and we need to secure this transport. So nobody will steal the credentials. This is already ensured.

  3. It is a good solution and it resolves the issue. The drawback is that we need another implementation of wrapping, unwrapping, and also the handshake. It is workable. I do not have a problem with the work, but I like the uniting of encryption in the dis_gss layer. And I think there are some parts of handshake that could be united with the existing solution too. But maybe this option is the best.

I have one more thought… not sure… but here is it.

  • Instead of adding the existing stream to the multicast pool we would open a separate stream for the multicast only. So we could safely tag one stream as encrypted and the second one as cleartext only, and we allow only pings and hooks to be received in cleartext. Is it feasible?

Vaclav

Hi Vaclav,

  1. So sounds like we want to rule out this option? Or do you think we should still try out this one? The problem with solution 1 is that we will need to test out both the MCAST enabled mode and MCAST disabled mode for every release, thus increasing QA work significantly.

  2. What if we just ignore the problem for now. As you said the security credential passing is already secure, so perhaps this is not a big deal anyway?

  3. Yes, i agree. This is workable but more work + not the neatest. I like your implementation in the dis_gss_layer.

  4. Your additional idea of the multi-cast as a separate rpp channel is definitely possible for the hook and pings. However, in case of multinode jobs, the mother-superior (the primary mom) will distribute join_job information to the sister moms and this is best done using multicast. If we take away multicast for that, then this becomes solution 1.

What do you think?