Make scheduler to connect server and keep that connection persistent

@arungrover and @suresht, can you please provide your feedback on latest design?

Thanks for updating the design.

I have a few questions/comments -

  • It seems that you want scheduler to do a pbs_connect to the server and then send a RegisterSched batch request on that connection. Why don’t you make an IFL call to register sched? It would seem really odd for a client to encode and directly write request headers and other things into a connection, we have IFL calls for that purpose.
  • If we are creating a new batch request to register a scheduler then why don’t we make it in a way that we pass all necessary things as part of the request itself? Like in your case it will be scheduler name and connection-type (Primary/Secondary). I think it’s not a good idea to start using the extend field when we are already creating a new batch request. Extend should be used for extending an already established and published batch request.
  • Do we also send the scheduler version as part of the RegisterSched request?
  • If the scheduler is not able to connect to a server then for how long and at what interval will it retry?
  • When the Server rejects a RegisterSched request what is the error number it sends back?

True, we can make new IFL call for register sched, but should we make it? I don’t think so, because register sched will be used only by sched and no other client will use it ever. Also this is something special that we do not want to expose via IFL.
Another reason not expose in IFL is, generally client connects and ask server to do some task, where in this case, the client simply connects and waits for server to give some task to itself.

Sure, we can pass all details in request only, will update design.

No we will not sent it (although I need to update design for this), as PBS doesn’t support different versions of daemons.

It will retry every 2s till its gets connection to server (aka infinite time - reason without server connection sched don’t do anything). I will update this in design.

It depends on which place it fails, for example sched obj not found with given name then server will return PBSE_UNKSCHED, if connection on which req came is not same as sched_host on given sched then PBS_BADHOST, if sched is connected and someone tries to start another sched (with same name) then server will reject with PBSE_SCHEDCONNECTED, etc…

Since is something can change in future and more of code level details, I didn’t add it in design.
(If you want you can see more details here)

@arungrover updated design based on above.

Do we need to deprecate it first before obsoleting it ? Or else we need to take the feedback from the people whether it is indeed needed or not ?
Also As we are removing “-S” option totally how are we planning to upgrade from an environment having multiple schedulers running to the newer version ?

Do we log any message in scheduling logs to say that it is not able to reach server for every 2s. If so we also need to think a way to stop logging this messages when server takes very long time to come up, otherwise the machine’s disk gets full. May be we can have an option to say to turn off this logging on demand.

AFAIK, It was never supported, so I think no need of deprecating it. Moreover, whatever pbs_version sched was sending, server uses it just to display in stat, and for stat we can set pbs_version when sched obj created in server.

Well we can keep -S option for backward compatibility, so if admins have scripts which starts sched with -S option, it will still work, but sched will simply ignore it.
But right now as we are doing many architectural changes in master, I didn’t thought of upgrade part right now. We can take care of it when we need it (like month or 2 months before release, as will have more clear picture what to support for upgrade based on what all architectural changes went in master)

Right now I am thinking not to put any log for every retry. Anyways this is code level detail, we can take care of it in code review.

Can you please update the document with this information.

Thanks for making all the suggested changes. Design document looks good to me.

@suresht I am not sure what should I update in design? Do you want me to restore -S option and ignore it in sched?
or add info about that -S needs to take care for upgrade before release (I am not sure this we should put in design, I can create issue with so we don’t forget)?

Thanks @hirenvadalia, LGTM.

May be this is good enough. Thanks.

Thanks @arungrover and @suresht for reviews and very helpful QAs. @suresht I will create issue as soon as code changes goes in.

Hey @hirenvadalia,
I read through the design and have a couple of comments.

  1. There are a set of scheduling commands that really mean ‘run a scheduling cycle’. They can’t be any more useful than that because if several commands are triggered while there is a scheduling cycle happening, the server only keeps one. This means if a job was submitted, and then a job ended while a scheduling cycle was going on, the scheduler would only see that a job ended. It can’t trust these commands. This means we really need to do one of two things. We need to turn the sched commands into a bit field so we send all the commands we received, or we just collapse all of the commands into a single ‘run a scheduling cycle’ command. The only one which matters is the one which happens when the server just came up. There we do special things and run a scheduling cycle.

  2. This whole design is very scheduler specific. There has always been a problem with authenticating an IFL command as high permission. Ir would be nice to say, ‘this is like a PBS daemon’. There is a permission bit called SvWR. If an attribute has this bit, then it won’t be sent to a qstat -f, even if that qstat -f is root. The problem here is if we have scheduler like programs (pbs_est) or other high permission PBS commands. Right now I think this really is only pbs_est. The problem is that pbs_est needs all the attributes the scheduler gets, including the SvWR attributes. Right now the scheduler can’t mark any internal attributes as SvWR because pbs_est needs them. The best we can do is mark them manager only, and then they show up when a manager does a qstat -f. Now we really have the opportunity to fix this problem.
    Instead of making this very scheduler specific, we can create a method to mark a connection super high permissions like the scheduler. This would allow pbs_est to authenticate properly and work just like a scheduler.
    Currently for pbs_est we just need to have a way for it to mark its connection as the scheduler. Maybe give a special scheduler name which is know as ‘mark this connection with the permissions of a scheduler, even though it isn’t one’.

  3. Don’t forget the actual current reason we have the second connection when you go about using it. Once per pass through the main loop, the scheduler checks of a hook has requested that we restart the scheduling cycle.

  4. Instead of calling pbs_connect(), we can call pbs_connect_extend() directly and make use of the extend field. Would this mean we wouldn’t need the new batch request?

Bhroam

Thanks Bhroam for review.
For

  1. I agree, and I don’t know why we are multiple commands tell sched to do same thing ‘run a cycle’, but I am not sched expert, so I think we can take this later as separate design? Right now I have retained exact same behavior of how server sends commands to sched (except, now every cmds will go on secondary connection)
  2. I don’t know much about pbs_est, so not sure I understand what you are saying…
  3. I have retained behavior of reading super high prio cmd, in main_sched_loop(), do I need to take care of anything else?
  4. You can find answer in this reply

Hey @hirenvadalia

  1. True, this can be taken up in a different PR. I just figured you were dealing in this area, so it could be done now. Let’s put it off.
  2. The current difference between pbs_est and pbs_sched is who connects to whom. The server connects to the scheduler, so it is known as the scheduler. This means it has permission to any attribute flagged as SvWR. Now with pbs_est, it makes the connection to the server as root, and root it just a manager. It doesn’t have access to SvWR attributes since they require higher permission than manager. There are attributes which are really just internal to the server/scheduler, and should be marked as SvWR. We can’t mark these as such because pbs_est won’t have access to them and it will cause pbs_est to stop working. I’ve been waiting for this project to come along so we can fix this. The problem with your current design is it will only work for the scheduler, and not pbs_est. Can we make the design a bit more generic so pbs_est can connect as a scheduler?
  3. Nope, I just was worried about the restart command.
  4. I guess I don’t understand the internals here, but I’ll take your word for it that we need this. Thanks for the pointer.

Bhroam

  1. This should get automatically fixed, if we run other “scheduler-like clients” (say pbs_est) also as the same user-id as the one running the scheduler/server, and then such clients would have the same privileges as the scheduler, right? So, when we make the changes for non-root for pbs-server, we should just rely on the connected user being the daemon user (same userid as server), vs a different user who is either added managers group etc. and the access is completely based on that. Makes sense?
1 Like