Make the scheduler connect to the server and keep that connection persistent

Hi,

I would like to propose the following changes to the scheduler’s connection with the server:

  1. Change the direction of connection initiation. Currently the server connects to the scheduler, which means the scheduler has to listen on a port and accept connections. The server already does that very well, so we can simply reverse the direction so that the scheduler acts as a client (just like moms or IFL clients) and connects to the server after proper authentication.
  2. Make the scheduler connections persistent. There is no need to disconnect the connection between the server and scheduler just to indicate the end of a cycle, for which a message can be sent. That is less expensive than setting up and tearing down TCP connections. Besides, keeping a connection allows the scheduler to remain connected to all the servers irrespective of which server wanted to start the scheduling.
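
To illustrate point 2, here is a minimal sketch of ending a cycle with a message instead of a disconnect. It is not taken from the PR: the `END_OF_CYCLE` message, the newline framing, and all function names are hypothetical, and `socket.socketpair()` only stands in for the real server/scheduler TCP connection.

```python
import socket

END_OF_CYCLE = b"END_OF_CYCLE\n"  # hypothetical message, not the real PBS wire format


def recv_line(sock: socket.socket) -> bytes:
    """Read bytes up to and including the next newline."""
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def scheduler_handle_one_cycle(sock: socket.socket) -> str:
    """Scheduler side: read one command, run a cycle, report the end."""
    command = recv_line(sock).decode().strip()
    # ... a real scheduler would run its scheduling cycle here ...
    sock.sendall(END_OF_CYCLE)  # send a message instead of closing the socket
    return command


def run_cycles(commands):
    """Drive several cycles over one connection; return what the scheduler saw."""
    server_sock, sched_sock = socket.socketpair()  # stands in for one TCP connection
    seen = []
    try:
        for cmd in commands:
            server_sock.sendall(cmd.encode() + b"\n")
            seen.append(scheduler_handle_one_cycle(sched_sock))
            # The server waits for the end-of-cycle message before sending more.
            if recv_line(server_sock) != END_OF_CYCLE:
                raise RuntimeError("unexpected reply from scheduler")
    finally:
        server_sock.close()
        sched_sock.close()
    return seen
```

Calling `run_cycles(["SCH_SCHEDULE_NEW", "SCH_CONFIGURE"])` runs two cycles back to back on the same socket pair, with no connection setup or teardown in between.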

Here is the design page.

Here is the PR link.

Please review the design and provide your feedback.

Thanks

@hirenvadalia Thanks for writing the document.

Can you please elaborate on your design and write about the motivation behind this change?

  • Is the sched registration batch request a new IFL call? Please add more details about it.
  • Your second point of overview suggests that the scheduler is going to connect to multiple servers. Does that mean that the scheduler is going to maintain a pair of connected sockets for each server?
  • How will it work in a failover setup?
  • Can you please talk a little more about authentication? Earlier, the server connected to the scheduler on “sched_port”, but in your proposal the scheduler connects to the server. How does the server know whether the connecting client is a scheduler or not?
  • With this new change, do we even need attributes like sched_port on the scheduler object? If not, can you update the design to mention it?

Thanks,
Arun

@arungrover Thanks for the review.

I have added this information to the overview section, but please let me know if anything is missing…

No, it’s not a new IFL call, it’s just a new batch request (in my PR it is currently named PBS_BATCH_RegisterSched, but we can rename it if needed).

Yes, that’s true: in the future the sched will have one pair of sockets per server when there are multiple servers.
For now I have kept it so that there is only one server in the servers list and one pair of sockets for that server, while keeping the code almost ready for multiple servers.

Right now I have not done anything special for failover, assuming that pbs_connect() connects to the appropriate active server and switches to the new active server when failover happens… but I will test this and fix it if needed…

I explained this in the “How server promotes client as scheduler” section. Please let me know if it’s not clear enough and I will change it.

Yes, with the new changes we don’t need sched_port at all, and I have removed it. I will update the design.

Thanks,
Hiren

@hirenvadalia, Thanks for the design document.

Can you please elaborate on the following.

  1. It looks like you are not sending any end-of-cycle indication to the server, which means the server(s) will keep writing scheduling commands over the network. Doesn’t this cause a performance problem? As of today the server does not send more than one scheduling command until it gets the end of cycle.
  2. As of today the priority of scheduling commands is handled by the server. This change breaks the priority handling, especially with respect to medium priority and normal priority commands. Super high priority commands are still fine since they are sent on a secondary connection. For example, SCH_CONFIGURE should always be handled with high priority, before other commands like SCH_SCHEDULE_NEW.
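
To make both points concrete, here is a minimal sketch of the behaviour being described: the server keeps pending commands ordered by priority and sends at most one until the end of cycle arrives. The class, its method names, and the numeric priority values are hypothetical. Only the command names come from this thread, and the real server logic may well differ.

```python
import heapq
import itertools

# Hypothetical priority levels; a lower number is served first.
PRIORITY = {
    "SCH_CONFIGURE": 0,     # assumed to outrank normal commands, per the example above
    "SCH_SCHEDULE_NEW": 1,  # assumed normal priority
}


class SchedCommandQueue:
    """Server-side queue: priority ordered, one outstanding command at a time."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a level
        self.cycle_in_progress = False

    def enqueue(self, command: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[command], next(self._seq), command))

    def next_command(self):
        """Return the next command to send, or None while a cycle is outstanding."""
        if self.cycle_in_progress or not self._heap:
            return None
        self.cycle_in_progress = True
        return heapq.heappop(self._heap)[2]

    def end_of_cycle(self) -> None:
        """Called when the scheduler reports that its cycle has ended."""
        self.cycle_in_progress = False
```

If commands were instead written to the wire as soon as they arrive, the ordering would be fixed at write time and a later SCH_CONFIGURE could not overtake an earlier SCH_SCHEDULE_NEW, which is the concern in point 2.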

I am just curious why we need a new batch request. Why can’t we use the existing batch request that is used for pbs_connect? Since we are already using the extend parameter to tell whether the connection is from a scheduler, I feel we don’t need another batch request.

I think you are now passing “scheduler name + primary/secondary” in the extend parameter. We also need to pass sched_port in that parameter. This helps in the case of Multisched.

Also, I think we still need sched_port when somebody creates a new scheduler, or when a scheduler runs on a non-default port, which is generally the case with Multisched.

@suresht We need a new BR because the client (aka the scheduler) has to authenticate its connection (connection authentication, not authenticating itself as a sched) before the server can accept the extend parameter with the sched name and mark the client as a sched. During pbs_connect(), only req_connect() (the handler for PBS_BATCH_Connect) has access to the given extend, and in req_connect() the connection is only initiated, not yet authenticated. Once the server replies to the pbs_connect() request we lose the given extend. So this BR is mainly for passing extend (for now; in the future we can add more information, such as what we pass in update_svr_schedobj()) and for authenticating the client as a sched.
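
The ordering argument above can be sketched as a small state machine on the server side. This is only an illustration under stated assumptions: the state names, the `Conn` class, the handler functions, and the `known_scheds` check are all hypothetical and are not the code in the PR.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class ConnState(Enum):
    INITIATED = auto()      # after the connect request (PBS_BATCH_Connect)
    AUTHENTICATED = auto()  # after connection authentication succeeds
    SCHED = auto()          # after a valid register-sched request


@dataclass
class Conn:
    state: ConnState = ConnState.INITIATED
    sched_name: Optional[str] = None


def handle_connect(conn: Conn, extend: str) -> None:
    # extend is visible here, but the connection is not authenticated yet,
    # so nothing from extend is trusted or stored on the connection.
    conn.state = ConnState.INITIATED


def handle_authenticate(conn: Conn, ok: bool) -> None:
    if ok:
        conn.state = ConnState.AUTHENTICATED


def handle_register_sched(conn: Conn, sched_name: str, known_scheds: set) -> bool:
    """Promote the client to a sched only if its connection is already authenticated."""
    if conn.state is not ConnState.AUTHENTICATED:
        return False
    if sched_name not in known_scheds:  # assumed check against configured schedulers
        return False
    conn.state = ConnState.SCHED
    conn.sched_name = sched_name
    return True
```

A register request that arrives before authentication is rejected, which is why the sched name cannot simply ride along on the initial connect request.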

@suresht With this design, the sched connects to the server as a normal client (like qmgr, qstat, etc.) and uses the new BR to register itself as a sched. So, like a normal client, the sched will have some random port automatically assigned to it.

In multisched, each sched will have a different random port automatically assigned, but no two schedulers will ever have the same sched name. So there is no need for a port to differentiate between multiple scheds; the sched name alone will suffice.

In the case of Multisched, the scheduler’s host and port are decided by the admin; they are not auto-assigned at the scheduler. For this reason it would be good to validate both the scheduler’s name and its port. This acts as additional security, and it also makes sure the scheduler is indeed connecting on the port configured by the admin.

We can actually store the extend info if the connection is from a scheduler and use it later, right? Also, it would be better if you could add some information about the new BR to the document, such as what information is sent.

@suresht No, we don’t store extend in conn. We just mark the connection as CONN_PRIMARY or CONN_SECONDARY, and only after the client has proved itself to be a scheduler (aka after receiving the register sched BR on both connections opened by the sched).

Sure, I will update the document with all pending details soon…
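
As a rough sketch of that promotion step: the flag names CONN_PRIMARY and CONN_SECONDARY are the ones mentioned above, but their numeric values, the `SchedPromotion` class, and the dict-based connection objects are made up for illustration. Pending registrations are keyed by sched name only, with no port involved.

```python
CONN_PRIMARY = 0x1    # flag names from this thread; the numeric values are made up
CONN_SECONDARY = 0x2


class SchedPromotion:
    """Holds register-sched requests per sched name until both have arrived."""

    def __init__(self):
        self._pending = {}  # sched name -> {"primary": conn, "secondary": conn}

    def register(self, sched_name: str, role: str, conn: dict) -> bool:
        """Record one register-sched request.

        role is "primary" or "secondary"; conn is a dict with a "flags" entry.
        Returns True once both connections of this sched have been marked.
        """
        if role not in ("primary", "secondary"):
            raise ValueError("unknown role: " + role)
        pair = self._pending.setdefault(sched_name, {})
        pair[role] = conn
        if len(pair) < 2:
            return False  # still waiting for the other connection
        pair["primary"]["flags"] |= CONN_PRIMARY
        pair["secondary"]["flags"] |= CONN_SECONDARY
        del self._pending[sched_name]
        return True
```

Neither connection is flagged until the second registration arrives, so a client that registers on only one connection is never treated as a scheduler in this sketch.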

Yes, true @suresht. Right now the port and host are decided by the admin, because the sched listens on only one port and the server needs to know that port in order to connect to it.

But with these changes the server no longer connects to the sched; it is the reverse. So the server doesn’t need to know the port, and the sched is free to choose any port when making the connection to the server.
This means we can now get rid of the sched port: one less thing for the admin to take care of, and one less thing for the server/sched to maintain.

Thanks @hirenvadalia. This means we are removing the “-S” option of the pbs_sched binary, which is used to specify the port number. Similarly, we are removing the sched_port option of the scheduling object at the server. These are external interfaces, so it would be better to document the changes clearly. We might also need to think about the upgrade scenario, in case there is any impact, especially when switching from a version of PBS configured with multiple schedulers to the latest version.

Yes @suresht, I will add this. I didn’t add it initially because I was seeking feedback on how the server marks a client as a sched… anyway, I will update the design and add all pending details.

Sure, we will see as we progress on this design…

Hey @suresht, sorry I missed this part.
There will be no change in how the server sends commands to the scheduler, including priority commands, with two exceptions.
Currently the scheduler closes the connection to notify the server that the sched cycle has ended. But with the new design, where we need to keep the connection persistent, we can’t close the connection to notify the server, so the sched will send the cycle end notification on the secondary connection.

Also, currently the super high priority command is sent over the primary connection while other commands are sent over the secondary connection, but the primary connection is mainly for IFL calls (aka the data connection).
So with this design, all commands, including the super high priority command, will be sent only on the secondary connection, and the primary connection will be used only for IFL calls.
In other words, the primary connection is only for data/IFL, and the secondary connection is only for sched commands from the server plus the end-of-cycle notification from the sched.

I will update the design page with the above…

Thanks @hirenvadalia for addressing the comments and for agreeing to update the design document with everything we discussed above.

In master as of today, we send super high priority commands on the secondary connection.

@suresht and @arungrover, I have updated the design with all pending details. Please have a look and let me know your thoughts…

As we just discussed offline, I misunderstood part of the earlier discussion about sched commands. I will update the design with the correct information…

I have updated the design with the correct information. Please have a look and let me know your thoughts…