Multi-server switches and sharding algorithm design

Hello All,

We have started our work on the PBSPro multi-server project. As part of the multi-server architecture, the pbs.conf switches that control the configuration, and the sharding algorithm that chooses the right server, are designed and explained in detail at https://pbspro.atlassian.net/wiki/spaces/PD/pages/1585545237/Sharding+algorithm+design

We look forward to your feedback. Thanks in advance!

Thanks @bremanandjk for sharing. A few questions…

How does PBS_SERVER_INSTANCES work for a mom on a remote machine? Does it have to specify server_name:port_number, or will that be inferred from already existing parameters such as PBS_SERVER? If the conf file contains server_name:port_number, do we still have to set the PBS_SERVER parameter explicitly? If it makes use of the existing PBS_SERVER field, does that field have to appear before PBS_SERVER_INSTANCES?

After modifying PBS_SERVER_INSTANCES, what are the steps to be followed for this change to be reflected in the whole cluster? Can an instance be removed from this list?

Does the order in which server instances are listed matter?

Do all clients and moms have to follow the same PBS_SERVER_INSTANCES at a given point in time?

pbs_shard_init() takes a struct for server_instance, whereas pbs_shard_get_server_byindex expects an int for inactive_server_indexes. Can we maintain consistency?

Thanks for your questions @nithinj

do we have to give PBS_SERVER parameter explicitly?

As we know, pbs_mom can’t run without knowing the location of pbs_server, so yes, the PBS_SERVER parameter must be specified in the pbs.conf file. This is not only for the mom case; even when the scheduler runs on a standalone host, the PBS_SERVER parameter must exist in the conf file.

After modifying PBS_SERVER_INSTANCES, what are the steps to be followed for this change to be reflected in the whole cluster? Can an instance be removed from this list?

In the first implementation of multi-server, if the admin updates this parameter, the admin should restart all the PBS moms, or HUP them so they re-read the changed configuration.

Does the order in which server instances are listed matter?

Yes, the order matters. In typical HPC clusters, conf files are managed by distributing the same file across all machines with distribution tools such as dsh, so it should not be difficult to retain the same order everywhere.

Do all clients and moms have to follow the same PBS_SERVER_INSTANCES at a given point in time?

As mentioned earlier, the configuration file should be consistent across all machines in the PBS Pro cluster.

pbs_shard_init() takes a struct for server_instance, whereas pbs_shard_get_server_byindex expects an int for inactive_server_indexes. Can we maintain consistency?

In the sharding policy definition, we have established that sharding is a joint understanding between the application and the library. Since we already pass the array of structs to pbs_shard_init(), the library only needs to know which servers are inactive, and passing indexes is more convenient in terms of computation. There is no benefit in passing the struct form again.

PBS_MAX_SERVERS: Specifies the maximum number of servers that this cluster will be allowed to start without bringing down everything in the cluster. However, when adding any new servers, all the moms need to be HUPed to re-read the updated pbs.conf file.

PBS_SERVER_INSTANCES: [[host]:port,][[host]:port,]…
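For illustration, a pbs.conf carrying these switches might look like the fragment below. The host names, ports, and the PBS_MAX_SERVERS value are made up; the authoritative syntax is whatever the design page specifies.

```
PBS_SERVER=svr1.example.com
PBS_SERVER_INSTANCES=svr1.example.com:15001,svr2.example.com:15001,svr3.example.com:15001
PBS_MAX_SERVERS=5
```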

Are we planning to run multiple servers on the same machine only in phase 1?
If not, where is the database located and how do the servers connect to it? As we know, we have pbs.conf variables such as PBS_DATA_SERVICE_HOST and PBS_DATA_SERVICE_PORT to connect to an external database host. If we are planning to use them, it would be good to mention all of these details in the document as well.


Thanks @suresht for your feedback,

We are planning to run multiple servers on remote machines as well, along with the data service host. I have updated the document with the data-service-related switches.
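For example, pointing all server instances at a shared external data service might look like the fragment below; PBS_DATA_SERVICE_HOST and PBS_DATA_SERVICE_PORT are the existing pbs.conf variables mentioned earlier, while the host name and port value here are illustrative.

```
PBS_DATA_SERVICE_HOST=db.example.com
PBS_DATA_SERVICE_PORT=15007
```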

I see from the doc that you can use host:port for PBS_SERVER_INSTANCES. Do they still have to specify PBS_SERVER? Isn’t that redundant? Also, thanks for adding more details on PBS_SERVER in the doc.

Can you mention this under PBS_SERVER_INSTANCES? Only moms are mentioned in the document; how about the scheduler and the other servers?

Can an instance be removed from PBS_SERVER_INSTANCES list?

Thanks for confirming and updating the docs.

I assume you are talking about the two multi-server switches mentioned in the document. Can you specify that in the document?

The library does not return anything to the caller as part of init(). If you are assuming that the caller maintains the same structure as the library, why does init() expect the structure when an int would be more efficient?

Can you mention it under the PBS_SERVER_INSTANCES? Only moms are mentioned in the document? How about scheduler and other servers?

I have changed moms to all the services.

In the doc, we are not mandating the PBS_SERVER parameter; we are just saying that the admin can skip it only if PBS_SERVER exists prior to PBS_SERVER_INSTANCES. It’s optional.

The reason is simple: we pass this additional information to the library so that it is feasible to implement another version of the library that sets the server id based on the host:port combination.
And even in the case of service locators like etcd, ZooKeeper, and Consul, the library would need to know the list of configured servers for validation purposes.

I am not assuming anything here. The design is that the caller maintains the array of structs and passes it to init().

Thanks for updating the document.

Also, repeating a question which you missed in your earlier replies:

I see. So the int array holds indexes into the struct array which the caller already passed to init(). It would be clearer if you stated that in the document.

As PBS_SERVER_INSTANCES takes more than one server as input, do we still honor the PBS_PRIMARY and PBS_SECONDARY variables that were used for failover?

That question is already answered in the doc; I will reiterate it here: "However, while adding/removing any servers, all the services need to be HUPed to re-read the updated pbs.conf file."

Please refer to these lines in the document:
The logic will return the next active server instance by referring to the array of configured server instances and an array of inactive server instances. However, the caller should maintain the same series of server instances passed in since the last call.

On success, it returns an index into the server_instance array; on error, it returns -1.

This design page deliberately covers only the sharding algorithm and the switches related to sharding; implementing this module alone will not be sufficient to achieve the multi-server architecture. You should expect the topics other than the sharding algorithm under this tree: https://pbspro.atlassian.net/wiki/spaces/PD/pages/1507164161/PBS+Pro+multi-server+project+architecture+and+design.

That said, the failover-related work is not handled as part of this effort, so I will refrain from adding any additional information here.