Thanks for your questions @nithinj
Do we have to give the PBS_SERVER parameter explicitly?
As we know, pbs_mom cannot run without knowing the location of pbs_server. So yes, the PBS_SERVER parameter must be specified in the pbs.conf file. This is not only the case for the MoM: if the scheduler runs on a standalone host, the PBS_SERVER parameter must exist in that host's conf file as well.
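As a quick illustration, here is a minimal sketch of checking a conf file for PBS_SERVER. It assumes pbs.conf is made of plain KEY=value lines; the helper names and the hostname `headnode1` are hypothetical and only for this example.

```python
def parse_pbs_conf(text):
    """Parse KEY=value lines, skipping blank lines and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        conf[key.strip()] = value.strip()
    return conf


def has_pbs_server(conf):
    """Return True when PBS_SERVER is present and non-empty."""
    return bool(conf.get("PBS_SERVER"))


# Hypothetical pbs.conf contents for a host that runs only the MoM.
SAMPLE = """\
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_MOM=1
PBS_SERVER=headnode1
"""

conf = parse_pbs_conf(SAMPLE)
if not has_pbs_server(conf):
    raise SystemExit("PBS_SERVER is missing from the conf file")
```

The same check would apply to a host that runs only the scheduler, since that daemon also needs to locate the server.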
After modifying PBS_SERVER_INSTANCES, what are the steps to be followed for this change to be reflected in the whole cluster? Can an instance be removed from this list?
In the first implementation of multi-server, the design outlines that if the admin updates this parameter, the admin must either restart all the PBS MoMs or send them a HUP so that they re-read the configuration.
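For the HUP route, a rough sketch of what an admin script could do on each execution host is shown below. It assumes the MoM writes its PID to a lock file (commonly `mom_priv/mom.lock` under PBS_HOME, but please verify the path on your installation); `hup_mom` and its `send` parameter are hypothetical names used only for this example.

```python
import os
import signal


def hup_mom(lock_path, send=os.kill):
    """Read the MoM PID from lock_path and send it SIGHUP.

    `send` defaults to os.kill; it is injectable so the logic can be
    exercised without signalling a real process.
    """
    with open(lock_path) as fh:
        pid = int(fh.read().split()[0])
    send(pid, signal.SIGHUP)
    return pid


# Example call (path is an assumption, adjust to your PBS_HOME):
# hup_mom("/var/spool/pbs/mom_priv/mom.lock")
```

This would have to be run on every MoM host after the updated conf file has been distributed.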
Does the order in which server instances are listed matter?
Yes, the order matters. Ideally, in HPC clusters the conf files are managed by distributing the same file to all machines with a distribution tool such as dsh, so retaining the same order should not be complex.
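To show why the order matters, here is a minimal sketch with an index-based selection. The modulo rule and the host:port values are assumptions for illustration only, not the actual sharding policy.

```python
def pick_server(instances, obj_id):
    """Map an integer object id to a server by position in the list."""
    return instances[obj_id % len(instances)]


# Same three instances, but host B lists the first two in swapped order.
host_a = ["server1:15001", "server2:15001", "server3:15001"]
host_b = ["server2:15001", "server1:15001", "server3:15001"]

for obj_id in range(3):
    a = pick_server(host_a, obj_id)
    b = pick_server(host_b, obj_id)
    status = "agree" if a == b else "DISAGREE"
    print(f"id {obj_id}: host A -> {a}, host B -> {b} ({status})")
```

With any position-based rule like this one, two hosts whose lists differ only in order can resolve the same object to different servers, which is why the same file should be pushed everywhere.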
Do all clients and moms have to follow the same PBS_SERVER_INSTANCES at a given point in time?
As mentioned earlier, the configuration file should be consistent across all machines in the PBS Pro cluster.
pbs_shard_init() takes struct for server_instance whereas pbs_shard_get_server_byindex expects int for inactive_server_indexes. Can we maintain consistency?
In the sharding policy definition, we have established that sharding is a joint understanding between the application and the library. Since we already pass the array of structs to pbs_shard_init(), the library only needs to know which servers are inactive, and passing their indexes is convenient in terms of computation. There is no benefit in passing them in struct form again.
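A small sketch of the idea follows. This is a Python analogue only, not the real signatures of pbs_shard_init() or pbs_shard_get_server_byindex; the class, the method name, and the "try the next index" fallback are all assumptions made for the example.

```python
class ShardContext:
    """Holds the server list once, like the array handed over at init time."""

    def __init__(self, instances):
        self.instances = list(instances)

    def get_server_index(self, obj_id, inactive_indexes=()):
        """Return the index of an active server for obj_id, or -1 if none.

        Inactive servers are given as plain indexes into the list that
        was registered at init, so no struct has to be passed again.
        """
        n = len(self.instances)
        inactive = set(inactive_indexes)
        start = obj_id % n
        # Assumed fallback: walk forward from the preferred index.
        for step in range(n):
            idx = (start + step) % n
            if idx not in inactive:
                return idx
        return -1


ctx = ShardContext(["server1", "server2", "server3"])
preferred = ctx.get_server_index(4)        # preferred index for object 4
fallback = ctx.get_server_index(4, [1])    # same object with index 1 inactive
```

Since the caller and the library share the same ordered list, a set of integers is enough to describe the inactive servers, and the membership check stays cheap.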