Proposal to make PBS server set necessary attributes to their default value when these attributes are unset

The PTL framework is inconsistent in how it reverts server and scheduler attributes.
For the server, it keeps a dictionary of default attributes and their values. When reverting the server, PTL makes sure these attributes get their default values.
For the scheduler object, on the other hand, PTL has no dictionary of default attributes. It reverts all the relevant attributes, and the PBS scheduler then sets default values for them (if needed) when they are unset (as with sched_cycle_length, sched_priv, etc.).
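
To make the two revert strategies concrete, here is a minimal Python sketch. It is only a toy model: `FakeDaemon`, `revert_with_dictionary` and `revert_by_unsetting` are hypothetical names and not part of PTL or PBS. The attribute names and values are the ones mentioned in this thread.

```python
# Toy model of the two revert strategies. None of these names are PTL APIs.

# Framework-side copy of the defaults (values as set by pbs_habitat).
SERVER_DEFAULTS = {
    "log_events": 511,
    "scheduler_iteration": 600,
    "scheduling": True,
}


class FakeDaemon:
    """Stand-in for a PBS daemon; may refill unset attributes itself."""

    def __init__(self, builtin_defaults=None):
        self.builtin_defaults = dict(builtin_defaults or {})
        self.attrs = dict(self.builtin_defaults)

    def set(self, name, value):
        self.attrs[name] = value

    def unset(self, name):
        self.attrs.pop(name, None)
        # A daemon that owns its defaults puts the default back on unset.
        if name in self.builtin_defaults:
            self.attrs[name] = self.builtin_defaults[name]


def revert_with_dictionary(daemon, defaults):
    """Server-style revert: the framework pushes its own copy of the defaults."""
    for name in list(daemon.attrs):
        if name not in defaults:
            daemon.unset(name)
    for name, value in defaults.items():
        daemon.set(name, value)


def revert_by_unsetting(daemon):
    """Scheduler-style revert: unset everything, the daemon picks the defaults."""
    for name in list(daemon.attrs):
        daemon.unset(name)
```

In the second strategy the framework never needs to know the default values, which is the property the proposal wants for the server as well.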

It seems to me we would be better off without a dictionary of default server attributes in PTL, because the PBS server itself would then set the attributes it needs to function properly to their default values. If we do this, we will also have to change pbs_habitat, because habitat sets server attributes to some default values. It will also require code changes in the PBS server so that it sets default values for the necessary attributes when they are unset.
At present, pbs_habitat, apart from creating a default queue, sets the following server attributes:

set server default_queue = workq
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server resources_default.ncpus = 1
set server scheduling = True

Now, I want to know if there is anyone who thinks that removing this list from habitat and PTL and making PBS server set them internally is a bad idea. Having a discussion would help in decision making.

Link to document proposing the external changes.

Thanks,
Arun

Thanks for proposing this change Arun. I like the idea of moving the setting of default attributes to Server, and I definitely vote for PTL to leave the setting of defaults to PBS.

Default server attributes are set only at installation time. If you move this to the server, they will be set on every server initialization/start.
For instance, to disable scheduling you can set the attribute to zero or you can unset the attribute. The latter won’t work after this change, as scheduling will be set back to true at the next server restart.
I don’t know whether this change will impact anyone. But this thread will probably get less attention, as the title does not mention this PBSPro behavior change, which should have invited more stakeholders to the discussion.

Thanks @nithinj and @agrawalravi90 for your comments.

@nithinj I think I wasn’t clear in writing it out: the PBS server should only set these necessary attributes when the server comes up for the first time or when an admin explicitly unsets an attribute. On every other server restart it should just load the previously set value from the database.

For example, if as a manager you set the scheduler attribute ‘sched_cycle_length’, it will allow you to do so, but if you unset that attribute it will get a default value of 20 minutes.
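
The intended lifecycle (seed defaults on first start, reload saved values on restart, restore the default on an explicit unset) can be sketched roughly as below. This is an illustration under stated assumptions, not server code: `ToyServer` is a hypothetical class, a plain dict stands in for the server database, and an empty dict is assumed to mean "first start".

```python
# Hypothetical sketch of the proposed behavior. Only the attribute names and
# values come from this thread; the class and its methods are illustrative.

DEFAULTS = {"scheduling": True, "scheduler_iteration": 600}


class ToyServer:
    def __init__(self, database):
        # `database` stands in for the persisted server state.
        # Assumption: an empty database means the server has never run.
        self.db = database
        if not self.db:
            self.db.update(DEFAULTS)  # first start: seed the defaults
        # On any later start the saved values are used untouched.

    def set(self, name, value):
        self.db[name] = value

    def unset(self, name):
        if name in DEFAULTS:
            self.db[name] = DEFAULTS[name]  # explicit unset restores the default
        else:
            self.db.pop(name, None)
```

With this model, setting scheduling to False survives a restart, and only an explicit unset brings the default back.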

Your comment about disabling the scheduling attribute is correct, but it is based on the assumption that when an admin unsets the scheduling attribute it will get set to True (although I do think that is the correct thing to do :slight_smile: ). If we all agree that the PBS server should be the one that sets attributes to default values, then we can have a further discussion on what the default value should be for each of these attributes.

I’ve changed the title of the discussion in the hope of gaining more traction.

totally makes sense to me

Thanks for the explanation. makes sense.

It is more complicated than this, I am afraid, due to the way things are (and presumably always have been)… In some cases how the server acts when a value is unset differs from what pbs_habitat sets it to. Take query_other_jobs, for example. The pbs_habitat script sets it to true, BUT if it gets unset then the server treats it as false. It is not set to true again upon server restart; it is only set to true in pbs_habitat.

If I unset query_other_jobs, it immediately gets set to false, right? Then the out of the box installation would have this set to false rather than true (which I think is the proper “default” as in “out of the box” value)? Or would we change “unset” to behave as “true”?

This is an interesting problem. It will be difficult to find the right direction for such cases unless we know why the server’s default and what habitat sets are two different values.
In the case of query_other_jobs, if habitat sets it to true after installation, then why does the server have a default unset value of false? The ‘scheduling’ attribute has similar behavior: habitat sets it to True and its unset behavior is false. But ideally, for PBS to function properly, we do need scheduling to be True. If an admin really needs it to be false, then instead of unsetting the attribute they can set it to false.

I think in most cases server should just follow the values habitat is setting. There are also cases where maybe it is not the right thing for server to have defaults, like default_queue. I think such settings will continue to persist in habitat file.

Different things can happen when attributes are unset. The default action is to unset it. It disappears from qstat/pbsnodes and has some behavior. For booleans that behavior might be different than True or False. The other thing that can happen is we reset it back to another value (most likely its default value). I think @arungrover is going to do the latter. If you unset scheduling, it will be set to True. It won’t be unset with the behavior of True.
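
The difference between the two kinds of unset can be shown with a small sketch. It is a hypothetical model, not PBS code: the function names are made up, while the unset behaviors and habitat values for `scheduling` and `query_other_jobs` are the ones reported in this thread.

```python
# Hypothetical illustration of "plain unset" versus "reset on unset".

# What the server does today when the attribute is absent, per this thread.
UNSET_BEHAVIOR = {"query_other_jobs": False, "scheduling": False}
# What pbs_habitat sets at installation time.
HABITAT_VALUES = {"query_other_jobs": True, "scheduling": True}


def plain_unset(attrs, name):
    """Remove the attribute; it no longer shows up in status output."""
    attrs.pop(name, None)


def reset_on_unset(attrs, name, defaults=HABITAT_VALUES):
    """Replace the attribute with its default; it stays visible."""
    attrs[name] = defaults[name]


def effective(attrs, name):
    """Value the server acts on: the stored one, else the unset behavior."""
    return attrs.get(name, UNSET_BEHAVIOR[name])
```

After `plain_unset` the attribute is gone and the server falls back to its unset behavior; after `reset_on_unset` the attribute is still listed and carries the default value.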

@arungrover has a good point: if pbs_habitat is setting defaults, why isn’t pbs_server setting those defaults? By having pbs_habitat do it, we end up with 2 sets of “defaults”: the defaults you get when you freshly start PBS without running pbs_habitat (i.e. pbs_server -t create), and the set you get when you do run pbs_habitat. Since the RPM/init script runs pbs_habitat, why don’t we just have pbs_server set those defaults and have only one set of defaults?

Bhroam

Without mapping out all the things pbs_habitat sets it is difficult to comment, but un-setting something could need different behaviour to what we set using pbs_habitat. It may be that unsetting something would also have the expectation that the option is off, but pbs_habitat sets a system that will schedule workload in a basic fashion as a starter. The purpose of pbs_habitat and default values is different isn’t it?

I agree. Unset is disguised as reset in this thread. But the literal meaning, especially in the case of Boolean resources, is to turn it off, not to reset it back to the default.
If we go this route, setting it with habitat vs. setting it in the server during instantiation doesn’t make much difference, as both do the same thing.
When we unset non-boolean resources, how about treating them with special flags so that they won’t get displayed by stat commands even though they internally hold the default value?

IMO, instead of turning things off when the attribute is unset, we should set them to a value that we see fit for PBS to work properly (like unsetting scheduling should set it to True) because if admins want them off they can always set the attribute off.

Should it be different? I think default values should just make PBS schedule workload in a basic fashion. Then we can remove the things that pbs_habitat sets during installation and have server deal with it.

We can do what you are saying, but unless there is a use case where admins would not like to see default values when they unset attributes, I think it is OK to show them what PBS is internally using when they unset an attribute. We already do that for scheduler attributes like sched_cycle_length.

I tested the attributes pbs_habitat sets and how they are considered internally when admin unsets them as of today, maybe this will get us more clarity on what can be the default value for each of these attributes -

Thanks @arungrover. The table is helpful. I suggest creating a design document that shows today’s behavior and the proposed behavior, for more clarity. For example, there will not be any default queue, the behavior will differ when the server is started for the first time vs. restarted later, etc …

Hey guys, I have created a confluence page about what is being set on the server today in a plain vanilla installation and what server will set it to in future with this change. The document also proposes the default values of all necessary attributes if they are “unset” by admins.
Please look at this page - https://pbspro.atlassian.net/wiki/spaces/PD/pages/924221441/Changes+to+how+default+attributes+are+set+on+PBS+server

If I follow the table correctly, the default value of scheduling set by the server during a vanilla installation is true, and for unset it is false (which is the present behavior). Why are resv_enable and query_other_jobs set to true on unset, when they would be false if you followed the same logic as for scheduling? Am I missing something?

Thank you very much @arungrover for creating the page. This is much clearer now. I also think scheduling is True by default.
I also share Nithin’s concern about resv_enable and query_other_jobs. Speaking of that, can you also add one more column for what happens today when you unset a server attribute?

Thanks for comments @nithinj and @anamika
In the case of resv_enable, as of today, even when the attribute is unset reservations are allowed. So unset behavior is the same as ‘True’ which is what the document also says.
The reason I kept the query_other_jobs unset behavior as true is that by default ‘habitat’ sets it to true.
The ‘scheduling’ entry is a typo. IMO its unset behavior should be ‘True’. This will surely break backward compatibility, but I think it would be the right thing to do because we do want scheduling to be ‘True’ for the normal mode of operation.
I’ll update the page with what is the behavior when we unset attributes today.

It might be less confusing if there wasn’t a keyword ‘Unset’ and instead it was called ‘Reset’ which would just reset the values to their default.

I’ve changed the document based on the comments, please have a look again.

Thanks @arungrover. I agree a reset would have been less confusing.
I am okay with the idea of making all of the attributes consistent (except mom_fail_requeue), although it will be more work.
Just to conclude, the behavior of the following will change (can you also update the document to reflect that they are proposed to change?):

scheduling
default_chunk.ncpus
query_other_jobs
log_events
scheduler_iteration
resources_default.ncpus
mail_from

I am not clear what the proposed behavior of default_queue would be. Are you saying that it will be set to a value rather than not being set at all? If yes, then how do we unset default_queue? The same question applies to other attributes with string values, like mail_from.

Thanks for your comments @anamika
I’ll change the document to reflect that these attributes will change their behavior to being set by server by default.

Attributes like “default_queue” cannot be set by the server itself unless the server creates a default queue. As of today, default queue creation (workq) is done in pbs_habitat, so it is pbs_habitat that should set default_queue as well. I can move both the default queue creation and the setting of default_queue into the server, although I’m not sure that is necessary. Do you think it would be better to move queue creation and setting default_queue to the server?