Update file format and file name for data saved using save_configuration() method in PTL

Hi all,

I have created an EDD for interface change to save_configuration() & load_configuration() in PTL.
EDD link: https://pbspro.atlassian.net/wiki/spaces/PD/pages/1156284419/Update+file+format+and+file+name+for+data+saved+using+save+configuration+method+in+PTL

Please review & share your comments.

Thanks & Regards,
Ravi Kant

1 Like

@RKORavi : Overall design looks good.

I have few questions.

  • How are you handling multinode cluster pbs configuration?
  • Can you add an example of json file?

Hi Kumar,

I have updated the EDD to address above two questions. Please review.

Regards,
Ravi

@ RKORavi, a correction:
Interface: load_configuration()
Synopsis: Apply configuration saved by load_configuration --> should be save_configuration

Also the statement: ā€œA Single file to be used to save all the configurationsā€ seems to apply per host.
Will this file include all the server, scheduler and mom configurations in case all PBS daemons were on one host?

After loading the saved configuration do we need to HUP or restart any/all of the daemons?

@vccardenas thanks for review. I have updated the EDD, please check.
When all PBS daemons are running on a single host, all configuration will be saved to a single file. Idea is to keep different file for each host.
Yes, we have restart or HUP daemons based on configuration that is loaded. I have added this in EDD.

Regards,
Ravi

@RKORavi Here are my comments on this design doc:

  • Add details about what will be saved by save_configuration() (You can refer here)
  • 1st point (and 3rd point) says single file per host: why per host file? canā€™t we include all data from all host into a single file (even if it is a multinode cluster)? (Means every call to save_configuration will generate only one new file)
  • 2nd point says, file contents will be saved in base64 encoded but the example shows qmgr output, also if I remember correctly then when we discussed offline this we discussed only encoding all PBS configuration and hooks related files not qmgr or pbsnodes outputā€¦ Also if you encode qmgr output and while loading that how will you compare that? For example, letā€™s say after you saved qmgr -c ā€œprint serverā€ output user has added in a resource, now when you compare base64 of this you will find out that content is different but how will you find out what to delete? because saved configuration doesnā€™t have any logic to delete a newly added resource
  • About 5th point: Do we really need to say that? I mean we know that after changing sched config we need to HUP sched to apply that config right? so why we need to explicitly say that? I think we should remove 5th point so that it wonā€™t create any confusion
  • What will happen if tempdir already has ā€˜pbsptl_host1_1.jsonā€™ when the user invokes pbs_benchpress?
  • I think we should clean up tempdir for these JSON files when the user invokes pbs_benchpress

Hi Hiren,

Thankyou for reviewing the EDD.

  • I have updated EDD on contents to be saved by save_configuration.
  • If we maintain single file per host then it would be easier to send mom_config & sched_config file(in multi sched setup) to different host. Are you suggesting that we keep single file to save all the configuration for multinode setup, but then we have to parse file contents to be send to different host.
  • I did not think of removing any custom resource. You are right we should encode only file contents and not print server. I have updated EDD regarding this.
  • I have removed fifth point.
  • Yes, tempdir has to be cleaned up in case a file with same name is available, added this as point 3 in EDD.

Regards,
Ravi

Yes, it will be good to keep single file for all hosts even if it multinode setup

In the section ā€œDifferent aspects to be covered by this EDDā€, in #1 ā€˜pbsptl_confā€™ is specified as the prefix of the saved json file. In #5, the example is ā€œpbsptl_host1_1.jsonā€.

Thanks for working on this Ravi. PFB some comments on the design:

save_configuration() will save the following from the PBS cluster:
Commands output:
qmgr print server
qmgr print sched
qmgr print hook
qmgr export hook <each hook> application/x-python default
qmgr export hook <each hook> application/x-config default
qmgr print resource
qmgr list pbshook
qmgr export pbshook <each pbshook> application/x-config default
pbsnodes
Configuration files:
sched_priv/sched_config
sched_priv/holidays
sched_priv/resource_group
sched_priv/dedicated_time
multiple schedulers directories if any
mom_priv/config (files of each hosts listed in pbsnodes)
mom_priv/config.d/ (saves vnode definition files listed with ā€œpbs_mom -s listā€)
/etc/pbs.conf (all nodes)

Was save_configuration already capturing all of this before? Is yes then please mention that save_configuration will continue capturing all of this. If you are changing it to capture something new then please highlight that in the doc.
Also, it will be very helpful to know what information will be captured for each daemon. So, can you please break this down into what info will be captured for each daemon?

A Single file per host to be used to save all the configurations,

So, for a multi-node configuration, we will capture multiple files, where will those files go? is there going to be a parent directory inside which they will be stored? Also, on which host will they be accumulated?

File name with the prefix as ā€œpbsptl_confā€ and suffix as ā€œ. jsonā€.

Where does the hostname go in the filename? I think in later examples you mention pbsptl_<hostname>_conf.json, so please correct it here.

File contents can be encoded in base64 & that base64 is added to json file.

So will there be an argument to save_configuration() which decides whether to encrypt the data or not?

Different file to be used for each host, to handle multinode cluster pbs configuration. Each file to have a prefix of hostname to identify between configuration of different hosts.
e.g, pbsptl_host1_versionNumber.json, pbsptl_host2_versionNumber,json

You already talk about file name in bullet 1, so I suggest that you merge the two

Different versions of files to be maintained, every time save_configuration() is called.

then why do you need to do this:

When user invokes pbs_benchpress, cleanup of tempdir is performed in case file with same name is present.

Hi Ravi,
Thank you for reviewing the EDD.
Changes i have made to EDD after your comment:

  1. Save configuration will continue to save contents that it was doing it before. I am not adding anything new to save configuration.
  2. Divided save_configuration contents as per each daemons.
  3. Single file to be used to save all the hosts configurations. Updated json file example accoding to this.
  4. Merged point 1 & point 4.
  • Any config file that is to be saved is encoded in base64 format. So, I am not adding an argument to save_configuration.

    When user invokes pbs_benchpress, cleanup of tempdir is performed in case file with same name is present
    ā€“ Even though we maintain different version of file for each iteration of save_configuration, it is just a precautionary measure to check if there is any file name with same name already present in temp directory.

Regards,
Ravi

I think you should. It can be useful for somebody to just look at the data dump itself to infer some information instead of going through the effort of loading it via pbs_config. Your second bullet also seems to suggest that the encoding is optional: " 2. File format: File contents can be encoded in base64 & that base64 is added to json file."

Iā€™m not sure I understand why though. Whatā€™s the harm if you donā€™t remove the already existing file?

Hi Ravi,

If any config file is not encoded to base64 format then user has to take care of line like:
resources: ā€œncpus, mem, arch, host, vnode, aoe, eoeā€ because it canā€™t be stored as it is in json file. He might have to parse config file contents to take care of adding ā€œā€ in these lines. But when we do not give an option whether to encode config file contents or not then all user has to do is decode the contents to look at data dump. your thoughts?

Regards,
Ravi

Hi all,

I would like to have some suggestions on how to store hook files, I am thinking of storing all hooks .PY & .CF file in a tar ball or should i store it in same json file. But storing hook files content in same json file will increase its size drastically.
Any suggestion on this will be helpful, before i update EDD.

Regards,
Ravi

@RKORavi I will vote for storing any information not matter what it is in json file only so that we have to take care of one file onlyā€¦ Keeping multiple file + different format will cause maintenance + complexity.

Ok, if itā€™s complicated to generate json for non-encrypted data then Iā€™m ok with not providing an option to dump non-encrypted data, if a use case for it emerges later then we can add it then.

Well you either increase the size of one file or the whole package that includes multiple files, it doesnā€™t make much difference. If you are worried about the dump getting too big then just compress the one file.

Thank you all for review and suggestions. I will go ahead with maintaining a single json file. If there are no further comments then please go ahead and sign-off EDD so that i could start making code changes.

Regards,
Ravi

Please remove the following bullet, it suggests that encrypting in base64 is optional:
ā€œ2. File format: File contents can be encoded in base64 & that base64 is added to json file.ā€

Also, can you add the bit that we just discussed about hooks?

@RKORavi Design doc looks good to me. Thanks for making changes.