Update file format and file name for data saved using save_configuration() method in PTL

RKORavi · March 20, 2019, 6:25am

Hi all,

I have created an EDD for interface change to save_configuration() & load_configuration() in PTL.
EDD link: https://pbspro.atlassian.net/wiki/spaces/PD/pages/1156284419/Update+file+format+and+file+name+for+data+saved+using+save+configuration+method+in+PTL

Please review & share your comments.

Thanks & Regards,
Ravi Kant

kjakkali · March 20, 2019, 2:00pm

@RKORavi : Overall design looks good.

I have few questions.

How are you handling multinode cluster pbs configuration?
Can you add an example of json file?

RKORavi · March 21, 2019, 7:19am

Hi Kumar,

I have updated the EDD to address above two questions. Please review.

Regards,
Ravi

vccardenas · March 22, 2019, 6:33pm

@ RKORavi, a correction:
Interface: load_configuration()
Synopsis: Apply configuration saved by load_configuration --> should be save_configuration

Also the statement: “A Single file to be used to save all the configurations” seems to apply per host.
Will this file include all the server, scheduler and mom configurations in case all PBS daemons were on one host?

After loading the saved configuration do we need to HUP or restart any/all of the daemons?

RKORavi · March 25, 2019, 3:24am

@vccardenas thanks for review. I have updated the EDD, please check.
When all PBS daemons are running on a single host, all configuration will be saved to a single file. Idea is to keep different file for each host.
Yes, we have restart or HUP daemons based on configuration that is loaded. I have added this in EDD.

Regards,
Ravi

hirenvadalia · March 27, 2019, 3:49am

@RKORavi Here are my comments on this design doc:

Add details about what will be saved by save_configuration() (You can refer here)
1st point (and 3rd point) says single file per host: why per host file? can’t we include all data from all host into a single file (even if it is a multinode cluster)? (Means every call to save_configuration will generate only one new file)
2nd point says, file contents will be saved in base64 encoded but the example shows qmgr output, also if I remember correctly then when we discussed offline this we discussed only encoding all PBS configuration and hooks related files not qmgr or pbsnodes output… Also if you encode qmgr output and while loading that how will you compare that? For example, let’s say after you saved qmgr -c “print server” output user has added in a resource, now when you compare base64 of this you will find out that content is different but how will you find out what to delete? because saved configuration doesn’t have any logic to delete a newly added resource
About 5th point: Do we really need to say that? I mean we know that after changing sched config we need to HUP sched to apply that config right? so why we need to explicitly say that? I think we should remove 5th point so that it won’t create any confusion
What will happen if tempdir already has ‘pbsptl_host1_1.json’ when the user invokes pbs_benchpress?
I think we should clean up tempdir for these JSON files when the user invokes pbs_benchpress

RKORavi · March 27, 2019, 5:49am

Hi Hiren,

Thankyou for reviewing the EDD.

I have updated EDD on contents to be saved by save_configuration.
If we maintain single file per host then it would be easier to send mom_config & sched_config file(in multi sched setup) to different host. Are you suggesting that we keep single file to save all the configuration for multinode setup, but then we have to parse file contents to be send to different host.
I did not think of removing any custom resource. You are right we should encode only file contents and not print server. I have updated EDD regarding this.
I have removed fifth point.
Yes, tempdir has to be cleaned up in case a file with same name is available, added this as point 3 in EDD.

Regards,
Ravi

hirenvadalia · March 27, 2019, 6:58am

Yes, it will be good to keep single file for all hosts even if it multinode setup

neha.padole · March 27, 2019, 8:43am

In the section “Different aspects to be covered by this EDD”, in #1 ‘pbsptl_conf’ is specified as the prefix of the saved json file. In #5, the example is “pbsptl_host1_1.json”.

agrawalravi90 · March 27, 2019, 8:52am

Thanks for working on this Ravi. PFB some comments on the design:

save_configuration() will save the following from the PBS cluster:
Commands output:
qmgr print server
qmgr print sched
qmgr print hook
qmgr export hook <each hook> application/x-python default
qmgr export hook <each hook> application/x-config default
qmgr print resource
qmgr list pbshook
qmgr export pbshook <each pbshook> application/x-config default
pbsnodes
Configuration files:
sched_priv/sched_config
sched_priv/holidays
sched_priv/resource_group
sched_priv/dedicated_time
multiple schedulers directories if any
mom_priv/config (files of each hosts listed in pbsnodes)
mom_priv/config.d/ (saves vnode definition files listed with “pbs_mom -s list”)
/etc/pbs.conf (all nodes)

Was save_configuration already capturing all of this before? Is yes then please mention that save_configuration will continue capturing all of this. If you are changing it to capture something new then please highlight that in the doc.
Also, it will be very helpful to know what information will be captured for each daemon. So, can you please break this down into what info will be captured for each daemon?

A Single file per host to be used to save all the configurations,

So, for a multi-node configuration, we will capture multiple files, where will those files go? is there going to be a parent directory inside which they will be stored? Also, on which host will they be accumulated?

File name with the prefix as “pbsptl_conf” and suffix as “. json”.

Where does the hostname go in the filename? I think in later examples you mention pbsptl_<hostname>_conf.json, so please correct it here.

File contents can be encoded in base64 & that base64 is added to json file.

So will there be an argument to save_configuration() which decides whether to encrypt the data or not?

Different file to be used for each host, to handle multinode cluster pbs configuration. Each file to have a prefix of hostname to identify between configuration of different hosts.
e.g, pbsptl_host1_versionNumber.json, pbsptl_host2_versionNumber,json

You already talk about file name in bullet 1, so I suggest that you merge the two

Different versions of files to be maintained, every time save_configuration() is called.

then why do you need to do this:

When user invokes pbs_benchpress, cleanup of tempdir is performed in case file with same name is present.

RKORavi · March 28, 2019, 4:11am

Hi Ravi,
Thank you for reviewing the EDD.
Changes i have made to EDD after your comment:

Save configuration will continue to save contents that it was doing it before. I am not adding anything new to save configuration.
Divided save_configuration contents as per each daemons.
Single file to be used to save all the hosts configurations. Updated json file example accoding to this.
Merged point 1 & point 4.

Any config file that is to be saved is encoded in base64 format. So, I am not adding an argument to save_configuration.

When user invokes pbs_benchpress, cleanup of tempdir is performed in case file with same name is present
– Even though we maintain different version of file for each iteration of save_configuration, it is just a precautionary measure to check if there is any file name with same name already present in temp directory.

Regards,
Ravi

agrawalravi90 · March 28, 2019, 7:10am

I think you should. It can be useful for somebody to just look at the data dump itself to infer some information instead of going through the effort of loading it via pbs_config. Your second bullet also seems to suggest that the encoding is optional: " 2. File format: File contents can be encoded in base64 & that base64 is added to json file."

I’m not sure I understand why though. What’s the harm if you don’t remove the already existing file?

RKORavi · March 29, 2019, 2:36am

Hi Ravi,

If any config file is not encoded to base64 format then user has to take care of line like:
resources: “ncpus, mem, arch, host, vnode, aoe, eoe” because it can’t be stored as it is in json file. He might have to parse config file contents to take care of adding “” in these lines. But when we do not give an option whether to encode config file contents or not then all user has to do is decode the contents to look at data dump. your thoughts?

Regards,
Ravi

RKORavi · March 29, 2019, 3:12am

Hi all,

I would like to have some suggestions on how to store hook files, I am thinking of storing all hooks .PY & .CF file in a tar ball or should i store it in same json file. But storing hook files content in same json file will increase its size drastically.
Any suggestion on this will be helpful, before i update EDD.

Regards,
Ravi

hirenvadalia · March 29, 2019, 5:20am

@RKORavi I will vote for storing any information not matter what it is in json file only so that we have to take care of one file only… Keeping multiple file + different format will cause maintenance + complexity.

agrawalravi90 · March 29, 2019, 5:32am

Ok, if it’s complicated to generate json for non-encrypted data then I’m ok with not providing an option to dump non-encrypted data, if a use case for it emerges later then we can add it then.

agrawalravi90 · March 29, 2019, 5:33am

Well you either increase the size of one file or the whole package that includes multiple files, it doesn’t make much difference. If you are worried about the dump getting too big then just compress the one file.

RKORavi · March 29, 2019, 5:49am

Thank you all for review and suggestions. I will go ahead with maintaining a single json file. If there are no further comments then please go ahead and sign-off EDD so that i could start making code changes.

Regards,
Ravi

agrawalravi90 · March 29, 2019, 5:56am

Please remove the following bullet, it suggests that encrypting in base64 is optional:
“2. File format: File contents can be encoded in base64 & that base64 is added to json file.”

Also, can you add the bit that we just discussed about hooks?

hirenvadalia · March 29, 2019, 6:00am

@RKORavi Design doc looks good to me. Thanks for making changes.

Topic		Replies	Views
Design for a supported way to change default setup in PTL Developers	49	2165	September 21, 2018
PP-685: Provide a coherent interface for managing configuration and other data Developers	1	1234	April 7, 2017
PP-302: Implement save of PBS data for post-run analysis Developers	53	4782	March 16, 2017
Design for importing hook config file in PTL Developers	44	1947	August 2, 2019
New configuration variable: PBS_MOM_NODE_NAME Developers	25	5365	August 4, 2016

Update file format and file name for data saved using save_configuration() method in PTL

Related topics