PTL performance test automation

Hi,

I have posted a design document for automating performance tests in PTL, so that the output can be saved in a database and used to generate comparison reports between builds and to benchmark PBS. Please review and comment.

https://pbspro.atlassian.net/wiki/spaces/PD/pages/579338259/Performance+Test+Automation

Thanks,
Vishesh

1> Can you please provide more details on which database we are going to use (PostgreSQL, etc.) and where it resides? How is PTL going to access such a database?
2> The format of the JSON file should be standard so that it can be used for all other test suites, not just for the performance tests alone. I am thinking of the following format at the moment; the snippet below does not contain all the required fields. We can discuss this further and finalise the format, after which we can seek the community's help in finalising it.

% cat test_results_<date_time>.json

{
    "tests": {
        "PerfTestSuite": {
            "testJobPerf_<instance_number>": {
                "expected": "PASS",
                "actual": "PASS",
                "test_start_time": <start_time in epoch>,
                "test_end_time": <end_time in epoch>
            }
        }
    },
    "interrupted": false,
    "path_delimiter": ".",
    "pbs_version": "18.2",
    "user_id": "pbsroot",
    "seconds_since_epoch": <seconds since epoch>,
    "num_failures_by_type": {
        "FAIL": 0,
        "PASS": 1
    }
}
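
As a rough illustration of how such a report could be produced, here is a minimal Python sketch; write_test_results and _statuses are hypothetical helper names, not existing PTL APIs, and the field values are placeholders.

import json
import time

def write_test_results(results_path, tests, pbs_version, user_id):
    # Illustrative only: dump a results dictionary in the format proposed above.
    report = {
        "tests": tests,  # e.g. {"PerfTestSuite": {"testJobPerf_1": {...}}}
        "interrupted": False,
        "path_delimiter": ".",
        "pbs_version": pbs_version,
        "user_id": user_id,
        "seconds_since_epoch": int(time.time()),
        "num_failures_by_type": {
            "FAIL": sum(1 for s in _statuses(tests) if s == "FAIL"),
            "PASS": sum(1 for s in _statuses(tests) if s == "PASS"),
        },
    }
    with open(results_path, "w") as fh:
        json.dump(report, fh, indent=4)

def _statuses(tests):
    # Walk the suite -> test case mapping and yield the "actual" status of each case.
    for suite in tests.values():
        for case in suite.values():
            yield case.get("actual")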

3> It would be better to rename perf_test_result to capture_test_results_json. Also, this API should pass the result in a standard dictionary, something like the one above. This way it stays the same and standard across all test cases.

4> I don't think we need a separate API like perf_trial_result() for running the same test case multiple times. If we go with the standard dictionary above, that solves our problem.

5> The name of the JSON file should contain the date and time in a regular, human-readable format. It is better not to use epoch time here.

6> We need to eliminate the trial_run and trial_run_result tables. Also, we need to combine test_run and test_run_result into a single table. It is good to have just a couple of tables to capture all these details; splitting the data across many redundant tables leads to a lot of join queries, which can degrade performance.
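
To make the single-table suggestion concrete, here is a rough sketch using sqlite3; the schema and column names are purely illustrative and not taken from the EDD.

import sqlite3

# Illustrative only: one table holding both the test run and its results/measurements,
# so comparison reports can be fetched without join queries.
conn = sqlite3.connect("ptl_results.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS test_run (
        run_id       INTEGER PRIMARY KEY AUTOINCREMENT,
        suite_name   TEXT,
        test_name    TEXT,
        pbs_version  TEXT,
        start_time   REAL,
        end_time     REAL,
        status       TEXT,   -- PASS / FAIL
        measurements TEXT    -- JSON blob with per-trial values, averages, etc.
    )
""")
conn.commit()
conn.close()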

@vishesh, I have provided my initial comments. Please go through them.

@visheshh,
I am working on a design proposal for JSON-format support of the PTL report at the framework level, which I plan to post in a couple of days. Hence, I feel this change, which is specific to performance testing, is not needed at all.

Thanks @saritakh,
Once your design proposal is out, we can discuss a common JSON format for the report and work towards a common goal.

Hi @suresht,
As you might know, we shall first discuss the generic JSON report for all tests with Sarita.

I will clarify why we are using trial_run and trial_run_result.
For example:
If we consider a test that measures the submission time for 100 jobs, we won't just submit 100 jobs once to calculate the time; we will repeat the run 10 times and calculate an average.
It might be useful to know how much time each run took, and in that case we can use these tables.
For some other test case, you might want to run the test in a loop 5 times and check whether the value you get is consistent; the trial_run data can be used to store that as well.
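
As a small illustration of the kind of per-trial data this preserves (the numbers below are made up), keeping each trial's value lets us spot outliers as well as compute the average:

# Hypothetical per-trial submission rates (jobs/sec) for 10 trials of the same test.
trial_rates = [233.1, 230.4, 228.9, 235.0, 231.7, 229.5, 232.2, 230.0, 234.6, 231.3]

average_rate = sum(trial_rates) / len(trial_rates)
print("per-trial rates:", trial_rates)            # useful to check consistency across trials
print("average rate   :", round(average_rate, 2))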

It might not be needed for reporting, but I feel it could be helpful for testing purposes.

Thanks,
Vishesh

@visheshh,
Thanks for giving the example for trial_run. What I was initially thinking is: why can't we store the trial-run data in test_run and test_run_result themselves? To me, running a test case (say, a performance test case in this case) once a day for 10 consecutive days, or 10 times on a single day, does not make much difference; both can be captured in the same table.

Regarding the other points, as you said, we need to discuss this with Sarita, who is about to post an EDD on storing test results in a uniform way. We need to see whether all of our requirements match theirs and then work towards the common goal. Let us keep this thread open so that they can refer to it and incorporate our requirements as well, where appropriate.

@suresht @saritakh,
I have made changes to the EDD specifying that the API will now only write to the existing JSON file.
Please review.

Thanks @visheshh.

I would reword the following
"Test writer should be able to pass the performance result of whatever test writer is measuring to a PTL API and it should store all the results in a json file , this is limited to performance folder "

to something like
"Test writer should pass the performance measurements to a PTL API which should store all the results in a json file. This will be applicable to all tests added under performance folder. "

Please use the standard EDD format and define the "perf_test_result" API in that format, explaining what attributes it takes, etc.

Thanks @visheshh for making the changes. Most of them look good to me. I have a couple of minor comments below.

It looks like the API perf_test_result() appends the data passed through its parameters to the file ptl_test_results.json when the PTL test suite is run with the option --db-type="json". An example here would help show what this information looks like (a dictionary, tuple, or list) in the final dictionary.

Also, please mention the return value of the perf_test_result() API and any exceptions it throws in case of error.

Thanks,
Suresh

@visheshh, I have one more comment, as we discussed. It would be good to store the configuration information of a performance test scenario at the test-case level instead of the test-suite level. With this information we can clearly tell in which environment the test case was run, its pre-configuration, etc. This info can then be used to draw important conclusions from the performance metrics and to cross-compare with previous data.

One thought is that we can pass this information as a dictionary to the perf_test_result() API, which can append this data to the ptl_test_results.json file.
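
For instance, such a configuration dictionary could look something like the sketch below; the keys are only illustrative, and the extra parameter name is an assumption rather than something already defined in the EDD.

# Illustrative only: per-test-case configuration passed alongside the measurements.
test_config = {
    "num_moms": 5,
    "num_vnodes_per_mom": 100,
    "platform": "linux x86_64",
    "server_host": "serverhost1",
}

# Hypothetical call; the actual parameter name for the configuration is to be decided:
# self.perf_test_result(test_measure="job_submission", value=sub_rate,
#                       unit="jobs/sec", test_config=test_config)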

Hi @suresht, @saritakh, @anamika, please review the updated design.

Thanks @visheshh for addressing the comments. Please add a link to this forum discussion on your design page.

  • Add a synopsis or details of the API perf_test_result(). (This API will take the measurements and store them in a JSON file by calling the existing PTL API "set_test_measurements()" …)
  • Also mention what the return value would be on success and what exceptions would be raised in case of error.
  • Once you add the synopsis, you can remove the unwanted bullets.
  • Replace "trail" with "trial" everywhere.
  • The following line looks redundant; it is also not exactly an example:
    Example: perf_test_result(test_measure="Performance measurement name", value=<result>, unit="unit")
  • In the example below I was expecting "trial_no" to be the first field, followed by test_measure. Do we have control over how they get stored in the JSON or not?

{
    "value": 235.00,
    "trail_no": 1,
    "test_measure": "job_submission",
    "unit": "jobs/sec",
    "measurement": "mean"
},

  • Please clearly explain that for the average values (mean, median, std_deviation, etc.) trial_no will be reported as "avg".
    {
        "value": 213.34451767455403,
        "trail_no": "avg",
        "test_measure": "job_submission",
        "unit": "jobs/sec",
        "measurement": "mean"
    },

  • Also, if I understand correctly, the perf_test_result API will log results in the following format:

    • measurements for all individual trial runs: trial_no, test_measure, value, unit, measurement
    • the mean, median, and standard deviation calculated across all trial runs, stored as measurements as well

If my understanding is correct, please make this clear under the synopsis of the API. Also, when generating a report, would it display only the average values, the individual trial runs as well, or is that up to the user?
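
To make that expectation concrete, here is a minimal sketch of the behavior described above; it is not the actual PTL implementation, and the function name and use of the statistics module are assumptions for illustration only.

import statistics

def perf_test_result_sketch(test_measure, values, unit):
    # Illustrative only: one entry per individual trial, then summary entries
    # (mean, median, std_dev) reported with trial_no set to "avg".
    entries = [{"value": v, "trial_no": i, "test_measure": test_measure,
                "unit": unit, "measurement": "mean"}
               for i, v in enumerate(values, start=1)]
    summary = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }
    for name, value in summary.items():
        entries.append({"value": value, "trial_no": "avg", "test_measure": test_measure,
                        "unit": unit, "measurement": name})
    return entries

# e.g. perf_test_result_sketch("job_submission", [235.0, 213.3, 220.1], "jobs/sec")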

@visheshh Please find my comments below; I see the first one is already updated:

  1. I think it would be better to mention the API as TestPerformance.perf_test_result(), which makes it clear what it is part of.
  2. Mention the type and default value of each parameter passed to the function.
  3. Make sure the indentation is correct in the given sample JSON data.
  4. Clearly separate the code from the sample JSON data; right now it all looks mixed up.
  5. I suggest replacing "folder" with "directory".
  6. In which case will trial_no's value be written as "avg"?
  7. I suggest making the following points about the interface clear:
  • it is a wrapper above PBSTestSuite.set_test_measurements, with the intent of updating multiple key-value pairs of a test's measurements with appropriate identification in the JSON report
  • the behavior when a single value is passed
  • the behavior when a list of values is passed
  • the point about trial_no's value being set to "avg", etc.
  8. In the example I see that both calls' "test_measure" value is "job_submission", whereas the value variable is "sub_rate" in one case and "run_rate" in the other. Should the test_measure value be different in the second call?
  9. Separate out the point about test configuration into a different note, since it does not come under this API.

@saritakh, @anamika,
I have addressed your comments. Please review.

Thanks @visheshh
The design looks good to me.

Thanks @visheshh. This looks much cleaner now.