The ‘requirements()’ decorator in PTL specifies the PBS cluster configuration necessary to run a PTL test; during a test run, this data is evaluated against the cluster data passed in the custom parameters of pbs_benchpress. The current format of the PBS cluster specification in the decorator gives only the PBS daemon counts and a certain set of flags. This leaves the actual number of nodes needed by the test, and the node names, ambiguous. Hence, this design reworks the cluster data specification so that the PBS cluster needed to run a test is described unambiguously.
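For context, a hedged sketch of the current count-and-flag style of specification (the keyword names here are illustrative, not a definitive listing of the decorator's arguments):

    # Illustrative only: a count/flag based requirements specification.
    # It asks for 2 MoMs and no MoM on the server host, but it does not say
    # which hosts those MoMs land on, so the real node count stays ambiguous.
    @requirements(num_moms=2, no_mom_on_server=True)
    def test_multi_node_job(self):
        ...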
I just have some comments on the idea of requirements, and what I would like to see in some iteration of PTL.
A test’s requirements should be the exact setup the test needs to run. It could be 1 server, 1 mom, 1 sched; maybe 3 moms; 1 server and 1 remote mom; etc.
We should be able to give benchpress a list of hosts where servers, moms, and scheds can reside, and have PTL determine, for each test, whether the test can fit on the given setup and, if it can, provide the exact setup for the test.
This solves the problem where running smoke tests with 2 moms given to benchpress fails most of the tests, as they require only one mom.
For example, I give benchpress 1 server, 3 moms, and 1 scheduler:
I run testA, which requires 1 server, 1 mom, 1 scheduler.
PTL should be able to recognize that testA needs 1 server, 1 mom, 1 scheduler, and that the given cluster has enough to satisfy these requirements. It should then create a setup for the test that is 1 server, 1 mom, 1 scheduler.
I run testB, which requires 1 server, 3 moms, 1 scheduler.
PTL sees the requirements, the cluster can satisfy the requirements, and creates a setup exactly as the requirements asked.
I run testC, which requires 1 server, 5 moms, 1 scheduler.
PTL sees that the cluster can’t satisfy the requirements (5 moms > 3 moms) and skips the test.
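A minimal sketch of this matching behavior (hypothetical names, not actual PTL code; it simply compares required daemon counts against what the cluster offers):

    # Hypothetical sketch of the matching logic described above; not actual PTL code.
    def pick_setup(required, available):
        """Return the exact per-role setup for a test, or None if the cluster is too small.

        'required' and 'available' are dicts like {'server': 1, 'mom': 3, 'sched': 1}.
        """
        for role, count in required.items():
            if count > available.get(role, 0):
                return None               # e.g. testC: 5 moms needed > 3 moms given -> skip
        return dict(required)             # use exactly what the test asked for

    cluster = {'server': 1, 'mom': 3, 'sched': 1}
    print(pick_setup({'server': 1, 'mom': 1, 'sched': 1}, cluster))  # testA: runs on a 1/1/1 subset
    print(pick_setup({'server': 1, 'mom': 5, 'sched': 1}, cluster))  # testC: None -> skipped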
@saritakh can we have the same interface as pbs_benchpress for the @requirements decorator as well? I feel that will simplify the list.
instead of “host1=, host2=”
use “server=(host1),sched=(host1),mom=(host1,host2),comm=(host1)”
where server and mom are mandatory, and sched and comm default to the server host if not specified. Since our default config has no mom on the server, we need to make sure they are not given the same host name, except when using --run-on-single-node.
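One way this grouping could look in a test, as a hedged sketch (the keyword-argument form and the hostnames are my assumption, mirroring the string above):

    # Illustrative only: the pbs_benchpress-style grouping applied to the decorator;
    # sched and comm would default to the server host when not specified.
    @requirements(server=('host1',), sched=('host1',),
                  mom=('host1', 'host2'), comm=('host1',))
    def test_two_mom_job(self):
        ...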
If we follow this, it would also be helpful to let the test runner report the minimum configuration required to run all the attempted test cases. For instance, if the user is trying to run SmokeTest, what is the minimum number of each daemon required to run all the tests without skipping any of them? Is this part of Interface 4?
It would also help to let the test runner report which tests will be skipped for a given configuration, without actually executing all the tests.
I like the idea of running on a remote mom. I don’t think --run-on-single-node is necessary though. To run on a remote mom, you have to provide it via -p moms. How about this: if -p moms isn’t given, run on the local node like it does now; if -p moms=mom1 is given, run on the remote mom.
Do we really need -p servers, moms, comms, scheds, nomom anymore with the new syntax of the requirements decorator? Why don’t we remove those and add one simple argument called --hosts, which is the list of hosts to be used in tests, and let PTL configure the required PBS components (i.e. mom, comm, or both, etc.) on a given host during setUpClass/setUp of the test?
Syntax of --hosts can be host[:port][@<path to pbs conf>][,host[:port][@<path to pbs conf>]]...
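For illustration, an invocation using that syntax might look like this (hostnames, port, and conf path are made up):

    pbs_benchpress -t SmokeTest --hosts=hostA,hostB:15001@/etc/pbs.conf.hostB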
Following up on @hirenvadalia’s comment,
I agree with this simple implementation. The PTL framework will be able to figure out which type of installation is present on each host, and the end user simply needs to know what kind of node cluster to provide as input.
In the current implementation of the requirements decorator, a lot of test cases get skipped during execution. As an end user I want all my test cases to run, so I propose an option in pbs_benchpress (something similar to --gen-ts-tree) like “--get-required-nodes”. This option would return a “one size fits all” cluster of nodes capable of running most or all of the test cases given as input.
For example, pbs_benchpress -t TestCgroupsHook,SmokeTest --get-required-nodes
would return something like "h1=server,h2=server,h3=mom,h4=client"
The user can then simply set up the cluster in the same manner and pass the info as hosts: pbs_benchpress -t TestCgroupsHook,SmokeTest --hosts=h1,h2,h3,h4
In my opinion this eliminates the issue of skipping tests and is relatively simple.
If the user does not pass the recommended cluster, it would run all the test cases it can and skip the others. We can always compare the current cluster with the recommended cluster and make adjustments accordingly.
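A minimal sketch (hypothetical, not actual PTL code) of how such a "one size fits all" recommendation could be computed, by taking the per-role maximum across the selected tests:

    # Hypothetical sketch: aggregate per-test requirements into one recommended cluster
    # by taking the maximum count needed for each role across all selected tests.
    def recommended_cluster(per_test_requirements):
        combined = {}
        for req in per_test_requirements:
            for role, count in req.items():
                combined[role] = max(combined.get(role, 0), count)
        return combined

    tests = [
        {'server': 1, 'mom': 1, 'client': 1},   # e.g. a SmokeTest-style case
        {'server': 1, 'mom': 3},                # e.g. a multi-mom cgroups case
    ]
    print(recommended_cluster(tests))           # {'server': 1, 'mom': 3, 'client': 1}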
@sandisamp I would name the option --get-required-hosts (or --get-required-machines), since "node" is more of a PBS object, indicating a MoM.
(Although this might be another RFE/project,) I would extend --get-required-hosts so that if -t or --tags is not given, it prints a JSON dict of all tests and their required cluster host info, which can be very useful to automated tools.
The best example of such an automated tool is a CI tool: currently it is really hard to determine the required number of hosts and the type of setup needed on each host in the cluster. With --hosts and --get-required-hosts it becomes really simple.
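For illustration, the JSON printed when neither -t nor --tags is given might look something like this (the shape, field names, and test names are placeholders of mine, not a finalized format):

    {
        "SmokeTest.test_submit_job": {"server": 1, "mom": 1, "client": 1},
        "TestCgroupsHook.test_cgroup_cpuset": {"server": 1, "mom": 2}
    }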
Yes @vstumpf, that is exactly the plan of this design. Tests requiring no more hosts than the number and type of hosts given to pbs_benchpress will run; all the rest will be skipped.
@anamika
The pbs_benchpress input is focused on the list of hostnames and their installation types; it indicates the number of nodes available to run the tests.
The test requirements information is focused on the daemons necessary on each of the nodes the test needs in order to exercise the feature; it indicates the number of nodes necessary to run the test.
Having the same interface for both would cause the same node name to be mentioned multiple times across the daemon lists in the -p input of pbs_benchpress. Instead, I feel it is better to mention hosts and their installation types, or to detect the installation types of the hostnames passed. PTL is internally aware of which daemons are available with each installation type and can therefore switch daemons on or off based on the test’s requirements.
@nithinj
Yes, Interface 4 is for exactly that; as suggested by Hiren and Sanidhya, I have named it --get-required-hosts.
Regarding the list of tests that will be skipped due to insufficient hosts: I will have to look into the internal design for this.
@bhroam
I removed the option --run-on-single-node and set this behavior for the two cases below:
When no hostname is specified in --hosts in pbs_benchpress
When a single hostname is specified in --hosts in pbs_benchpress
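For example (illustrative invocations):

    pbs_benchpress -t SmokeTest                      # no --hosts: run on the local node
    pbs_benchpress -t SmokeTest --hosts=hostA        # single host: run everything on hostA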
@visheshh
I initially thought of having rpm-type options in the pbs_benchpress input (server rpm, execution rpm and client rpm), but had added comms for ease of specification. Yes, to answer your question, we would have to mention a host multiple times in the previous format, i.e. if we bifurcate by daemons instead of installation rpm types.
I have now changed the interface to the common --hosts option, with PTL detecting the type of installation.
@hirenvadalia
I did think about --hosts, which was part of your initial suggestion on this RFE, but got diverted towards daemon specification because the challenge with a single list of hostnames is that the sequence of node types might vary in the user input. For example, the requirements may expect a comm on host3, but what if the third hostname specified in pbs_benchpress is a node with only an execution rpm installed?
This proposal works when all nodes have the server rpm installed, or when PTL detects the rpm type installed on each node and matches it to the requirements accordingly. I have updated the design as per the latter case.
Also, regarding the suggestion of making the required hosts available as JSON output: it can be looked into during the implementation of --get-required-hosts.
Thanks @saritakh for updating the design. A few minor comments:
Not sure, but should we also allow ranges in the @requirements() arguments? Something like @requirements(host1=SERVER, host_2_5=MOM, host6=COMM), where host_2_5 means host[2-5] (see the sketch after this list).
In Interface 2, we should explicitly mention that when a range is used and a port and/or config path is given, it applies to all hosts in the given range.
In Interface 3, we should also mention --tags along with -t.
In Interface 4, we should not hard-code usernames but use the users from pbs_testusers.py.
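A hedged sketch of what the range form suggested in the first point could look like (the naming and the daemon-type constants are illustrative, not a finalized interface):

    # Illustrative only: host_2_5 would expand to host2..host5, all running MoMs;
    # SERVER/MOM/COMM are placeholder constants for the daemon types.
    @requirements(host1=SERVER, host_2_5=MOM, host6=COMM)
    def test_wide_cluster(self):
        ...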
Thank you @hirenvadalia,
I completely agree on point 1, without which this design would have been incomplete. I have addressed all your comments. Please review and let me know if you have further comments.