PP-1281: New decorator in PTL using which user can provide cluster information required for a test

@hirenvadalia, I have updated the document accordingly. Please review

@borlesanket design looks good to me.

@bhroam, @anamika, do you have any comments on updated doc or should I consider it as sign off.

@borlesanket Sorry for the confusion. I thought I already signed off. Yes, it looks good to me.

sorry for the delay in reply @borlesanket. A minor comment and rest looks good.

  • Kindly update following attributes as well

numservers=1 to num_servers/servers_count
nummoms=1 to num_moms/moms_count
numcomms=1 to num_comms/comms_count
numclients=1 ro num_clients/clients_count

@anamika, I have updated the document accordingly. Please have a look.

Thanks @borlesanket. Looks good to me.

Hi All,

I have taken up the implementation of this feature. During implementation, I had to update few design aspects whose details are given below:

Summary of changes:

  1. REQUIREMENT updated as REQUIREMENTS_KEY
  2. REQUIREMENT_INFO is updated as REQUIREMENTS_INFO
  3. removed unittest_skip & unittest_skip_why way of skipping test execution
  4. Updated requirements decorator details with SkipTest() use for skipping test execution
  5. Added note regarding specifying num_clients greater than one in requirements

Updates 1 & 2:
“REQUIREMENT” seemed singular word whose data is actually multiple key-value pairs, hence udpated it as plural and appeneded “KEY” to it.

Update 3:
The test execution or skip decision has to be done at the runner context where the test case name is identified. At this point, the flat unittest_skip is not being honored by nose framework. Hence this method of skipping test using unittest_skip flag did not work. Hence SkipTest() method had to used to skip test execution in case of requirements not matching the param list data of pbs_benchpress.

Update 4:
Validation of both the test case requirements data and the param list data of pbs_benchpress is not done inside the decorator, but at the runner context.

Update 5:
Currently PTL supports specification of a single host for “client” parameter, i.e. name of the host on which the PBS client commands are to be run from. This design supports specification of requirement of multiple clients for a given test case; which right now cannot be supported. Hence I have logged a ticket given below to support multiple client specification for pbs_benchpress; after which this requirements implementation fully functional in terms of number of clients. So, as of now test execution of any test case that specifies num_clients as greater than one will be skipped.

PP-1327: Support specification of multiple PBS client hosts in PTL

Please review the same and let me know of any comments on these design changes.

@bhroam, @anamika, @hirenvadalia, @borlesanket,
Gentle reminder!!
Please review the design updates and let me know of any comments or suggestions.

Thanks,
Sarita

@saritakh thanks for updating the document.

    • no_comm_on_mom=True (Which will imply that comm should be on different host than mom)
      by default we assume single node installation. should not above be false for that? why we have default requirement for comm to be on a separate host?
  • before jumping into design I would add some details about the decorator itself. For example, this decorator can be defined at test case or test suite level. Then I would mention when will it get called during test execution.

  • when I read your first point about REQUIREMENTS_KEY I lost it until I read further details. Clearly I think that should be removed or moved to later.

  • third point about wrapper method is also very confusing and not clear. I think what you are trying to say is "REQUIREMENTS_KEY dictionary will be defined during the initialization of test object inside decorator function in utils/pbs_testsuite.py. am I right? I do not think you need to mention that whatever is not being passed by user will be set to default. it is clear in the next point.

  • for “PTLTestInfo._gen_ts_tree()” function, is keyword “requirements” predefined? if yes then add it explicitly.

  • who sets numnodes? it doesn’t affect your design but I am just curious.

  • if we set requirements decorator at test suite level then this information will be passed to test case. I believe this is what you are trying to say at following but it is confusing.
    default requirements --overridden by–> test suite level requirements data --overridden by–>test case level requirements data

  • it is not clear how one can get the attributes from REQUIREMENTS_KEY. can you add an example or pseudo code for more clarity?

  • Also I believe platform should also be part of requirements. If my platform not matches then it should get skipped. by default platform could be unset/ALL.

Thanks @anamika, for your valuable inputs. I have updated the design with the same; please review and let me know of any comments; please find my replies below:

no_comm_on_mom=True (Which will imply that comm should be on different host than mom)
by default we assume single node installation. should not above be false for that? why we have default requirement for comm to be on a separate host?

Sarita ==> This refers to mom only node and not the mom on the server node. I will update the satement as "comm should be on different host than the host which has mom (i.e. a mom on non-server host)"

before jumping into design I would add some details about the decorator itself. For example, this decorator can be defined at test case or test suite level. Then I would mention when will it get called during test execution.

Sarita ==> Updated

@requirements decorator can be specified at test case or test suite level. This decorator function is called before the test setup of the test case.

when I read your first point about REQUIREMENTS_KEY I lost it until I read further details. Clearly I think that should be removed or moved to later.

Sarita ==> Updated, I preferred to remove it since it is not a user exposed characteristic.

third point about wrapper method is also very confusing and not clear. I think what you are trying to say is "REQUIREMENTS_KEY dictionary will be defined during the initialization of test object inside decorator function in utils/pbs_testsuite.py. am I right? I do not think you need to mention that whatever is not being passed by user will be set to default. it is clear in the next point.

Sarita ==> Updated

for “PTLTestInfo._gen_ts_tree()” function, is keyword “requirements” predefined? if yes then add it explicitly.

No, ‘requirements’ is a new key added to the json dictionary generated using --gen-ts-tree option to info command which functions with PTLTestInfo._gen_ts_tree() method. Updated the same.

who sets numnodes? it doesn’t affect your design but I am just curious.

Sarita ==> PTL ‘numnodes’ is an existing tag name whose value when applied to a test case, indicates the number of nodes needed to run the test case. This is an existing feature.

if we set requirements decorator at test suite level then this information will be passed to test case. I believe this is what you are trying to say at following but it is confusing.
default requirements --overridden by–> test suite level requirements data --overridden by–>test case level requirements data

Sarita ==> Yes, I reworded the same to depict exact effectiveness of requirements at test case

it is not clear how one can get the attributes from REQUIREMENTS_KEY. can you add an example or pseudo code for more clarity?

Sarita ==> It is set using setattr() method in the requirements decorator and got using python’s getattr() method wherever needed. For example it is the same way that we get “_testMethodName” from a test object to get the test case name.

Also I believe platform should also be part of requirements. If my platform not matches then it should get skipped. by default platform could be unset/ALL.

Sarita ==> This is a future enhancement to this decorator. Right now this design includes the cluster information only from the installation perspective. Along with this, there is an issue that I am facing with compatbility of @skipOnCpuSet and @requirements decorator; for which I will be logging a ticket to work upon it.

Hi All,

I have updated the design addressing Anamika’s comments. Please re-review the same and let me know of any comments or suggestions.

Thanks,
Sarita

Thanks for explaining Sarita. it looks good to me.
nit: “can make use this” should be “can make use of this”

I still prefer no_comm_on_mom to be false as default since most of our PTL tests have mom and comm on same host. after this fix every multi node tests needs to be either updated to have an additional node or no_comm_on_mom set to false. right?

I’m fine with the changes. Thanks for keeping the document up to date.

Thanks @anamika,
I updated the line “can make use of this”.
Regarding your point on no_comm_on_mom; I feel no_comm_on_mom=True to be correct as default, since it is for non-server hosts. Let me try to give you an example:

Consider a generic multi node PBS cluster setup:

Node 1: server,sched,mom1 & comm
Node 2: mom2
Node 3: mom3

Case 1: In we set no_comm_on_mom=True (considering the non server hosts Node 2 & Node 3) - it indicates that mom2 & mom3 hosts are plain mom only installations

Case 2: If we set no_comm_on_mom=False (considering the server host i.e. Node 1); it indicates that mom2 & mom3 hosts might have mom and comm installations on them.

Now consider an external entity like TH that would set up the test cluster needed for this test case:
In case 1 TH would be sure to install mom only on Node 2 & Node 3
In case 2 TH would not be sure if Node 2 & Node 3 would be mom+comm or mom only; it has to check the num_comms value in order to decide which of these nodes will have mom+comm installations.

no_comm_on_mom if set to True indicates all non-server moms are surely mom only installations
no_comm_on_mom if set to False indicates some of the non-server moms are mom+comm installations

With current default value of True, in all our existing multi node test cases, we do not need to pass this flag since the default value of True holds good for all the non-server nodes since they have mom only installations.
The other case if we set False as default value, we need to specifically pass “no_comm_on_mom=True” to all our multi node cases which usually fall under above example.

Please let me know what you feel on this and request you to bear the long example; since I was also re-assuring myself with this explanation.

Thanks for explaining Sarita. can you add an example for the same as well?

Thanks @anamika, I added third example as the same. Please review and let me know of any comments.

Thanks @saritakh. looks good to me.

@All,
During review of Pull Request; a question got raised on which server would the flags be applicable in case of a multi-server setup. Currently the data is applicable to all the servers on a multiple server setup. Hence, I updated the same note in the design page. Kindly have a look and let me know of any comments on this line:
" * Currently this data including the flags no_mom_on_server, no_comm_on_server & no_comm_on_mom are applicable to all the servers in case of multiple server setup"

Thanks @saritakh. BTW, do we have the design available for multiple servers? If yes then we might want to look into that otherwise what you propose looks good.