Add a mock run option to pbs_mom for testing

agrawalravi90 · February 26, 2020, 10:04pm

Hi all,

I’m proposing adding a new option to pbs_mom for testing purposes, especially performance testing. Please review the doc and provide feedback:
https://pbspro.atlassian.net/wiki/spaces/PD/pages/1555726337/Add+a+mock+run+option+to+pbs+mom+for+testing

Thanks,
Ravi

bhroam · February 27, 2020, 2:44am

Overall I like this idea. It takes 17s for the scheduler to run 10k jobs. It takes the mom 10m to put them all into the running state. We’re doing scheduler testing, so we don’t really care about the actual running of the jobs. This will streamline that process.

I have two comments:

1. It is possible to update the walltime through qalter, and this does make it to the mom. Would you take this into account? I’m not sure we need to care.
2. Some PTL tests do actually check for certain things the mom sends back, like the session ID or resources_used.walltime. I think you’ll still have to do these updates. In the case of session ID, I’d just make something up. You can easily handle resources_used.walltime: just update it when the mom would normally walk the proc table. For the other resources, you can just assume 100% cpu used.
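
A rough sketch of the second point, in Python purely for illustration (the mom itself is written in C). `MockJob`, `poll()`, and the starting value for the fake session IDs are all made-up names and values, not anything from the pbs_mom source:

```python
import itertools
import time

# Hypothetical counter for made-up session IDs; no real process backs them.
_fake_sids = itertools.count(start=100000)


class MockJob:
    """Illustrative stand-in for a job that the mom only pretends to run."""

    def __init__(self, job_id, now=None):
        self.job_id = job_id
        self.session_id = next(_fake_sids)  # invented, unique per job
        self.start_time = time.time() if now is None else now
        self.resources_used = {"walltime": 0}

    def poll(self, now=None):
        """Called at the point where the mom would normally walk the proc table."""
        now = time.time() if now is None else now
        # Walltime can be reported accurately even though nothing is running.
        self.resources_used["walltime"] = int(now - self.start_time)
        return self.resources_used
```

Passing `now` explicitly just keeps the sketch deterministic; a real mock mode would presumably hook this into whatever periodic polling the mom already does.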

Bhroam

agrawalravi90 · February 27, 2020, 4:46am

Thanks for the review Bhroam.

I hadn’t thought about that. This is kind of use-case territory … I’m not sure that there’s a need for it. Maybe we can do this in a future version if we feel that it’s needed? For now, I’ll mention in the doc that it won’t have any effect.

I actually didn’t have PTL in mind at all when thinking about this feature … I have a feeling that making this work with PTL will take a lot more effort than just setting these attributes. We’ll have to add an option to pbs_benchpress to start the mom in mock-run mode and probably make some changes to the revert functions; cleanup_jobs() actually does kill -9 on the session ID of the job, and I’m sure there will be other things to sort out as well. I’m not entirely sure that it’s worth it, since PTL is not really the best way to do performance testing. What do you think?

agrawalravi90 · February 27, 2020, 4:46am

@hirenvadalia requesting you to provide inputs as well, thanks.

hirenvadalia · February 27, 2020, 4:55am

Just adding to what @bhroam said about PTL + Session ID in mock mode:
What about the job’s output and error files? For example, with direct output/error, PTL may expect the job’s output files to be created and to have some data in them. Or what if a PTL test submits a job and then expects some data from the job’s output file?

hirenvadalia · February 27, 2020, 4:57am

Also, should we care about mom hooks in mock mode? If not, that would make the mom even more lightweight… no?

agrawalravi90 · February 27, 2020, 9:40pm

Interesting thought, I hadn’t considered hooks. Are there any mom hooks which are run by default? I don’t remember seeing anything about hooks during performance profiling of pbs_mom, which is probably why I didn’t consider them. In any case, users have control: if they want a thin mom, they can simply not create mom hooks, so I feel that it’s OK if we don’t handle them. Let me know what you think.

hirenvadalia · February 28, 2020, 4:39am

Startup and periodic mom hooks?

True, I think for now we can safely ignore mom hooks… If required in the future, we can change this.

agrawalravi90 · March 2, 2020, 1:35am

Alright, sounds good, let me know if you have any more comments.

@bhroam please let me know your thoughts about PTL.

bhroam · March 2, 2020, 9:24pm

@agrawalravi90 I think PTL is important because you can write your performance test in PTL. Running manual tests is nice and all, but writing them in PTL makes it much easier to run them multiple times.

We should probably just document the limitations of this mock mode so we can write our tests appropriately. We’ll just know that X, Y, and Z won’t happen, so we won’t expect them to.

Bhroam

agrawalravi90 · March 2, 2020, 10:49pm

Ok, sounds good, I’ve added a “caveats” section to the page, please let me know if I should add any more information.

bhroam · March 3, 2020, 12:17am

I personally would rather see you make a small effort for the resources_used. Since you are going to circumvent the walking of the proc table, you just need to report back some slightly bogus numbers. resources_used.walltime can be correct. For everything else, just report back what was passed in.

This could be a future enhancement though.

Bhroam

agrawalravi90 · March 3, 2020, 1:01am

So, the best that we can do is set them to the corresponding Resource_List values, if the submitter provided them to us. But is that actually any better than leaving them set to 0? Job submitters know that the mom is doing a mock run and the job is not actually being run, so would they really expect these values?
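
To make the two options concrete, here is a minimal sketch in Python (for illustration only; the real change would be in the mom's C code). The function name and the default list of resource names are assumptions, not taken from PBS:

```python
def mock_resources_used(resource_list, elapsed_walltime,
                        names=("ncpus", "mem", "cput")):
    """Build a resources_used dict for a job that never actually ran.

    resource_list stands in for the job's Resource_List values; `names`
    is an illustrative selection of resources to echo back.
    """
    # Echo the requested value where the submitter gave one, else report 0.
    used = {name: resource_list.get(name, 0) for name in names}
    # Walltime is the one value that can be reported for real.
    used["walltime"] = elapsed_walltime
    return used
```

With `{"ncpus": 4, "walltime": 3600}` as the request and 120 seconds elapsed, this reports `ncpus` as 4, `mem` and `cput` as 0, and `walltime` as 120, so anything the submitter did not request simply stays at 0.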

agrawalravi90 · March 3, 2020, 1:10am

On second thoughts, it might be useful for analytics, so ya, should probably set them if possible.

agrawalravi90 · March 3, 2020, 1:14am

Ok, I’ve made the change, please let me know if it looks okay @bhroam

bhroam · March 3, 2020, 1:38am

My last comment is that resources_used.walltime is actually calculated inside the mom. You probably can just send that back without much problem. This way you can know how long the job has left. The scheduler actually uses it.
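
As a small illustration of why that value matters (this is not the scheduler's actual code, and the helper names are made up), the time a job has left can be derived from the requested and used walltime, assuming both are in HH:MM:SS form:

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def time_left(requested_walltime, used_walltime):
    """Seconds remaining before the job reaches its requested walltime."""
    remaining = to_seconds(requested_walltime) - to_seconds(used_walltime)
    return max(0, remaining)  # never report negative time left
```

For example, `time_left("01:00:00", "00:12:30")` gives 2850 seconds. If the mock mom always reported a walltime of 0, every job would appear to have its full requested time remaining.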

Bhroam

agrawalravi90 · March 3, 2020, 1:50am

Got it, thanks, will definitely do that. Do let me know if you have other suggestions, this is my first time messing around with mom code
