Add a mock run option to pbs_mom for testing

Hi all,

I’m proposing adding a new option to pbs_mom for testing purposes, especially performance testing. Please review the doc and provide feedback:
https://pbspro.atlassian.net/wiki/spaces/PD/pages/1555726337/Add+a+mock+run+option+to+pbs+mom+for+testing

Thanks,
Ravi

Overall I like this idea. It takes 17s for the scheduler to run 10k jobs. It takes the mom 10m to put them all into the running state. We’re doing scheduler testing, so we don’t really care about the actual running of the jobs. This will streamline that process.
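
To put those numbers side by side, here is a quick back-of-the-envelope calculation. It only uses the figures quoted above (10k jobs, 17 seconds, 10 minutes); the variable names are just for illustration.

```python
# Figures quoted above: 10k jobs, 17 s in the scheduler, 10 min in the mom.
jobs = 10_000
sched_seconds = 17
mom_seconds = 10 * 60

sched_rate = jobs / sched_seconds       # jobs per second the scheduler can run
mom_rate = jobs / mom_seconds           # jobs per second the mom gets to running
slowdown = mom_seconds / sched_seconds  # how much longer the mom side takes

print(f"scheduler: {sched_rate:.0f} jobs/s")
print(f"mom:       {mom_rate:.1f} jobs/s")
print(f"mom is about {slowdown:.0f}x slower")
```

So the mom side is roughly 35 times slower than the scheduler side for the same batch of jobs.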

I have two comments:

  1. It is possible to update the walltime through qalter and this does make it to the mom. Would you take this into account? I’m not sure we need to care.
  2. Some PTL tests do actually check for certain things the mom sends back, like the session ID or resources_used.walltime. I think you’ll still have to do these updates. In the case of the session ID, I’d just make something up. You can easily handle resources_used.walltime: just update it when the mom would normally walk the proc table. For the other resources, you can just assume 100% CPU used.
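
To make the second point concrete, here is a rough sketch of the idea in Python rather than mom's C. Everything in it is hypothetical (the `MockJob` class, the `poll` method, and the counter used for session IDs are not real pbs_mom names); it only illustrates a made-up session ID plus a walltime that is refreshed at the point where the mom would normally walk the proc table.

```python
import itertools
import time

# Hypothetical: fake session IDs come from a counter, since no process is forked.
_fake_session_ids = itertools.count(1)


class MockJob:
    """Illustrative stand-in for a job that a mock-run mom pretends to run."""

    def __init__(self, job_id, now=None):
        self.job_id = job_id
        self.session_id = next(_fake_session_ids)  # made up, not a real PID
        self.start = time.time() if now is None else now
        self.resources_used = {"walltime": 0}

    def poll(self, now=None):
        """Called where the mom would normally walk the proc table."""
        now = time.time() if now is None else now
        # Walltime is just elapsed time, so it can be reported accurately.
        self.resources_used["walltime"] = int(now - self.start)
        return self.resources_used


job = MockJob("1.server", now=100.0)
print(job.session_id, job.poll(now=160.0))
```

Since the session ID here is invented, anything that signals it would need to know the value is fake.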

Bhroam

Thanks for the review, Bhroam.

I hadn’t thought about walltime updates through qalter. This is kind of use-case territory, and I’m not sure there’s a need for it. Maybe we can handle it in a future version if we feel it’s needed? For now, I’ll mention in the doc that it won’t have any effect.

I actually didn’t have PTL in mind at all when thinking about this feature. I have a feeling that making this work with PTL will take a lot more effort than just setting these attributes: we’d have to add an option to pbs_benchpress to start mom in mock run mode, probably change the revert functions, and deal with cleanup_jobs(), which does a kill -9 on the job’s session ID. I’m sure there will be other things to sort out as well. I’m not entirely sure it’s worth it, since PTL is not really the best way to do performance testing. What do you think?

@hirenvadalia could you provide your input as well? Thanks.

Just adding to what @bhroam said about PTL + session ID in mock mode:
What about the job’s output and error files? For example, with direct output/error, PTL may expect the job’s output files to be created and to contain some data. Or what happens if a PTL test submits a job and then expects some data from the job’s output file?

Also, should we care about mom hooks in mock mode? If not, that would make mom even more lightweight, no?

Interesting thought, I hadn’t considered hooks. Are there any mom hooks which are run by default? I don’t remember seeing anything about hooks during performance profiling of pbs_mom, which is probably why I didn’t consider them. Users have control here: if they want a thin mom, they can simply not create mom hooks, so I feel it’s OK if we don’t handle them. Let me know what you think.

Startup and periodic mom hooks?

True, I think for now we can safely ignore mom hooks. If required in the future, we can change this.

Alright, sounds good, let me know if you have any more comments.

@bhroam please let me know your thoughts about PTL.

@agrawalravi90 I think PTL is important because you can write your performance test in PTL. Running manual tests is nice and all, but writing them in PTL makes it much easier to run them multiple times.

We should probably just document the limitations of this mock mode so we can write our tests appropriately. We’ll know that X, Y, and Z won’t happen, so we won’t expect them to.

Bhroam

OK, sounds good. I’ve added a “caveats” section to the page; please let me know if I should add any more information.

I personally would rather see you make a small effort on resources_used. Since you are going to circumvent the walking of the proc table, you just need to report back some slightly bogus numbers. resources_used.walltime can be correct. For everything else, just report back what was passed in.

This could be a future enhancement though.

Bhroam

So, the best that we can do is set them to the corresponding Resource_List values, if the submitter provided them to us. But is that actually any better than leaving them set to 0? Job submitters know that mom is doing a mock run and that the job is not actually being run, so would they really expect these values?

On second thought, it might be useful for analytics, so yes, we should probably set them if possible.
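
For reference, the approach being discussed could look roughly like the sketch below. It is written in Python for brevity, not taken from the mom code; the `mock_resources_used` function and the `TRACKED` tuple are hypothetical, and the three resource names in it are just an illustrative subset.

```python
# Hypothetical subset of resources that the mock mom would echo back.
TRACKED = ("ncpus", "mem", "cput")


def mock_resources_used(resource_list, elapsed_seconds):
    """Build a fake resources_used from the job's Resource_List.

    Requested values are reported back as-is; anything the submitter
    did not request falls back to 0. Walltime is the one value that
    is genuinely measured, so it is set from elapsed time.
    """
    used = {name: resource_list.get(name, 0) for name in TRACKED}
    used["walltime"] = elapsed_seconds
    return used


print(mock_resources_used({"ncpus": 4, "mem": "2gb"}, 30))
```

Here a job that asked for 4 CPUs and 2gb of memory reports exactly those values, with cput left at 0 because it was never requested.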

OK, I’ve made the change. Please let me know if it looks okay, @bhroam.

My last comment is that resources_used.walltime is actually calculated inside the mom. You probably can just send that back without much problem. This way you can know how long the job has left. The scheduler actually uses it.
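
As an illustration of why an accurate walltime matters, the remaining time of a job can be derived from the requested and used walltime. The sketch below is not the scheduler’s actual code; the helper names are hypothetical, and it assumes both values are in HH:MM:SS form.

```python
def hms_to_seconds(text):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def time_left(requested_walltime, used_walltime):
    """Seconds remaining before the job reaches its requested walltime."""
    remaining = hms_to_seconds(requested_walltime) - hms_to_seconds(used_walltime)
    return max(0, remaining)  # a job past its limit has no time left


print(time_left("01:00:00", "00:15:30"))
```

If resources_used.walltime stayed at 0 in mock mode, a calculation like this would always report the full requested walltime as remaining.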

Bhroam

Got it, thanks, will definitely do that. Do let me know if you have other suggestions, this is my first time messing around with mom code :)