PBS NEC SX-Aurora Integration

Hi All,

This design describes the proposed approach and the new interfaces for PBS NEC SX-Aurora integration.

Please review it and provide your feedback.

Thanks,
Sujata Patnaik

I’m a bit confused about what actual enhancement to PBS you are proposing. The interfaces that you’ve documented all seem to be resources. Are you proposing to add new built-in resources to PBS? Why not just let the user create custom resources as desired?

Will there be any behavioral change to PBS with this?


There are no behavioral changes to the core of PBS; the design describes the resources that will be created and maintained by the SX-Aurora-specific mom hook. Thus, it is a description of how PBS will work on the NEC SX-Aurora architecture.


@sujatapatnaik52 I think you need to document the format of the NEC_PROCESS_DIST switch.


Hey @sujatapatnaik52,

Overall, the design document is good.
I feel you need to add more details for the nves and nchas interfaces, such as their type (int/boolean), access mode (read-only/writable), and default value.
I think you are introducing a new environment variable, PBS_NODEFILE_VE, but it reads as though it is already available.


Thank you @agrawalravi90 for the review.

Thank you @subhasisb for reviewing. Updated the format. Please take a look.

Hi @riyazhakki,

Thank you for the review. The design page is updated now. Please take a look.

I believe PBS has an environment variable called PBS_NODEFILE, but I am not aware of an environment variable called PBS_NODEFILE_VE.

Thank you @sujatapatnaik52, for adding the details.

I believe PBS has an environment variable called PBS_NODEFILE, but I am not aware of an environment variable called PBS_NODEFILE_VE.

I mean, the design document reads as if the PBS_NODEFILE_VE variable is already present. If we are introducing it, then I think it should be listed as another interface.

Isn’t a vector engine essentially an accelerator, like a GPU? Could we treat VEs more generically?

What, exactly, will be consuming the contents of PBS_NODEFILE_VE? This will likely be some form of MPI or other parallel programming infrastructure. It would be useful to note whether the implementation is open or proprietary. If it is open, there should be corresponding documentation to reference.

What is the intended approach for implementation? Will this be done in a hook? Will the core PBS code need to be altered?

Got it, thanks for clarifying!

@riyazhakki I got your point now. Thanks for clarifying it. Added a new interface for PBS_NODEFILE_VE. Please review the updated one.

Thank you @mkaro for reviewing the design.

Yes, vector engines are accelerators. PBS will treat vector engines as a single entity, similar to GPUs.

PBS_NODEFILE_VE will be used by NEC MPI. I have updated the document with additional details. Please look at it.

This will be done in mom hooks. The core PBS code won’t be changed.
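
As a rough illustration of the consumer side only: the sketch below assumes PBS_NODEFILE_VE holds the path to a plain-text file with one entry per line, the way PBS_NODEFILE does. That layout is an assumption for illustration; the actual format is whatever the design document specifies. The function name `read_ve_nodefile` is hypothetical and not part of PBS or NEC MPI.

```python
import os


def read_ve_nodefile(env_var="PBS_NODEFILE_VE"):
    """Return the non-empty lines of the file named by env_var.

    Hypothetical helper: assumes the variable holds a file path and that
    the file has one entry per line (an assumption, not the documented format).
    Returns an empty list when the variable is unset or empty.
    """
    path = os.environ.get(env_var)
    if not path:
        return []
    with open(path) as nodefile:
        # Strip trailing newlines and skip blank lines.
        return [line.strip() for line in nodefile if line.strip()]


if __name__ == "__main__":
    # Inside a job, print whatever entries the file contains.
    for entry in read_ve_nodefile():
        print(entry)
```

Outside a job, where the variable is not set, the helper simply returns an empty list, so a script can fall back to PBS_NODEFILE or exit cleanly.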

Very useful update. Thank you @sujatapatnaik52
