How to launch MPMD with partial oversubscription?

Hi,

PBS version : 2021.1.3
OpenMPI : 5.0.3

I’d like to do a partial oversubscription with MPMD, like this:

  1. Request 40 trunks in total
  2. First group of 32 trunks with 128 cores per trunk, 1 MPI rank per core, 1 OMP thread per MPI rank
  3. Second group of 8 trunks with 128 cores per trunk, 64 MPI ranks per trunk, 1 OMP thread per MPI rank

I tried to launch MPMD with these combinations:

1. mpiexec -n 4096 A : -n 512 --npernode 64 B
2. mpiexec -n 4096 A : -n 512 --map-by ppr:64:node B

Neither of them worked.

It looked like OpenMPI just launched B with 128 MPI processes per node instead of 64. I’m wondering if that is due to the list of nodes in PBS_NODEFILE, as it will contain 128x40 entries when using select=40:ncpus=128:mpiprocs=128.
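
For reference, one quick way to check what the launcher actually sees is to count the entries per host in PBS_NODEFILE from inside the job, for example:

  # slots per node: each host appears once per mpiprocs slot
  sort "$PBS_NODEFILE" | uniq -c

  # total number of slots in the allocation
  wc -l < "$PBS_NODEFILE"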

I’m just wondering if there is an alternative approach, such as using placement parameters or PBS select directives to get the expected PBS_NODEFILE, other than using a pre-launch wrapper script to amend the PBS_NODEFILE.

Any ideas?

Thanks for your time

Regards

Jerry

Could you please try this:
qsub -l select=32:ncpus=128:mpiprocs=128+8:ncpus=128:mpiprocs=128
Sorry, I do not understand what “trunks” means here; I think you mean “chunk”.
I also referred to this: https://www.intel.com/content/www/us/en/docs/mpi-library/developer-guide-linux/2021-6/mpmd-launch-mode.html

I think it is better to use Job Arrays for this kind of MPMD or SPMD workload; the script needs to be customised.
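
A rough sketch of what I mean, just as an illustration (the select line, rank counts, and the A/B binaries are placeholders, and note that every sub-job of an array gets the same resource request):

  #PBS -J 0-1
  #PBS -l select=8:ncpus=128:mpiprocs=128

  cd "$PBS_O_WORKDIR"

  # run a different program in each array sub-job
  case "$PBS_ARRAY_INDEX" in
    0) mpiexec -n 1024 ./A ;;   # sub-job 0 runs program A
    1) mpiexec -n 512 ./B ;;    # sub-job 1 runs program B
  esac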

adarsh,

Thanks for your input. It works like a charm.

With this

 -l select=32:ncpus=128:mpiprocs=128:ompthreads=1:mem=440GB+8:ncpus=128:mpiprocs=64:ompthreads=2:mem=440GB

I got a list of 128x32 + 64x8 entries in PBS_NODEFILE, and “mpiexec -n 4096 A : -n 512 B” worked as expected.
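
For anyone who finds this later, a minimal job script putting the pieces together might look like this (the job name, walltime, and the A/B binaries are placeholders):

  #!/bin/bash
  #PBS -N mpmd_mixed
  #PBS -l walltime=01:00:00
  #PBS -l select=32:ncpus=128:mpiprocs=128:ompthreads=1:mem=440GB+8:ncpus=128:mpiprocs=64:ompthreads=2:mem=440GB

  cd "$PBS_O_WORKDIR"

  # 32 chunks x 128 ranks = 4096 ranks of A, 8 chunks x 64 ranks = 512 ranks of B
  mpiexec -n 4096 ./A : -n 512 ./B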

Regards

Jerry

Nice one. Thank you, Jerry.