This example uses PMIx (but it also works with PMI2) to handle MPI variants that are not ABI-compatible between the host and the container, relying solely on the container's MPI and PMIx libraries for bitwise reproducibility (no host MPI bindings).
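For reference, here is the SLURM batch script in sketch form (the container image, binary, and resource numbers are placeholders; it assumes the container's MPI was built against a PMIx that the host's SLURM can talk to):

#!/bin/sh
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --time=01:00:00

# srun itself does the PMIx wire-up; no host MPI library is loaded
srun --mpi=pmix singularity exec mycontainer.sif ./myapp app-arg1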
What would be the equivalent command for PBS?
Additionally, do any of you already have experience with this, and if so, how did it perform compared to running the same MPI application directly on the host (without Singularity)?
Any insights, example scripts, or configuration tips (e.g., PBS directives, PMIx setup, InfiniBand optimization) would be greatly appreciated.
NAME
pbsdsh - distribute tasks to vnodes under PBS
SYNOPSIS
pbsdsh [-c <copies>] [-s] [-v] [-o] -- <program> [<program args>]
pbsdsh [-n <vnode index>] [-s] [-v] [-o] -- <program> [<program args>]
pbsdsh --version
DESCRIPTION
The pbsdsh command allows you to distribute and execute a task on each of the vnodes assigned to your job by executing (spawning) the application on each vnode. The pbsdsh command uses the PBS Task
Manager, or TM, to distribute the program on the allocated vnodes.
When run without the -c or the -n option, pbsdsh will spawn the program on all vnodes allocated to the PBS job. The spawns take place concurrently; all execute at (about) the same time.
Note that the double dash must come after the options and before the program and arguments. The double dash is only required for Linux.
The pbsdsh command runs one task for each line in the $PBS_NODEFILE. Each MPI rank gets a single line in the $PBS_NODEFILE, so if you are running multiple MPI ranks on the same host, you still get
multiple pbsdsh tasks on that host.
Example
The following example shows the pbsdsh command inside of a PBS batch job. The options indicate that the user wants pbsdsh to run the myapp program with one argument (app-arg1) on all four vnodes
allocated to the job (i.e. the default behavior).
#!/bin/sh
#PBS -l select=4:ncpus=1
#PBS -l walltime=1:00:00
pbsdsh ./myapp app-arg1
OPTIONS
-c <copies>
The program is spawned <copies> times on the allocated vnodes, one per vnode, unless <copies> is greater than the number of vnodes. If <copies> is greater than the number of vnodes, it wraps
around, running multiple instances on some vnodes. This option is mutually exclusive with -n.
-n <vnode index>
The program is spawned only on a single vnode, which is the <vnode index>-th vnode allocated. This option is mutually exclusive with -c.
-o No obit request is made for spawned tasks. The program does not wait for the tasks to finish.
-s The program is run in turn on each vnode, one after the other.
-v Produces verbose output about error conditions and task exit status.
--version
The pbsdsh command returns its PBS version information and exits. This option can only be used alone.
OPERANDS
program
The first operand, <program>, is the program to execute. The double dash must precede the program under Linux.
program args
Additional operands, <program args>, are passed as arguments to the program.
STANDARD ERROR
The pbsdsh command writes a diagnostic message to standard error for each error occurrence.
SEE ALSO
qsub(1B), tm(3).
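For example, an untested sketch of pbsdsh launching a containerized program (mycontainer.sif and myapp are placeholders):

#!/bin/sh
#PBS -l select=4:ncpus=1
#PBS -l walltime=1:00:00

# Spawn one containerized instance per allocated vnode via the TM interface.
# Note: this only distributes tasks; it does not by itself provide the
# PMI/PMIx wire-up that MPI ranks need to communicate with each other.
pbsdsh -- singularity exec mycontainer.sif ./myapp app-arg1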
Thanks @adarsh, but I do not have access to a machine with PBS at the moment. I am just trying to figure out how to "transpose" what I did with SLURM and containers to leverage the PMI (Process Management Interface), if that is possible at all. The goal is not just to submit massively parallel jobs: there has to be message passing between the ranks.
With mpiexec -np $NP singularity exec ..., the MPI libraries on the host and inside the container have to be ABI-compatible, otherwise the launch fails. With srun --mpi=pmix singularity ..., that is not necessary: the two sides only have to speak the same PMI (PMI2, PMIx, etc.).
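To make the contrast concrete (untested sketch; image and binary names are placeholders):

# ABI-coupled: host mpiexec loads the host MPI, the app loads the container
# MPI, and the two must be ABI-compatible or the launch fails
mpiexec -np $NP singularity exec mycontainer.sif ./myapp

# PMI-decoupled (SLURM): srun provides PMIx directly, so only the PMI
# dialect (PMI2, PMIx, ...) has to match; the MPI libraries stay independent
srun --mpi=pmix singularity exec mycontainer.sif ./myapp

So what I am really after is whether PBS has a launcher that can play srun's role here, i.e. something that speaks PMI/PMIx to the tasks it spawns.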