I wrote these instructions about a year ago. Aside from the URLs (that were accurate at the time) it should still work. HTH
Overview
This tutorial will accomplish three things:
- Build PBS Professional (OSS release 19.1.1 beta 1 for this example)
- Build OpenMPI with support for PBS Professional task manager interface
- Build and run some sample MPI applications
OpenMPI will be installed under /opt/openmpi
PBS Pro will be installed under /opt/pbs
PTL will be installed under /opt/ptl
Prerequisites
- Two VMs with two virtual CPUs each (pbs-server and mom-2 for this example)
- Root access on both VMs (needed for installing PBS Pro and OpenMPI)
- /opt does not squash UID for SUID binaries
- VMs configured to communicate with each other
- Same OS on both VMs (to prevent building everything twice)
- Internet access to download source code
- Build dependencies for PBS Pro and OpenMPI are installed on primary VM
- Installation dependencies for PBS Pro and OpenMPI are installed on both VMs
Setup
Any existing PBS Pro or OpenMPI packages should be uninstalled.
rpm -qa | grep pbs
rpm -qa | grep openmpi
Use yum, zypper, apt-get, etc. to uninstall these packages. Check the
contents of /opt to ensure the pbs and openmpi directories are not present.
Also remove /etc/pbs.conf and the PBS_HOME (/var/spool/pbs) directory.
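The cleanup can be sketched as follows; a minimal example assuming an RPM-based
system (the grep patterns and paths match this tutorial, but verify the output
of the rpm queries before removing anything):

```shell
# Remove leftover packages (CentOS/RHEL shown; use zypper/apt-get elsewhere).
sudo yum remove -y $(rpm -qa | grep -E 'pbs|openmpi')
# Remove leftover directories and configuration from a prior installation.
sudo rm -rf /opt/pbs /opt/openmpi /opt/ptl /etc/pbs.conf /var/spool/pbs
```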
PBS Pro and OpenMPI distribution packages will be built under ~/work on the
primary VM. The RPMs will be built in the standard rpmbuild location. Note
that some of these directories may already exist; mkdir -p makes that harmless.
mkdir -p ~/work
mkdir -p ~/rpmbuild/BUILD ~/rpmbuild/BUILDROOT ~/rpmbuild/RPMS \
    ~/rpmbuild/SOURCES ~/rpmbuild/SPECS ~/rpmbuild/SRPMS
Build PBS Professional
cd ~/work
curl -so - https://codeload.github.com/PBSPro/pbspro/tar.gz/v19.1.1beta1 | \
gzip -cd | tar -xf -
cd pbspro-19.1.1beta1
./autogen.sh
[output omitted]
./configure PBS_VERSION='19.1.0' --prefix=/opt/pbs
[output omitted]
make dist
[output omitted]
cp pbspro-19.1.0.tar.gz ~/rpmbuild/SOURCES
cp pbspro.spec ~/rpmbuild/SPECS
cd ~/rpmbuild/SPECS
rpmbuild -ba --with ptl pbspro.spec
Install PBS Professional
This example is run on CentOS using yum. Adjust accordingly for the OS.
cd ~/rpmbuild/RPMS/x86_64
sudo yum install pbspro-server-19.1.0-0.x86_64.rpm
[output omitted]
Optionally, install PTL:
sudo yum install pbspro-ptl-19.1.0-0.x86_64.rpm
- Set PBS_START_MOM=1 on the primary VM
- Start PBS Pro on the primary VM
- Copy the pbspro-execution RPM to the secondary VM and install it
- Start PBS Pro on the secondary VM
- Use qmgr to add the secondary VM to the complex
- Confirm that the secondary VM is present and reporting (e.g. via pbsnodes)
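The steps above can be sketched as shell commands. This is a hedged sketch: it
assumes PBS Pro is managed via its init script, that the execution RPM follows
the same naming as the server RPM built earlier, and that passwordless ssh/scp
to mom-2 is available.

```shell
# On the primary VM: enable the MoM alongside the server and restart PBS.
sudo sed -i 's/^PBS_START_MOM=0/PBS_START_MOM=1/' /etc/pbs.conf
sudo /etc/init.d/pbs restart

# Copy the execution RPM to the secondary VM, then install and start it there.
scp ~/rpmbuild/RPMS/x86_64/pbspro-execution-19.1.0-0.x86_64.rpm mom-2:/tmp
ssh mom-2 'sudo yum install -y /tmp/pbspro-execution-19.1.0-0.x86_64.rpm'
ssh mom-2 'sudo /etc/init.d/pbs start'

# Add the secondary VM to the complex and confirm both nodes report in.
sudo /opt/pbs/bin/qmgr -c 'create node mom-2'
/opt/pbs/bin/pbsnodes -a
```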
Build OpenMPI
The current release as of January 4, 2019 is OpenMPI 4.0.0
cd ~/work
curl -sO https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.0.tar.gz
tar -xOf openmpi-4.0.0.tar.gz */contrib/dist/linux/openmpi.spec | \
    sed 's/$VERSION/4.0.0/' | sed 's/$EXTENSION/gz/' >openmpi.spec
cp openmpi.spec ~/rpmbuild/SPECS
cp openmpi-4.0.0.tar.gz ~/rpmbuild/SOURCES
cd ~/rpmbuild/SPECS
rpmbuild -D 'configure_options --without-slurm --with-tm=/opt/pbs' \
    -D 'install_in_opt 1' -ba openmpi.spec
Install OpenMPI
This example is run on CentOS using yum. Adjust accordingly for the OS.
cd ~/rpmbuild/RPMS/x86_64
sudo yum install openmpi-4.0.0-*.rpm
[output omitted]
Add profile scripts for OpenMPI:
cd ~/work
cat <<'EOF' >openmpi.sh
PATH=${PATH}:/opt/openmpi/4.0.0/bin
MANPATH=${MANPATH}:/opt/openmpi/4.0.0/man
export PATH MANPATH
EOF
sudo cp openmpi.sh /etc/profile.d/openmpi.sh
cat <<'EOF' >openmpi.csh
setenv PATH ${PATH}:/opt/openmpi/4.0.0/bin
setenv MANPATH ${MANPATH}:/opt/openmpi/4.0.0/man
EOF
sudo cp openmpi.csh /etc/profile.d/openmpi.csh
Copy the RPM to the secondary VM and install it there as well.
Copy the /etc/profile.d/openmpi.* scripts to the secondary VM.
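These copy steps can be sketched as follows (hostnames and RPM names match this
example; /tmp is used as a staging area because scp cannot write to
/etc/profile.d as a normal user):

```shell
# Install the OpenMPI RPM on the secondary VM.
scp ~/rpmbuild/RPMS/x86_64/openmpi-4.0.0-*.rpm mom-2:/tmp
ssh mom-2 'sudo yum install -y /tmp/openmpi-4.0.0-*.rpm'
# Copy the profile scripts into place on the secondary VM.
scp /etc/profile.d/openmpi.sh /etc/profile.d/openmpi.csh mom-2:/tmp
ssh mom-2 'sudo cp /tmp/openmpi.sh /tmp/openmpi.csh /etc/profile.d/'
```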
STOP! STOP! STOP! STOP! STOP! STOP! STOP!
Before you proceed, log out and log back in. This will cause your login shell
to process the new files in /etc/profile.d and set up your PATH and MANPATH
correctly. Once you have logged back in, ensure your PATH and MANPATH contain
references to the appropriate directories. This may include PTL if it was
installed.
As an alternative, you may source the files directly from your login shell
without logging out.
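The effect of sourcing can be demonstrated with a self-contained sketch: it
recreates the openmpi.sh profile script in a temporary directory and sources
it, exactly as you would source /etc/profile.d/openmpi.sh on the real system.

```shell
# Recreate the profile script in a temp dir and source it, showing how the
# current shell picks up PATH/MANPATH without a new login.
tmp=$(mktemp -d)
cat <<'EOF' >"$tmp/openmpi.sh"
PATH=${PATH}:/opt/openmpi/4.0.0/bin
MANPATH=${MANPATH}:/opt/openmpi/4.0.0/man
export PATH MANPATH
EOF
. "$tmp/openmpi.sh"       # on a real system: . /etc/profile.d/openmpi.sh
echo "$PATH" | tr ':' '\n' | grep openmpi
```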
====================================================================
Compile and Run a Job with OpenMPI
cd ~/work
cat <<'EOF' >hello_mpi.c
#define _GNU_SOURCE     /* for asprintf() */
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <limits.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
    int rank, size;
    char hostname[HOST_NAME_MAX];
    void *appnum;
    void *univ_size;
    char *appstr, *unistr;
    int flag;
    char *envar;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    /* Attribute values are returned as a pointer to an int; flag indicates
     * whether the attribute was set. */
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_APPNUM, &appnum, &flag);
    if (!flag) {
        asprintf(&appstr, "UNDEFINED");
    } else {
        asprintf(&appstr, "%d", *(int *)appnum);
    }
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &univ_size, &flag);
    if (!flag) {
        asprintf(&unistr, "UNDEFINED");
    } else {
        asprintf(&unistr, "%d", *(int *)univ_size);
    }
    gethostname(hostname, sizeof(hostname));
    envar = getenv("OMPI_UNIVERSE_SIZE");
    printf("Rank:%d/%d Host:%s App#:%s MPI_UNIVERSE_SIZE:%s OMPI_UNIVERSE_SIZE:%s\n",
        rank, size, hostname, appstr, unistr, (NULL == envar) ? "NULL" : envar);
    MPI_Finalize();
    return 0;
}
EOF
mpicc -o hello_mpi hello_mpi.c
ssh mom-2 mkdir work
scp hello_mpi mom-2:work
cat <<'EOF' >mpijob
#PBS -l select=4:ncpus=1:mem=64m
#PBS -j oe
mpirun ~/work/hello_mpi
EOF
qsub mpijob
8.pbs-server
cat mpijob.o8
Rank:0/4 Host:pbs-server App#:0 MPI_UNIVERSE_SIZE:4 OMPI_UNIVERSE_SIZE:4
Rank:2/4 Host:mom-2.elusive.name App#:0 MPI_UNIVERSE_SIZE:4 OMPI_UNIVERSE_SIZE:4
Rank:3/4 Host:mom-2.elusive.name App#:0 MPI_UNIVERSE_SIZE:4 OMPI_UNIVERSE_SIZE:4
Rank:1/4 Host:pbs-server App#:0 MPI_UNIVERSE_SIZE:4 OMPI_UNIVERSE_SIZE:4
Mom logs from pbs-server (where ranks 0 and 1 were run):
01/07/2019 14:21:06;0008;pbs_mom;Job;8.pbs-server;nprocs: 315, cantstat: 0, nomem: 0, skipped: 0, cached: 0
01/07/2019 14:21:06;0008;pbs_mom;Job;8.pbs-server;Started, pid = 120710
01/07/2019 14:21:07;0080;pbs_mom;Job;8.pbs-server;task 00000001 terminated
01/07/2019 14:21:07;0008;pbs_mom;Job;8.pbs-server;Terminated
01/07/2019 14:21:07;0100;pbs_mom;Job;8.pbs-server;task 00000001 cput= 0:00:00
01/07/2019 14:21:07;0008;pbs_mom;Job;8.pbs-server;kill_job
01/07/2019 14:21:07;0100;pbs_mom;Job;8.pbs-server;pbs-server cput= 0:00:00 mem=424kb
01/07/2019 14:21:07;0100;pbs_mom;Job;8.pbs-server;mom-2.elusive.name cput= 0:00:00 mem=0kb
01/07/2019 14:21:07;0008;pbs_mom;Job;8.pbs-server;no active tasks
01/07/2019 14:21:07;0100;pbs_mom;Job;8.pbs-server;Obit sent
Mom logs from mom-2 (where ranks 2 and 3 were run):
01/07/2019 14:21:06;0008;pbs_mom;Job;8.pbs-server;JOIN_JOB as node 1
01/07/2019 14:21:06;0008;pbs_mom;Job;8.pbs-server;task 20000001 started, orted
01/07/2019 14:21:07;0080;pbs_mom;Job;8.pbs-server;task 20000001 terminated
01/07/2019 14:21:07;0008;pbs_mom;Job;8.pbs-server;KILL_JOB received
01/07/2019 14:21:07;0008;pbs_mom;Job;8.pbs-server;kill_job
01/07/2019 14:21:07;0100;pbs_mom;Job;8.pbs-server;task 20000001 cput= 0:00:00
01/07/2019 14:21:07;0008;pbs_mom;Job;8.pbs-server;DELETE_JOB received
01/07/2019 14:21:07;0008;pbs_mom;Job;8.pbs-server;kill_job
Notes:
- If the user’s home directory were shared across both VMs (e.g. via NFS)
there would have been no need to create the work directory or copy the
hello_mpi binary to mom-2.