Openmpi support

I have a question about Open MPI support.
I set up my cluster with the following steps:

  1. Install Open MPI with "sudo yum install openmpi openmpi-devel".
  2. Run "module load mpi/openmpi-x86_64".
  3. Install the PBS server/execution RPM packages on node1 and node2.

I then checked whether Open MPI supports PBS with:

[test@pbspro-server ~]$ ompi_info | grep ras
MCA ras: gridengine (MCA v2.0.0, API v2.0.0, Component v1.10.7)
MCA ras: loadleveler (MCA v2.0.0, API v2.0.0, Component v1.10.7)
MCA ras: simulator (MCA v2.0.0, API v2.0.0, Component v1.10.7)
MCA ras: slurm (MCA v2.0.0, API v2.0.0, Component v1.10.7)

Does this mean my Open MPI build does not support PBS? Yet it seems that I can use "mpirun" in PBS job scripts.

Do I need to build Open MPI from source with the "--with-tm" option?

Thanks a lot!

PBS supports Open MPI, but Open MPI must be compiled with the PBS TM option.

That's correct: for this to work you need to compile Open MPI with PBS TM support. Please follow the below:

Thanks very much for the info. Is there a simple way to check whether Open MPI works well with my PBS cluster?
Or is there a simple script that helps verify Open MPI with PBS?

Before I compiled Open MPI with the TM option, I could still run "mpirun hostname" inside a PBS script. So I would like one definitive check, to run after I build Open MPI with the TM option, that shows whether it is working correctly with the PBS cluster.


Here’s a draft tutorial I put together a couple weeks ago, intended for developers. It is a draft, so please be cautious of the commands you are running, and please ensure any critical data is backed up. It was not intended for production systems, but it should work anywhere. Please let us know if you find any problems with it. A couple things to note:

  • You do not need to create the PTL RPM (for testing) if you don’t want to
  • You may want to use the 18.1.3 release rather than the 19.1.1 beta. An issue was found with the beta where the database did not start.

  • @mkaro's suggestion is the recommended one.

You can run Open MPI with PBS Pro without compiling it with the PBS TM option, but the integration will be loose: communication and process launch go through ssh, job cleanup on job deletion is not correct, so zombie processes may be left behind that must be cleaned up manually (pbs_attach would also help), and the resource usage of the individual processes is not accounted for.

If you use an MPI compiled with PBS TM, the spawned processes are tightly controlled by the PBS MoM, and proper usage accounting and job cleanup are guaranteed.
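
One way to tell which of the two cases a given build falls into is to look for the tm component in the ompi_info output. Below is a minimal sketch, assuming ompi_info prints one "MCA ras: <component>" line per resource-allocation component, as in the output shown earlier in this thread. The function name has_tm is hypothetical.

```shell
# Hypothetical helper: reads `ompi_info` output on stdin and succeeds
# if a "tm" component is listed for the ras framework.
has_tm() {
    grep -Eq '^[[:space:]]*MCA ras: tm '
}

# Only run the live check when ompi_info is actually on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm; then
        echo "tm component found: tight PBS integration is available"
    else
        echo "no tm component: mpirun will fall back to loose integration"
    fi
fi
```

This only shows that the component was built in. It does not prove that a job launches correctly, so a small test job is still worth running.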

How to compile?

# bzip2 -d openmpi-4.0.0.tar.bz2
# tar -xvf openmpi-4.0.0.tar
# cd openmpi-4.0.0
# export LD_LIBRARY_PATH=/opt/pbs/lib:$LD_LIBRARY_PATH
# export LDFLAGS="-L/opt/pbs/lib -lpbs -lpthread -lcrypto"
# ./configure --prefix=/path/to/shared/openmpi400 --with-tm=/opt/pbs/default --enable-mpi-interface-warning --enable-shared --enable-static --enable-cxx-exceptions
# make && make install

How to run?

#PBS -N pbs-openmpi-sh
#PBS -l select=2:ncpus=4:mpiprocs=4
#PBS -l place=scatter
/opt/openmpi400/bin/mpirun -np `cat $PBS_NODEFILE | wc -l` /bin/hostname
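
The -np argument in the script above is meant to be the number of lines in $PBS_NODEFILE, which holds one line per MPI rank. A small sketch of that count follows, with a count of distinct hosts as a sanity check. The helper names count_ranks and count_hosts are hypothetical.

```shell
# Hypothetical helpers; each takes the path of a PBS node file.
count_ranks() { wc -l < "$1" | tr -d ' '; }          # one line per MPI rank
count_hosts() { sort -u "$1" | wc -l | tr -d ' '; }  # number of distinct hosts

# Only meaningful inside a PBS job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    echo "ranks: $(count_ranks "$PBS_NODEFILE"), hosts: $(count_hosts "$PBS_NODEFILE")"
fi
```

For the select=2:ncpus=4:mpiprocs=4 request with place=scatter, this should report 8 ranks on 2 hosts.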


Thanks a lot, @adarsh and @mkaro, for the detailed info.
I tried both of your approaches and built Open MPI with TM support successfully.

[test@pbspro-server bin]$ ./ompi_info | grep ras
MCA ras: loadleveler (MCA v2.0.0, API v2.0.0, Component v1.10.7)
MCA ras: simulator (MCA v2.0.0, API v2.0.0, Component v1.10.7)
MCA ras: slurm (MCA v2.0.0, API v2.0.0, Component v1.10.7)
MCA ras: tm (MCA v2.0.0, API v2.0.0, Component v1.10.7)

The problem I ran into is that the PBS jobs I use to test the PBS and Open MPI integration hang.

The steps are as follows:

  1. the job scripts:

[test@pbspro-server ~]$ cat
#PBS -N pbs-openmpi-sh
#PBS -l select=2
#PBS -l place=scatter
cat $PBS_NODEFILE
hostnumber=$(cat $PBS_NODEFILE | wc -l)
/opt/openmpi/1.10.7/bin/mpirun hostname

[test@pbspro-server ~]$ cat
#PBS -l select=2
#PBS -j oe
/opt/openmpi/1.10.7/bin/mpirun ~/hello_mpi

  2. the job submission commands:

[test@pbspro-server ~]$ qsub

[test@pbspro-server ~]$ qsub

  3. the job status:

[test@pbspro-server ~]$ qstat -a
pbspro-server:
                                                         Req'd  Req'd   Elap
Job ID          Username Queue Jobname    SessID NDS TSK Memory Time  S Time
8.pbspro-server opc      workq pbs-openmp  14047   2   2     --   --  R 00:14
9.pbspro-server opc      workq             14202   2   2     --   --  R 00:06

And there are errors in mom_logs:

01/15/2019 13:11:28;0008;pbs_mom;Job;8.pbspro-server;JOIN_JOB as node 1
01/15/2019 13:17:56;0001;pbs_mom;Svr;pbs_mom;Connection timed out (110) in open_demux, open_demux: connect
01/15/2019 13:17:56;0001;pbs_mom;Job;8.pbspro-server;task not started, Failure orted -2
01/15/2019 13:17:56;0008;pbs_mom;Job;8.pbspro-server;no active tasks
01/15/2019 13:19:35;0008;pbs_mom;Job;9.pbspro-server;JOIN_JOB as node 1
01/15/2019 13:26:03;0001;pbs_mom;Svr;pbs_mom;Connection timed out (110) in open_demux, open_demux: connect
01/15/2019 13:26:03;0001;pbs_mom;Job;9.pbspro-server;task not started, Failure orted -2
01/15/2019 13:26:03;0008;pbs_mom;Job;8.pbspro-server;no active tasks
01/15/2019 13:26:03;0008;pbs_mom;Job;9.pbspro-server;no active tasks

And “” is one of the PBS worker nodes. I have configured passwordless access from the server to the worker nodes, between worker nodes, and from the worker nodes to the server.
Could you please help me check why my two jobs hang? Thanks a lot!

Could you please update the script below:

to:

#PBS -N pbs-openmpi-sh
#PBS -l select=2:ncpus=2:mpiprocs=2
#PBS -l place=scatter
/opt/openmpi/1.10.7/bin/mpirun -np `cat $PBS_NODEFILE | wc -l` /bin/hostname

The issue is DNS / name resolution:

  1. Check DNS (forward and reverse resolution of the PBS server and the PBS nodes, both on the PBS server and from the PBS nodes).
  2. Make sure /etc/hosts is populated with the names, IPs and aliases of the PBS server and the compute nodes, and that it exists on all nodes of the PBS complex.
  3. Without using PBS Pro, check whether you can run Open MPI with a static hostfile of node names.
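
Step 2 can be checked with a short script. This is only a sketch: missing_from_hosts is a hypothetical helper, and it assumes the usual hosts-file layout of an address followed by one or more names per line. Run it on every node with that node's /etc/hosts and your own host names.

```shell
# Hypothetical helper: missing_from_hosts HOSTS_FILE NAME...
# Prints every NAME that has no entry (canonical name or alias) in HOSTS_FILE.
missing_from_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        awk -v n="$name" '
            /^[[:space:]]*#/ { next }               # skip comment lines
            {
                sub(/#.*/, "")                      # drop trailing comments
                for (i = 2; i <= NF; i++)           # field 1 is the address
                    if ($i == n) found = 1
            }
            END { exit !found }
        ' "$hosts_file" || echo "$name"
    done
}

# Example call (host names taken from this thread; adjust to your complex):
# missing_from_hosts /etc/hosts pbspro-server node1 node2
```

An empty result means every listed name has an entry. It does not check that the addresses are correct or that reverse resolution works.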

Sure, I will try it following your guide. Thanks so much for your patience in helping me.
I still have two questions:

  1. I thought this might be due to the firewall inside the cluster.
    For pbs_mom to pbs_mom communication, which ports do we need to add to the firewall allow list?

  2. Is the failure related to my Open MPI installation?

Ports: 15001 to 15004 and 17001

  • The Open MPI installation location should be accessible from all the nodes in the complex.
  • /path/to/shared/openmpi400 should be accessible to all the nodes and users of that PBS cluster.
  • The firewall and SELinux should be disabled.
  • StrictHostKeyChecking should be disabled.
  • qsub -I -l select=2:ncpus=2:mpiprocs=2 (the first option is -I with a capital i, as in Ice; the second is -l with a lowercase L, as in love)
    - submit this interactive job
    - you will get a terminal; run cat $PBS_NODEFILE
    - try ssh to each host in that list and check that it works both ways
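
The ssh check in the last step can be scripted roughly as follows. This is a sketch, not a definitive test: check_ssh is a hypothetical helper, and the SSH variable exists only so that the ssh command can be swapped out. BatchMode=yes makes ssh fail instead of prompting for a password, which is what a non-interactive MPI launch needs.

```shell
# Hypothetical helper: check_ssh NODEFILE
# Tries a passwordless ssh to every distinct host listed in NODEFILE.
check_ssh() {
    nodefile=$1
    failed=0
    for host in $(sort -u "$nodefile"); do
        # BatchMode=yes: never prompt; ConnectTimeout=5: give up after 5 seconds.
        if ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return $failed
}

# Only meaningful inside a PBS job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    check_ssh "$PBS_NODEFILE"
fi
```

This covers one direction only, from the node it runs on to every host in the list. To cover both ways, repeat it from each of the other hosts.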

Thanks again for the info.
After checking all the items you mentioned above, the scripts are now working.

I still have a concern about the firewall port allow list.
I have seen log entries such as

01/15/2019 13:57:00;0001;pbs_mom;Svr;pbs_mom;Connection timed out (110) in open_demux, open_demux: connect
01/15/2019 14:17:55;0001;pbs_mom;Svr;pbs_mom;Connection timed out (110) in open_demux, open_demux: connect

So what are port numbers such as 58649/45144 for? We would like to add specific ports to the firewall allow list instead of disabling the firewall. Could you please give us some suggestions?

Thanks a lot!


For PBS the required ports are: 15001, 15002, 15003, 15004, 15007, 17001.
For details on the ports, please refer to Table 4-1: Ports Used by PBS Daemons in TPP Mode in the guide below.
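
As a sketch of how the ports listed above could be opened rather than disabling the firewall: this assumes the nodes run firewalld and that the daemons use TCP on these ports, neither of which is confirmed in this thread. The helper pbs_firewall_cmds is hypothetical and only prints the commands, so you can review them before running them as root on each node.

```shell
# Hypothetical helper: prints one firewall-cmd line per port or port range.
# Assumes firewalld and TCP; nothing is changed on the system.
pbs_firewall_cmds() {
    for port in "$@"; do
        echo "firewall-cmd --permanent --add-port=${port}/tcp"
    done
    echo "firewall-cmd --reload"
}

# PBS daemon ports as listed above.
pbs_firewall_cmds 15001-15004 15007 17001
```

This covers only the PBS daemon ports named in this thread.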


Thanks so much. No further questions. I really appreciate your help!