Installing OpenPBS on Ubuntu 24.04 - my experience

I just installed OpenPBS on a cluster of Ubuntu 24.04 servers. It was not easy or straightforward, so I want to document my process in case it helps others trying to do the same thing. I am not sure this is the right process, but it worked for me. I am planning to submit some of my fixes as a Pull Request, in the hope that future versions won’t need all these steps. At the time of writing, the latest release of OpenPBS is v23.06.06. That release does not support Ubuntu 24.04, so I had to build from the master branch on GitHub. My setup is one head node with dual NICs and 20 compute nodes on the private network.

Step 1: Install the head node and compute nodes. Complete this step on all the servers

Install all the deps

sudo apt install -y gcc make libtool libhwloc-dev libx11-dev \
  libxt-dev libedit-dev libical-dev ncurses-dev perl \
  postgresql-server-dev-all postgresql-contrib python3-dev tcl-dev tk-dev swig \
  libexpat-dev libssl-dev libxext-dev libxft-dev autoconf \
  automake g++ libcjson-dev expat libedit2 postgresql python3 sendmail-bin \
  tcl tk libical3

Get OpenPBS

wget https://github.com/openpbs/openpbs/archive/refs/heads/master.tar.gz
tar -zxvf master.tar.gz
cd openpbs-master/
./autogen.sh
./configure --prefix=/opt/pbs
make
sudo make install

The habitat and database scripts assume some paths that don’t exist on Ubuntu 24.04, so we need to create them before the scripts will run:

sudo mkdir /usr/pgsql-16.6
sudo ln -s /usr/lib/postgresql/16/lib/ /usr/pgsql-16.6/lib
sudo ln -s /usr/share/postgresql/16/ /usr/pgsql-16.6/share
sudo ln -s /usr/lib/postgresql/16/bin/pg_resetwal /usr/lib/postgresql/16/bin/pg_resetxlog

Next, one of the scripts fails with a syntax error. To fix it, open “/opt/pbs/libexec/pbs_db_utility” with ‘vim’ or ‘nano’ (I will use ‘vim’ hereafter) and change the very first line of the script:

before: #!/bin/sh

after: #!/bin/bash

sudo vim /opt/pbs/libexec/pbs_db_utility
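If you are repeating this on many servers, the same one-line change can probably be scripted instead of done by hand in an editor. This is only a sketch: the function name fix_shebang is made up for illustration, and it assumes GNU sed (which is what Ubuntu ships) and that the first line is exactly #!/bin/sh.

```shell
# Hypothetical helper: rewrite a '#!/bin/sh' first line to '#!/bin/bash'.
# Only line 1 is touched, and only if it is exactly '#!/bin/sh'.
fix_shebang() {
    sed -i '1s|^#!/bin/sh$|#!/bin/bash|' "$1"
}

# On a real install the file is root-owned, so the equivalent direct command
# (path taken from the step above) would be:
# sudo sed -i '1s|^#!/bin/sh$|#!/bin/bash|' /opt/pbs/libexec/pbs_db_utility
```

Afterwards, “head -n 1 /opt/pbs/libexec/pbs_db_utility” should show the new first line.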

Next we have to set the setuid bit on two programs so they work correctly:

sudo chmod 4755 /opt/pbs/sbin/pbs_iff
sudo chmod 4755 /opt/pbs/sbin/pbs_rcp

Next, we have to make sure the /etc/hosts file on every server contains entries for all servers, head and compute.

Mine looks like this:

10.1.0.1 head
10.1.0.100 compute-1
10.1.0.101 compute-2

10.1.0.119 compute-20

Make sure there are no duplicate entries for any of the hostnames.
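A quick way to spot duplicates is to count how many lines each hostname appears on. The sketch below is my own helper, not part of OpenPBS; the name find_duplicate_hosts is hypothetical. It ignores comments and prints any name that shows up on more than one line.

```shell
# Hypothetical helper: print hostnames that appear on more than one line
# of a hosts file (defaults to /etc/hosts). Comments are stripped first.
find_duplicate_hosts() {
    awk '{ sub(/#.*/, "") }
         NF >= 2 { for (i = 2; i <= NF; i++) seen[$i]++ }
         END { for (h in seen) if (seen[h] > 1) print h }' "${1:-/etc/hosts}"
}

# Usage on a real server:
# find_duplicate_hosts /etc/hosts
```

Note that a name like “localhost” can legitimately appear on both an IPv4 and an IPv6 line, so treat the output as a list of things to look at rather than a list of errors.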

Now we are ready to run the OpenPBS post install script:

sudo /opt/pbs/libexec/pbs_postinstall

Now we need to fix the /etc/pbs.conf configuration file.

#This will be different on the head node versus the compute nodes
sudo vim /etc/pbs.conf

On all nodes, PBS_SERVER= needs to point to the head node

On the head node, set these lines:

PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=0

On the compute nodes, set these lines:

PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_START_MOM=1
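If you would rather not edit 20 copies of the file by hand, something like the following could set the keys for you. It is a sketch under assumptions: set_conf is a name I made up, it expects simple KEY=VALUE lines like the ones above, and it works on whatever file you pass in, so you can try it on a copy first.

```shell
# Hypothetical helper: set KEY=VALUE in a config file.
# Replaces an existing KEY= line, or appends one if the key is missing.
set_conf() {  # usage: set_conf KEY VALUE FILE
    if grep -q "^$1=" "$3"; then
        sed -i "s|^$1=.*|$1=$2|" "$3"
    else
        printf '%s=%s\n' "$1" "$2" >> "$3"
    fi
}

# Example for a compute node, working on a copy and then installing it:
# cp /etc/pbs.conf /tmp/pbs.conf
# set_conf PBS_SERVER head /tmp/pbs.conf
# set_conf PBS_START_SERVER 0 /tmp/pbs.conf
# set_conf PBS_START_SCHED 0 /tmp/pbs.conf
# set_conf PBS_START_COMM 0 /tmp/pbs.conf
# set_conf PBS_START_MOM 1 /tmp/pbs.conf
# sudo cp /tmp/pbs.conf /etc/pbs.conf
```

For the head node, flip the four PBS_START_ values to match the head node settings listed above.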

Now we are ready to start PBS. Use the following command on every server:

sudo /etc/init.d/pbs start

Now check that the processes are running:

ps axf | grep pbs

On the head node, you should see

/opt/pbs/sbin/pbs_comm
/opt/pbs/sbin/pbs_sched
/opt/pbs/sbin/pbs_ds_monitor
/opt/pbs/sbin/pbs_server.bin

as well as some postgres processes

On the compute nodes, you should only see

/opt/pbs/sbin/pbs_mom

Step 2: Register the compute nodes

Once you have installed the software on the head node and all the compute nodes, you need to register the compute nodes on the head node.

For each compute node, run the following commands on the head node. Change “compute-x” to the hostname of the compute node; it must match the entry in /etc/hosts on every server. In “resources_available.mem=XXXXXX”, replace the X’s with the total memory reported by ‘free’ on that node (in kB). If you don’t know the number of CPUs on a node, “sudo apt install mdm” and then run “ncpus”. Put that number in place of CCC in “resources_available.ncpus=CCC”.

sudo /opt/pbs/bin/qmgr -c "create node compute-x"
sudo /opt/pbs/bin/qmgr -c "set node compute-x resources_available.ncpus=CCC,resources_available.mem=XXXXXX"
sudo /opt/pbs/bin/pbsnodes -r compute-x
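With 20 nodes, typing these three commands for each one gets tedious, so here is a rough sketch of how the commands could be generated in a loop. The function print_register_cmds is hypothetical. It only prints the commands so you can review them first, and the loop assumes all nodes are named compute-1 through compute-20 and have identical hardware; CCC and XXXXXX are still placeholders you must fill in.

```shell
# Hypothetical helper: print (not run) the three registration commands
# for one node. Arguments are passed through unchanged.
print_register_cmds() {  # usage: print_register_cmds NODE NCPUS MEM
    printf 'sudo /opt/pbs/bin/qmgr -c "create node %s"\n' "$1"
    printf 'sudo /opt/pbs/bin/qmgr -c "set node %s resources_available.ncpus=%s,resources_available.mem=%s"\n' "$1" "$2" "$3"
    printf 'sudo /opt/pbs/bin/pbsnodes -r %s\n' "$1"
}

# Generate the commands for compute-1 .. compute-20 (assumed identical nodes).
for i in $(seq 1 20); do
    print_register_cmds "compute-$i" CCC XXXXXX
done
```

Once the output looks right, you can save it to a file and run that file with “sh” on the head node.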

Finally, check that all the nodes are registered and ready with

/opt/pbs/bin/pbsnodes -a

Particularly, make sure each node has “state = free”
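Scanning the output for 20 nodes by eye is error-prone, so a small filter can help. This sketch assumes the layout I saw from “pbsnodes -a”: each node name on its own unindented line, followed by indented “key = value” lines including one for “state”. The function name not_free_nodes is made up.

```shell
# Hypothetical helper: read 'pbsnodes -a' output on stdin and print
# "<node> <state>" for every node whose state is not exactly "free".
not_free_nodes() {
    awk '/^[^ \t]/ { node = $1 }
         $1 == "state" && $2 == "=" && $3 != "free" { print node, $3 }'
}

# Usage on the head node:
# /opt/pbs/bin/pbsnodes -a | not_free_nodes
```

No output means every node reported “state = free”.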
