Installing OpenPBS on Ubuntu 24.04 - my experience

I just successfully installed OpenPBS on a cluster of Ubuntu 24.04 servers. It was not easy or straightforward, and I want to document my process so that it might help others trying to do the same thing. I am not sure this is the right process, but it worked for me. I am planning to submit some of my fixes as a Pull Request in the hope that they will get merged, so that in the future you won’t need to follow all these steps. At the time of writing, the last release of OpenPBS is v23.06.06. That release does not support Ubuntu 24.04, so I needed to get the code from the master branch on GitHub. My setup is one head node with dual NICs and 20 compute nodes on the private network.

Step 1: Install the head node and compute nodes. Complete this step on all the servers

Install all the deps

sudo apt install -y gcc make libtool libhwloc-dev libx11-dev \
    libxt-dev libedit-dev libical-dev ncurses-dev perl \
    postgresql-server-dev-all postgresql-contrib python3-dev tcl-dev tk-dev swig \
    libexpat-dev libssl-dev libxext-dev libxft-dev autoconf \
    automake g++ libcjson-dev expat libedit2 postgresql python3 sendmail-bin \
    tcl tk libical3

Get OpenPBS

wget https://github.com/openpbs/openpbs/archive/refs/heads/master.tar.gz
tar -zxvf master.tar.gz
cd openpbs-master/
./autogen.sh
./configure --prefix=/opt/pbs
make
sudo make install

Some paths in the habitat and database scripts point to locations that don’t exist on Ubuntu 24.04, so we need to create them before the scripts will run:

sudo mkdir /usr/pgsql-16.6
sudo ln -s /usr/lib/postgresql/16/lib/ /usr/pgsql-16.6/lib
sudo ln -s /usr/share/postgresql/16/ /usr/pgsql-16.6/share
sudo ln -s /usr/lib/postgresql/16/bin/pg_resetwal /usr/lib/postgresql/16/bin/pg_resetxlog

Next, one of the scripts fails with a syntax error. To fix it, open “/opt/pbs/libexec/pbs_db_utility” with ‘vim’ or ‘nano’ (I will use ‘vim’ hereafter) and change the very first line of the script:

before: #!/bin/sh

after: #!/bin/bash

sudo vim /opt/pbs/libexec/pbs_db_utility
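
If you would rather not open an editor, the same one-line change can probably be made with sed. This is only a sketch: fix_shebang is a name I made up, not something that ships with OpenPBS, and it assumes GNU sed (which is what Ubuntu has).

```shell
# fix_shebang is a hypothetical helper name, not part of OpenPBS.
# It rewrites line 1 of the given file from #!/bin/sh to #!/bin/bash
# and leaves the file untouched if line 1 is anything else.
fix_shebang() {
    local script="$1"
    sed -i '1s|^#!/bin/sh[[:space:]]*$|#!/bin/bash|' "$script"
}

# Example, from a root shell:
#   fix_shebang /opt/pbs/libexec/pbs_db_utility
#   head -n 1 /opt/pbs/libexec/pbs_db_utility
```

The function has to be defined in the shell that runs it, so use a root shell (for example "sudo -i") rather than putting sudo in front of the function name.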

Next, we have to set the setuid bit on two programs so they work correctly:

sudo chmod 4755 /opt/pbs/sbin/pbs_iff
sudo chmod 4755 /opt/pbs/sbin/pbs_rcp

Next, we have to make sure our /etc/hosts file contains all the entries for all servers, head and compute

Mine looks like this:

10.1.0.1 head
10.1.0.100 compute-1
10.1.0.101 compute-2

10.1.0.119 compute-20

Make sure there are no duplicate entries for any of the hostnames.
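
With 20+ entries it is easy to miss a duplicate by eye. Here is a rough sketch of how you could check for them with awk. The name find_duplicate_hosts is made up, and it treats every field after the IP address as a hostname or alias.

```shell
# find_duplicate_hosts is a hypothetical helper name.
# Prints every hostname or alias that is listed more than once in the
# given hosts file (default: /etc/hosts), one name per line.
find_duplicate_hosts() {
    local hosts_file="${1:-/etc/hosts}"
    awk '
        { sub(/#.*/, "") }                                  # drop comments
        NF >= 2 { for (i = 2; i <= NF; i++) seen[$i]++ }    # fields after the IP are names
        END { for (name in seen) if (seen[name] > 1) print name }
    ' "$hosts_file" | sort
}
```

Running find_duplicate_hosts with no arguments checks /etc/hosts; no output means no name is listed twice. A name such as localhost may show up if it is listed for both IPv4 and IPv6, so use your own judgment on what it reports.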

Now we are ready to run the OpenPBS post install script:

sudo /opt/pbs/libexec/pbs_postinstall

Now we need to fix the /etc/pbs.conf configuration file.

#This will be different on the head node versus the compute nodes
sudo vim /etc/pbs.conf

On all nodes, PBS_SERVER= needs to point to the head node

On the head node, set these lines:

PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=0

On the compute nodes, set these lines:

PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_START_MOM=1
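
If you have many nodes, editing each file by hand gets tedious. Below is a sketch of how the settings above could be applied from a script. Both set_pbs_conf and apply_role are names I invented for illustration; the keys and values are the ones listed above. Try it on a copy of the file first.

```shell
# set_pbs_conf is a hypothetical helper name.
# Usage: set_pbs_conf FILE KEY VALUE
# Replaces an existing KEY=... line, or appends KEY=VALUE if the key is absent.
# VALUE must not contain "|" or "&" because it is passed through sed.
set_pbs_conf() {
    local conf="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$conf"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$conf"
    else
        printf '%s=%s\n' "$key" "$value" >> "$conf"
    fi
}

# apply_role is also hypothetical.
# Usage: apply_role FILE head|compute HEAD_HOSTNAME
apply_role() {
    local conf="$1" role="$2" head_host="$3"
    set_pbs_conf "$conf" PBS_SERVER "$head_host"
    if [ "$role" = head ]; then
        set_pbs_conf "$conf" PBS_START_SERVER 1
        set_pbs_conf "$conf" PBS_START_SCHED 1
        set_pbs_conf "$conf" PBS_START_COMM 1
        set_pbs_conf "$conf" PBS_START_MOM 0
    else
        set_pbs_conf "$conf" PBS_START_SERVER 0
        set_pbs_conf "$conf" PBS_START_SCHED 0
        set_pbs_conf "$conf" PBS_START_COMM 0
        set_pbs_conf "$conf" PBS_START_MOM 1
    fi
}
```

For example, from a root shell on a compute node: apply_role /etc/pbs.conf compute head (where "head" is the hostname of my head node).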

Now we are ready to start up the server. Use the following command:

sudo /etc/init.d/pbs start

Now check that the processes are running:

ps axf | grep pbs

On the head node, you should see

/opt/pbs/sbin/pbs_comm
/opt/pbs/sbin/pbs_sched
/opt/pbs/sbin/pbs_ds_monitor
/opt/pbs/sbin/pbs_server.bin

as well as some postgres processes

On the compute nodes, you should only see

/opt/pbs/sbin/pbs_mom

Step 2: Register the compute nodes

Once you have installed the software on the head node and all the compute nodes, you need to register the compute nodes on the head node.

For each compute node, run the following commands on the head node. Change “compute-x” to the hostname of the compute node; this must match what is in the /etc/hosts file on every server. For the “resources_available.mem=XXXXXX” part, replace the X’s with the total memory in kB as reported by ‘free’. If you don’t know the number of CPUs on a node, “sudo apt install mdm” and then run “ncpus”. Put that number in place of the CCC in “resources_available.ncpus=CCC”.

sudo /opt/pbs/bin/qmgr -c "create node compute-x"
sudo /opt/pbs/bin/qmgr -c "set node compute-x resources_available.ncpus=CCC,resources_available.mem=XXXXXX"
sudo /opt/pbs/bin/pbsnodes -r compute-x
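
Typing those three commands for every node gets old quickly, so here is a sketch that just prints them for review. print_register_cmds is a made-up name. I have appended a "kb" suffix to the memory value on the assumption that PBS reads a bare number as bytes; drop the suffix if that is not what your setup expects.

```shell
# print_register_cmds is a hypothetical helper name. It only PRINTS the
# registration commands for one node; nothing is executed.
# Usage: print_register_cmds NODE NCPUS MEM_KB
print_register_cmds() {
    local node="$1" ncpus="$2" mem_kb="$3"
    printf 'sudo /opt/pbs/bin/qmgr -c "create node %s"\n' "$node"
    # Assumption: the "kb" suffix marks the value as kilobytes.
    printf 'sudo /opt/pbs/bin/qmgr -c "set node %s resources_available.ncpus=%s,resources_available.mem=%skb"\n' \
        "$node" "$ncpus" "$mem_kb"
    printf 'sudo /opt/pbs/bin/pbsnodes -r %s\n' "$node"
}
```

On a compute node, nproc prints the CPU count and awk '/MemTotal/ {print $2}' /proc/meminfo prints the total memory in kB, which is the same number ‘free’ shows. Once you are happy with the printed commands, you could pipe them into bash on the head node.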

Finally, check that all the nodes are registered and ready with

/opt/pbs/bin/pbsnodes -a

Particularly, make sure each node has “state = free”
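
With 20 nodes the pbsnodes output is long, so here is a sketch of a filter that shows only the nodes that are not free. nodes_not_free is a made-up name, and the parsing assumes the usual layout where each node name starts in column 1 and its attributes follow as indented "key = value" lines.

```shell
# nodes_not_free is a hypothetical helper name. It reads `pbsnodes -a`
# output on stdin and prints "node: state" for every node whose state
# is not exactly "free".
# Assumption: node names start in column 1; attributes are indented.
nodes_not_free() {
    awk '
        /^[^[:space:]]/ { node = $1; next }     # unindented line = node name
        $1 == "state" && $2 == "=" {
            state = $0
            sub(/^[[:space:]]*state = /, "", state)
            if (state != "free") print node ": " state
        }
    '
}
```

Use it as: /opt/pbs/bin/pbsnodes -a | nodes_not_free. No output means every node reported “state = free”.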


Hi, I must be doing something wrong, but I am unable to run OpenPBS on Ubuntu 24. The problem is starting pbs_server. I get the same error on a real server and on a virtual machine (shown here, clean installation). After following your steps, I get stuck when trying to start the services. The virtual machine testhlavicka is server and MoM at the same time, and /etc/hosts is set correctly to its IP address, 192.168.149.130.

sedlar@testhlavicka:/opt/pbs/sbin$ sudo /etc/init.d/pbs status
pbs_server is not running
pbs_mom is pid 6879
pbs_sched is pid 6891
pbs_comm is 6869

Here is the log from /var/spool/pbs/server_logs

01/22/2025 12:41:58;0002;Server@testhlavicka;Svr;Log;Log opened
01/22/2025 12:41:58;0002;Server@testhlavicka;Svr;Server@testhlavicka;pbs_version=23.06.06
01/22/2025 12:41:58;0002;Server@testhlavicka;Svr;Server@testhlavicka;pbs_build=mach=N/A:security=N/A:configure_args=N/A
01/22/2025 12:41:58;0002;Server@testhlavicka;Svr;Server@testhlavicka;hostname=testhlavicka;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
01/22/2025 12:41:58;0002;Server@testhlavicka;Svr;Server@testhlavicka;ipv4 interface lo: ip6-loopback
01/22/2025 12:41:58;0002;Server@testhlavicka;Svr;Server@testhlavicka;ipv4 interface ens33: testhlavicka
01/22/2025 12:41:58;0002;Server@testhlavicka;Svr;Server@testhlavicka;ipv4 interface docker0: testhlavicka
01/22/2025 12:41:58;0002;Server@testhlavicka;Svr;Server@testhlavicka;ipv6 interface lo: ip6-loopback
01/22/2025 12:41:58;0002;Server@testhlavicka;Svr;Server@testhlavicka;ipv6 interface ens33: testhlavicka
01/22/2025 12:41:58;0006;Server@testhlavicka;Fil;Server@testhlavicka;Version 23.06.06, started, initialization type = 1
01/22/2025 12:41:58;0002;Server@testhlavicka;Svr;Server@testhlavicka;pbs_status_db exit code 1
01/22/2025 12:41:58;0002;Server@testhlavicka;Svr;Server@testhlavicka;Starting PBS dataservice
01/22/2025 12:42:01;0002;Server@testhlavicka;Svr;Server@testhlavicka;connected to PBS dataservice@testhlavicka
01/22/2025 12:42:01;0002;Server@testhlavicka;n/a;setup_env;read environment from /var/spool/pbs/pbs_environment
01/22/2025 12:42:01;0d80;Server@testhlavicka;TPP;Server@testhlavicka(Main Thread);TPP authentication method = resvport
01/22/2025 12:42:01;0c06;Server@testhlavicka;TPP;Server@testhlavicka(Main Thread);TPP leaf node names = 192.168.149.130:15001,127.0.0.1:15001,192.168.149.130:15001,172.17.0.1:15001
01/22/2025 12:42:01;0d80;Server@testhlavicka;TPP;Server@testhlavicka(Main Thread);Initializing TPP transport Layer
01/22/2025 12:42:01;0d80;Server@testhlavicka;TPP;Server@testhlavicka(Main Thread);Max files allowed = 16384
01/22/2025 12:42:01;0d80;Server@testhlavicka;TPP;Server@testhlavicka(Main Thread);TPP initialization done
01/22/2025 12:42:01;0d80;Server@testhlavicka;TPP;Server@testhlavicka(Main Thread);Connecting to pbs_comm testhlavicka:17001
01/22/2025 12:42:01;0c06;Server@testhlavicka;TPP;Server@testhlavicka(Thread 0);Thread ready
01/22/2025 12:42:01;0c06;Server@testhlavicka;TPP;Server@testhlavicka(Thread 0);Registering address 192.168.149.130:15001 to pbs_comm testhlavicka:17001
01/22/2025 12:42:01;0c06;Server@testhlavicka;TPP;Server@testhlavicka(Thread 0);Registering address 172.17.0.1:15001 to pbs_comm testhlavicka:17001
01/22/2025 12:42:01;0c06;Server@testhlavicka;TPP;Server@testhlavicka(Thread 0);Connected to pbs_comm testhlavicka:17001
01/22/2025 12:42:01;0000;Server@testhlavicka;Svr;Server@testhlavicka;Supported authentication method: resvport
01/22/2025 12:42:01;0002;Server@testhlavicka;Svr;Server@testhlavicka;Stopping PBS dataservice

Maybe there is some trouble with PostgreSQL; looking into /var/spool/pbs/datastore/log/pbs_dataservice_log.Wed:

2025-01-22 12:41:58.208 UTC [6013] LOG: starting PostgreSQL 16.6 (Ubuntu 16.6-0ubuntu0.24.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0, 64-bit
2025-01-22 12:41:58.208 UTC [6013] LOG: listening on IPv4 address "0.0.0.0", port 15007
2025-01-22 12:41:58.208 UTC [6013] LOG: listening on IPv6 address "::", port 15007
2025-01-22 12:41:58.210 UTC [6013] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.15007"
2025-01-22 12:41:58.217 UTC [6026] LOG: database system was shut down at 2025-01-22 09:27:42 UTC
2025-01-22 12:41:58.228 UTC [6013] LOG: database system is ready to accept connections
2025-01-22 12:42:01.404 UTC [6013] LOG: received fast shutdown request
2025-01-22 12:42:01.405 UTC [6013] LOG: aborting any active transactions
2025-01-22 12:42:01.410 UTC [6013] LOG: background worker "logical replication launcher" (PID 6029) exited with exit code 1
2025-01-22 12:42:01.411 UTC [6024] LOG: shutting down
2025-01-22 12:42:01.412 UTC [6024] LOG: checkpoint starting: shutdown immediate
2025-01-22 12:42:01.417 UTC [6024] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.007 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=0/19B4EE8, redo lsn=0/19B4EE8
2025-01-22 12:42:01.425 UTC [6013] LOG: database system is shut down

This log also contains an error from right after installing. Could that be the problem?

2025-01-22 08:25:44.220 UTC [41852] LOG: starting PostgreSQL 16.6 (Ubuntu 16.6-0ubuntu0.24.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0, 64-bit
2025-01-22 08:25:44.220 UTC [41852] LOG: listening on IPv4 address "0.0.0.0", port 15007
2025-01-22 08:25:44.220 UTC [41852] LOG: listening on IPv6 address "::", port 15007
2025-01-22 08:25:44.222 UTC [41852] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.15007"
2025-01-22 08:25:44.228 UTC [41866] LOG: database system was shut down at 2025-01-22 08:25:43 UTC
2025-01-22 08:25:44.235 UTC [41852] LOG: database system is ready to accept connections
2025-01-22 08:25:45.523 UTC [41937] ERROR: schema "pbs" does not exist
2025-01-22 08:25:45.523 UTC [41937] STATEMENT: /*
* Copyright …

Does anybody have any ideas, please?

Can you confirm that you made the change to /opt/pbs/libexec/pbs_db_utility?
The first line of that file must be #!/bin/bash.
After that, you will need to re-run sudo /opt/pbs/libexec/pbs_postinstall. Please let me know if that solves the issue.

Dear Brian,

I followed your instructions completely (which was the problem), including the #!/bin/bash shebang, which in fact makes no difference on a clean OS because /bin/sh on Ubuntu should just be a symbolic link to the system’s shell, which is naturally bash for Ubuntu. So for me, the correct installation was pretty straightforward, following the official instructions in the INSTALL file:

sudo apt install gcc make libtool libhwloc-dev libx11-dev \
    libxt-dev libedit-dev libical-dev ncurses-dev perl \
    postgresql-server-dev-all postgresql-contrib python3-dev tcl-dev tk-dev swig \
    libexpat-dev libssl-dev libxext-dev libxft-dev autoconf \
    automake g++ libcjson-dev

sudo apt install expat libedit2 postgresql python3 postgresql-contrib sendmail-bin \
    sudo tcl tk libical3 postgresql-server-dev-all

wget https://github.com/openpbs/openpbs/archive/refs/heads/master.tar.gz
tar -zxvf master.tar.gz
cd openpbs-master/
./autogen.sh
./configure --prefix=/opt/pbs
make
sudo make install

sudo /opt/pbs/libexec/pbs_postinstall

edit files: /etc/pbs.conf and /etc/hosts

sudo chmod 4755 /opt/pbs/sbin/pbs_iff /opt/pbs/sbin/pbs_rcp

and start pbs services…

Hi Kadilik,

A couple of things. While it may once have been true that /bin/sh was a symlink to bash, that is not the case on Ubuntu 24.04:

$ ls -l /bin/sh
lrwxrwxrwx 1 root root 4 Mar 31  2024 /bin/sh -> dash

Additionally, it’s easy to write an example that breaks in exactly the way the PBS script breaks. Save this as test_sh.sh:

#!/bin/sh
file="*"
[[ -e $file ]]

Run it with /bin/sh:

$ ./test_sh.sh 
./test_sh.sh: 3: [[: not found

Run it with bash:

$ bash ./test_sh.sh

That is why I included that fix in the instructions. I got the exact data service error you reported, and was able to fix it by editing that one file.
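
If you want to check whether a particular script is likely to hit this, here is a crude sketch. has_sh_bashism is a name I made up, and it is only a text match for "[[" under a #!/bin/sh shebang, not a real shell parser, so treat a hit as a hint rather than proof.

```shell
# has_sh_bashism is a hypothetical helper name.
# Returns 0 (true) if FILE starts with a #!/bin/sh shebang AND contains
# "[[" somewhere, which dash does not understand. Returns non-zero otherwise.
has_sh_bashism() {
    local file="$1"
    head -n 1 "$file" | grep -Eq '^#![[:space:]]*/bin/sh([[:space:]]|$)' || return 1
    grep -q '\[\[' "$file"
}

# Example:
#   has_sh_bashism /opt/pbs/libexec/pbs_db_utility && echo "probably needs bash"
#   readlink -f /bin/sh    # shows which shell /bin/sh resolves to
```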

You are right! Yet PBS works correctly for me with the #!/bin/sh shebang in pbs_db_utility. Since it’s working, I don’t need to know why :smiley: