Upgrade fails from 20.0.1 to 22.05.11

Dear all,

While preparing an overlay upgrade from OpenPBS version 20.0.1 to 22.05.11, I ran into an issue when starting the pbs service on the PBS server.
This is a blank dev machine, so no job submissions were hurt during this test. Nonetheless, I am trying to replicate a running setup that I want to upgrade along the same path. Here’s the output:

11/28/2022 15:47:40;0002;Server@pbshead1;Svr;Log;Log opened
11/28/2022 15:47:40;0002;Server@pbshead1;Svr;Server@pbshead1;pbs_version=22.05.11
11/28/2022 15:47:40;0002;Server@pbshead1;Svr;Server@pbshead1;pbs_build=mach=N/A:security=N/A:configure_args=N/A
11/28/2022 15:47:40;0002;Server@pbshead1;Svr;Server@pbshead1;hostname=pbshead1.dev.tld.local;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
11/28/2022 15:47:40;0002;Server@pbshead1;Svr;Server@pbshead1;ipv4 interface lo: localhost
11/28/2022 15:47:40;0002;Server@pbshead1;Svr;Server@pbshead1;ipv4 interface ens18: pbshead1.dev.tld.local
11/28/2022 15:47:40;0002;Server@pbshead1;Svr;Server@pbshead1;ipv6 interface lo: ip6-loopback
11/28/2022 15:47:40;0002;Server@pbshead1;Svr;Server@pbshead1;ipv6 interface ens18: pbshead1.dev.tld.local
11/28/2022 15:47:40;0002;Server@pbshead1;Svr;Server@pbshead1;ipv6 interface ens18: pbshead1.dev.tld.local
11/28/2022 15:47:40;0002;Server@pbshead1;Svr;Server@pbshead1;ipv6 interface ens18: pbshead1.dev.tld.local
11/28/2022 15:47:40;0006;Server@pbshead1;Fil;Server@pbshead1;Version 22.05.11, started, initialization type = 1
11/28/2022 15:47:41;0002;Server@pbshead1;Svr;Server@pbshead1;pbs_status_db exit code 1
11/28/2022 15:47:41;0002;Server@pbshead1;Svr;Server@pbshead1;Starting PBS dataservice
11/28/2022 15:47:43;0002;Server@pbshead1;Svr;Server@pbshead1;Prepare of statement insert_job failed: ERROR:  column "ji_jid" of relation "job" does not exist
LINE 1: ...at,ji_quetime,ji_rteretry,ji_fromsock,ji_fromaddr,ji_jid,ji_...
                                                             ^ 42703
11/28/2022 15:47:44;0002;Server@pbshead1;Svr;Server@pbshead1;Starting PBS dataservice
11/28/2022 15:47:46;0002;Server@pbshead1;Svr;Server@pbshead1;pbs_status_db exit code 0
11/28/2022 15:47:46;0002;Server@pbshead1;Svr;Server@pbshead1;Prepare of statement insert_job failed: ERROR:  column "ji_jid" of relation "job" does not exist
LINE 1: ...at,ji_quetime,ji_rteretry,ji_fromsock,ji_fromaddr,ji_jid,ji_...
                                                             ^ 42703

(continues with infinite restart attempts)

The server (headnode) is an Ubuntu 18.04 and does not run an execution node (MoM) itself. Two MoMs are connected to the headnode. OpenPBS versions are self-compiled.

I followed the PBS Installation Guide up to chapter 6.5.17.2, which is where it fails. I also found a related issue report in the forum, but although the pull requests mentioned there are merged, I seem to be hitting the same issue.

Postgres version:

root@pbshead1:~# /usr/lib/postgresql/10/bin/postgres -V
postgres (PostgreSQL) 10.22 (Ubuntu 10.22-0ubuntu0.18.04.1)

I can provide more logs and information if wanted. Any help appreciated!

As a first guess, I would say the pbs_schema_upgrade script did not run, or ran but failed. Because this is a test system, you can run it manually to see if that helps.

The real question, though, is why the script failed.

Thank you for your answer, @dtalcott!

Indeed, this seems to be the problem. After invoking pbs_schema_upgrade with the correct environment variables, I get this far:

root@pbshead1:~# PBS_EXEC=/opt/pbs PBS_DATA_SERVICE_PORT=15007 PBS_DATA_SERVICE_USER=postgres /opt/pbs/libexec/pbs_schema_upgrade
/opt/pbs/sbin/pbs_ds_systemd: 43: [: xrunning: unexpected operator
Password for user postgres:
psql: fe_sendauth: no password supplied
Cannot upgrade PBS datastore version

or, with bash -x:

root@pbshead1:~# PBS_EXEC=/opt/pbs PBS_DATA_SERVICE_PORT=15007 PBS_DATA_SERVICE_USER=postgres bash -x /opt/pbs/libexec/pbs_schema_upgrade
+ . /opt/pbs/libexec/pbs_db_env
++ PGSQL_LIBSTR=
++ '[' -z /opt/pbs ']'
++ '[' -d /opt/pbs/pgsql ']'
+++ type psql
+++ cut '-d ' -f3
++ PGSQL_CMD=/usr/bin/psql
++ '[' -z /usr/bin/psql ']'
+++ type pg_config
+++ cut '-d ' -f3
++ PGSQL_CONF=/usr/bin/pg_config
++ '[' -z /usr/bin/pg_config ']'
+++ /usr/bin/pg_config
+++ awk '/BINDIR/{ print $3 }'
++ PGSQL_BIN=/usr/lib/postgresql/10/bin
+++ dirname /usr/lib/postgresql/10/bin
++ PGSQL_DIR=/usr/lib/postgresql/10
++ '[' /usr/lib/postgresql/10 = / ']'
++ export PGSQL_BIN=/usr/lib/postgresql/10/bin
++ PGSQL_BIN=/usr/lib/postgresql/10/bin
++ '[' -d /opt/pbs/lib ']'
++ LD_LIBRARY_PATH=/opt/pbs/lib:
++ export LD_LIBRARY_PATH
+ tmpdir=/var/tmp
+ PBS_CURRENT_SCHEMA_VER=1.5.0
+ outfile=/var/tmp/pbs_dataservice_output_5176
+ /opt/pbs/sbin/pbs_dataservice status
+ '[' 1 -eq 0 ']'
+ ret=0
+ '[' 0 -ne 0 ']'
+ /opt/pbs/sbin/pbs_dataservice start
/opt/pbs/sbin/pbs_ds_systemd: 43: [: xrunning: unexpected operator
+ '[' 0 -ne 0 ']'
+ rm -f /var/tmp/pbs_dataservice_output_5176
++ /usr/lib/postgresql/10/bin/psql -A -t -p 15007 -d pbs_datastore -U postgres -c 'select pbs_schema_version from pbs.info'
Password for user postgres:
psql: fe_sendauth: no password supplied
+ ver=
+ '[' '' = 1.5.0 ']'
+ '[' '' = 1.0.0 ']'
+ /opt/pbs/sbin/pbs_dataservice status
+ '[' 0 -eq 1 ']'
+ '[' '' = 1.1.0 ']'
+ '[' '' = 1.2.0 ']'
+ '[' '' = 1.3.0 ']'
+ '[' '' = 1.4.0 ']'
+ echo 'Cannot upgrade PBS datastore version '
Cannot upgrade PBS datastore version
+ ret=0
+ '[' 0 -ne 0 ']'
+ exit 1

I found out that the “Data Service Account” name must be postgres, since there is no system user pbsdata, and also no DB user pbsdata. Actually, postgres is the only user in the Postgres database.

Against all best practice, I disabled password authentication for all database users, so that the pbs_schema_upgrade does not ask for credentials.

/etc/postgresql/10/main/pg_hba.conf:

- local   all             postgres                                peer
+ local   all             postgres                                trust

This lets pbs_schema_upgrade run successfully, and the pbs service will start successfully. /var/spool/pbs/server_logs/ does not show any errors either!

root@pbshead1:~# /opt/pbs/libexec/pbs_init.d --version
pbs_version = 22.05.11
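
Looking back at the bash -x trace above, the bare "Cannot upgrade PBS datastore version" message makes sense: psql could not authenticate, so the version string was empty and matched none of the comparisons. Below is a minimal sketch of that decision logic. The function name classify_schema_version is hypothetical, and the version strings are just the ones visible in the trace; what the real script does in each branch is not shown there.

```shell
#!/bin/sh
# Sketch only: mirrors the string comparisons visible in the trace,
# not the actual pbs_schema_upgrade source.
CURRENT_SCHEMA_VER=1.5.0   # value of PBS_CURRENT_SCHEMA_VER in the trace

# classify_schema_version VER
# prints "current", "upgradable" or "unknown"
classify_schema_version() {
    case "$1" in
        "$CURRENT_SCHEMA_VER") echo current ;;
        1.0.0|1.1.0|1.2.0|1.3.0|1.4.0) echo upgradable ;;
        *) echo unknown ;;
    esac
}

classify_schema_version ""       # what a failed psql query yields
classify_schema_version 1.4.0    # a schema version the trace compares against
```

An empty string lands in the last branch, which corresponds to the point where the traced script gave up.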

I am still missing a piece of information here, though: how is the upgrade supposed to work?

  1. I compiled and installed OpenPBS without creating a pbsdata account (my bad, this is mentioned in the installation docs)
  2. The installation routine picked user postgres, which seems to work fine.
  3. The installation routine generated a random password for the Data Store User and stored it in $PBS_HOME/server_priv/db_password, different from and independent of the system user’s password.
  4. User root can su into user postgres
  5. Running pbs_schema_upgrade still asks for password of user postgres here: (taken from output above)
++ /usr/lib/postgresql/10/bin/psql -A -t -p 15007 -d pbs_datastore -U postgres -c 'select pbs_schema_version from pbs.info'
Password for user postgres:
  6. I don’t know this password, since it is randomly generated and stored encrypted.

Is pbs_schema_upgrade not intended to be called on its own, but only through a parent process from which it gets the passwords?

Ugh! Too much mess to fix reliably. And, you are right, you should not need to make the database wide-open.

You say this is a test system. In that case, I would wipe it out and re-install from scratch, with OpenPBS 20.0.1. Make sure that works (run a job). Make sure that the pbsdata user exists and owns the files at /var/spool/pbs/datastore. Make a backup copy of that directory in case you need to revert and try again.

Now, you should be ready to go through the steps to overlay upgrade to 22.05.11.

Run the upgrade steps under the “script” command so that you can review exactly what you typed and what the commands output.

Same error message with an existing pbsdata Linux user account. I also created a pbsadmin account, since that is part of the recommended steps in the manual as well. Job execution is working (tested with stdin to qsub), and the /var/spool/pbs/datastore directory exists, has the correct ownership and is populated.

Sidenote: The mere existence of the user was not enough. When installing PBS following the INSTALL instructions, the files in /var/spool/pbs/datastore were still owned by postgres, so I had to pass the --with-database-user=pbsdata argument to configure. User pbsdata also needs to be a member of group postgres; I could not find this documented anywhere. I may have overlooked it, though.
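
For anyone repeating this, a small pre-flight check before running configure might look like the sketch below. The function name user_in_group is made up for illustration; it relies only on the standard id utility, and the pbsdata and postgres names are the ones from this thread.

```shell
#!/bin/sh
# user_in_group USER GROUP
# returns 0 if USER exists and is a member of GROUP, 1 otherwise
user_in_group() {
    groups=$(id -nG "$1" 2>/dev/null) || return 1
    for g in $groups; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

if user_in_group pbsdata postgres; then
    echo "pbsdata is in group postgres"
else
    echo "pbsdata is missing or not in group postgres"
fi
```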

Anyhow, I’m back to the same error message as before:

  • /var/spool/pbs/server_logs/20221205:
12/05/2022 15:41:31;0002;Server@pbshead1;Svr;Server@pbshead1;Starting PBS dataservice
12/05/2022 15:41:34;0002;Server@pbshead1;Svr;Server@pbshead1;pbs_status_db exit code 0
12/05/2022 15:41:34;0002;Server@pbshead1;Svr;Server@pbshead1;Prepare of statement insert_job failed: ERROR:  column "ji_jid" of relation "job" does not exist
LINE 1: ...at,ji_quetime,ji_rteretry,ji_fromsock,ji_fromaddr,ji_jid,ji_...
                                                             ^ 42703
12/05/2022 15:41:44;0002;Server@pbshead1;Svr;Server@pbshead1;Starting PBS dataservice
12/05/2022 15:41:46;0002;Server@pbshead1;Svr;Server@pbshead1;pbs_status_db exit code 0
12/05/2022 15:41:46;0002;Server@pbshead1;Svr;Server@pbshead1;Prepare of statement insert_job failed: ERROR:  column "ji_jid" of relation "job" does not exist
LINE 1: ...at,ji_quetime,ji_rteretry,ji_fromsock,ji_fromaddr,ji_jid,ji_...
                                                             ^ 42703
12/05/2022 15:41:56;0002;Server@pbshead1;Svr;Server@pbshead1;Starting PBS dataservice
12/05/2022 15:41:58;0002;Server@pbshead1;Svr;Server@pbshead1;pbs_status_db exit code 0
12/05/2022 15:41:58;0002;Server@pbshead1;Svr;Server@pbshead1;Prepare of statement insert_job failed: ERROR:  column "ji_jid" of relation "job" does not exist
LINE 1: ...at,ji_quetime,ji_rteretry,ji_fromsock,ji_fromaddr,ji_jid,ji_...
                                                             ^ 42703
[...]
  • journalctl -fu pbs:
Dec 05 15:38:56 pbshead1.dev.example.local systemd[1]: Starting Portable Batch System...
Dec 05 15:38:56 pbshead1.dev.example.local pbs_init.d[23428]: Starting PBS
Dec 05 15:38:56 pbshead1.dev.example.local pbs_init.d[23428]: PBS Home directory /var/spool/pbs needs updating.
Dec 05 15:38:56 pbshead1.dev.example.local pbs_init.d[23428]: Running /opt/pbs/libexec/pbs_habitat to update it.
Dec 05 15:38:56 pbshead1.dev.example.local pbs_init.d[23428]: ***
Dec 05 15:38:56 pbshead1.dev.example.local su[23519]: Successful su for pbsdata by root
Dec 05 15:38:56 pbshead1.dev.example.local su[23519]: + ??? root:pbsdata
Dec 05 15:38:56 pbshead1.dev.example.local su[23519]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
Dec 05 15:38:56 pbshead1.dev.example.local su[23519]: pam_unix(su:session): session closed for user pbsdata
Dec 05 15:38:56 pbshead1.dev.example.local pbs_init.d[23428]: Data service directory from previous PBS installation not found,
Dec 05 15:38:56 pbshead1.dev.example.local pbs_init.d[23428]: Datastore upgrade cannot continue
Dec 05 15:38:56 pbshead1.dev.example.local pbs_init.d[23428]: Failed to upgrade PBS Datastore
Dec 05 15:38:56 pbshead1.dev.example.local pbs_init.d[23428]: *** End of /opt/pbs/libexec/pbs_habitat
Dec 05 15:38:56 pbshead1.dev.example.local pbs_init.d[23428]: Home directory /var/spool/pbs updated.
Dec 05 15:38:57 pbshead1.dev.example.local pbs_init.d[23428]: /opt/pbs/sbin/pbs_comm ready (pid=23576), Proxy Name:pbshead1.dev.example.local:17001, Threads:4
Dec 05 15:38:57 pbshead1.dev.example.local pbs_init.d[23428]: PBS comm
Dec 05 15:38:57 pbshead1.dev.example.local pbs_init.d[23428]: PBS sched
Dec 05 15:38:57 pbshead1.dev.example.local su[23612]: Successful su for pbsdata by root
Dec 05 15:38:57 pbshead1.dev.example.local su[23612]: + ??? root:pbsdata
Dec 05 15:38:57 pbshead1.dev.example.local su[23612]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
Dec 05 15:38:57 pbshead1.dev.example.local su[23612]: pam_unix(su:session): session closed for user pbsdata
Dec 05 15:38:57 pbshead1.dev.example.local su[23645]: Successful su for pbsdata by root
Dec 05 15:38:57 pbshead1.dev.example.local su[23645]: + ??? root:pbsdata
Dec 05 15:38:57 pbshead1.dev.example.local su[23645]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
Dec 05 15:38:57 pbshead1.dev.example.local su[23645]: pam_unix(su:session): session closed for user pbsdata
Dec 05 15:38:57 pbshead1.dev.example.local pbs_init.d[23428]: /opt/pbs/sbin/pbs_ds_systemd: 43: [: xrunning: unexpected operator
Dec 05 15:38:59 pbshead1.dev.example.local su[23694]: Successful su for pbsdata by root
Dec 05 15:38:59 pbshead1.dev.example.local su[23694]: + ??? root:pbsdata
Dec 05 15:38:59 pbshead1.dev.example.local su[23694]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
Dec 05 15:38:59 pbshead1.dev.example.local su[23694]: pam_unix(su:session): session closed for user pbsdata
Dec 05 15:38:59 pbshead1.dev.example.local su[23709]: Successful su for pbsdata by root
Dec 05 15:38:59 pbshead1.dev.example.local su[23709]: + ??? root:pbsdata
Dec 05 15:38:59 pbshead1.dev.example.local su[23709]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
Dec 05 15:38:59 pbshead1.dev.example.local su[23709]: pam_unix(su:session): session closed for user pbsdata
Dec 05 15:39:00 pbshead1.dev.example.local su[23727]: Successful su for pbsdata by root
Dec 05 15:39:00 pbshead1.dev.example.local su[23727]: + ??? root:pbsdata
Dec 05 15:39:00 pbshead1.dev.example.local su[23727]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
Dec 05 15:39:00 pbshead1.dev.example.local su[23727]: pam_unix(su:session): session closed for user pbsdata
Dec 05 15:39:00 pbshead1.dev.example.local pbs_init.d[23428]: /opt/pbs/sbin/pbs_ds_systemd: 43: [: xrunning: unexpected operator
[... and some time later ...]
Dec 05 15:39:16 pbshead1.dev.example.local su[24065]: Successful su for pbsdata by root
Dec 05 15:39:16 pbshead1.dev.example.local su[24065]: + ??? root:pbsdata
Dec 05 15:39:16 pbshead1.dev.example.local su[24065]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
Dec 05 15:39:16 pbshead1.dev.example.local su[24065]: pam_unix(su:session): session closed for user pbsdata
Dec 05 15:39:16 pbshead1.dev.example.local su[24079]: Successful su for pbsdata by root
Dec 05 15:39:16 pbshead1.dev.example.local su[24079]: + ??? root:pbsdata
Dec 05 15:39:16 pbshead1.dev.example.local su[24079]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
Dec 05 15:39:16 pbshead1.dev.example.local su[24079]: pam_unix(su:session): session closed for user pbsdata
Dec 05 15:39:16 pbshead1.dev.example.local su[24094]: Successful su for pbsdata by root
Dec 05 15:39:16 pbshead1.dev.example.local su[24094]: + ??? root:pbsdata
Dec 05 15:39:16 pbshead1.dev.example.local su[24094]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
Dec 05 15:39:16 pbshead1.dev.example.local su[24094]: pam_unix(su:session): session closed for user pbsdata
Dec 05 15:39:21 pbshead1.dev.example.local pbs_init.d[23428]: Connecting to PBS dataservice......continuing in background.
Dec 05 15:39:21 pbshead1.dev.example.local pbs_init.d[23428]: Connecting to PBS dataservice......continuing in background.
Dec 05 15:39:21 pbshead1.dev.example.local pbs_init.d[23428]: PBS server
Dec 05 15:39:21 pbshead1.dev.example.local systemd[1]: Started Portable Batch System.
Dec 05 15:39:21 pbshead1.dev.example.local su[24204]: Successful su for pbsdata by root
Dec 05 15:39:21 pbshead1.dev.example.local su[24204]: + ??? root:pbsdata
Dec 05 15:39:21 pbshead1.dev.example.local su[24204]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
Dec 05 15:39:21 pbshead1.dev.example.local su[24204]: pam_unix(su:session): session closed for user pbsdata
Dec 05 15:39:23 pbshead1.dev.example.local su[24257]: Successful su for pbsdata by root
Dec 05 15:39:23 pbshead1.dev.example.local su[24257]: + ??? root:pbsdata
Dec 05 15:39:23 pbshead1.dev.example.local su[24257]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
Dec 05 15:39:23 pbshead1.dev.example.local su[24257]: pam_unix(su:session): session closed for user pbsdata
Dec 05 15:39:23 pbshead1.dev.example.local su[24271]: Successful su for pbsdata by root
Dec 05 15:39:23 pbshead1.dev.example.local su[24271]: + ??? root:pbsdata
Dec 05 15:39:23 pbshead1.dev.example.local su[24271]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
Dec 05 15:39:23 pbshead1.dev.example.local su[24271]: pam_unix(su:session): session closed for user pbsdata
Dec 05 15:39:23 pbshead1.dev.example.local su[24286]: Successful su for pbsdata by root
Dec 05 15:39:23 pbshead1.dev.example.local su[24286]: + ??? root:pbsdata
Dec 05 15:39:23 pbshead1.dev.example.local su[24286]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
Dec 05 15:39:23 pbshead1.dev.example.local su[24286]: pam_unix(su:session): session closed for user pbsdata

Thanks in advance for any help!

What a rabbit hole! I think recent updates to OpenPBS have broken the database upgrade path.

This is what I think is happening at a high level. The startup script notices the version of PBS has changed, so invokes pbs_habitat to fix things. pbs_habitat asks pbs_db_utility to update the database. pbs_db_utility looks for the directory PBS_HOME/pgsql.forupgrade and gives up with the messages shown near the top of your journalctl log if the directory cannot be found.

It turns out that the pgsql.forupgrade directory is created by pbs_habitat itself, but only after pbs_db_utility is run.
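
To see which side of that ordering problem a given machine is on, it may help to list which of the directories involved actually exist before starting PBS. This is only a sketch: the function name report_upgrade_dirs is invented here, and the directory names are simply the ones that come up in this thread.

```shell
#!/bin/sh
# report_upgrade_dirs PBS_HOME
# prints one "name: present|absent" line per directory of interest
report_upgrade_dirs() {
    for d in datastore pgsql.forupgrade pgsql.old; do
        if [ -d "$1/$d" ]; then
            echo "$d: present"
        else
            echo "$d: absent"
        fi
    done
}

# falls back to the PBS_HOME used in this thread if the variable is unset
report_upgrade_dirs "${PBS_HOME:-/var/spool/pbs}"
```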

I have a completely untested hack that you could try just to get past this issue:

index 0eecd0e0..9b0b34be 100644
--- a/src/cmds/scripts/pbs_habitat.in
+++ b/src/cmds/scripts/pbs_habitat.in
@@ -311,6 +311,7 @@ if [ "${PBS_START_SERVER:-0}" != 0 ] ; then
        export PBS_ENVIRONMENT
 
        if [ $create_new_svr_data -eq 0 ]; then
+               backup_pgsql # HACK HACK HACK
                # datastore directory already exists
                # do the database upgrade
                ${PBS_EXEC}/libexec/pbs_db_utility ${UPGRADE_DB}

No guarantees. Do not use on a production system, etc.

Nope, doesn’t work, it quits with

pbs_init.d[26065]: /opt/pbs/libexec/pbs_habitat: 281: /opt/pbs/libexec/pbs_habitat: backup_pgsql: not found
pbs_init.d[26065]: Data service directory from previous PBS installation not found,
pbs_init.d[26065]: Datastore upgrade cannot continue
pbs_init.d[26065]: Failed to upgrade PBS Datastore
pbs_init.d[26065]: *** End of /opt/pbs/libexec/pbs_habitat

Is this the exact command, or did you mean to do a database backup in place of backup_pgsql? I could not find any other reference to “backup_pgsql” anywhere else.

My error. backup_pgsql is a function defined in a newer version of pbs_habitat.

I’m working to create a local install of 22.05.11 that I can test with. In the meantime, there are a couple things you can do. First assign a new, known password to the pbsdata pgsql account. As root,

[root@server2 ~]# pbs_ds_password
Enter the password:
Re-enter the password:

---> Updated user password
---> Success

Then, print out some info from the database, again as root

[root@server2 ~]# psql --port 15007 --username pbsdata --dbname pbs_datastore
Password for user pbsdata: 
psql (9.2.24)
Type "help" for help.

pbs_datastore=# select * from pbs.info ;
 pbs_schema_version 
--------------------
 1.5.0
(1 row)

pbs_datastore=# select * from pbs.job ;
 ji_jobid | ji_state | ji_substate | ji_svrflags | ji_stime | ji_queue | ji_destin | ji_un_type | ji_exitstat | ji_quetime | ji_rteretry | ji_fromsock | ji_fromaddr | ji_jid | ji_credtype | ji_qrank | ji_savetm | ji_creattm | attributes 
----------+----------+-------------+-------------+----------+----------+-----------+------------+-------------+------------+-------------+-------------+-------------+--------+-------------+----------+-----------+------------+------------
(0 rows)

pbs_datastore=# \q
[root@server2 ~]#

If you are not familiar with psql, note that each SQL command ends with a semicolon (;), and nothing happens until psql sees it.

You exit with \q.

Also, check some other version info:

[root@server2 ~]# cat /var/spool/pbs/pbs_version 
20.0.1_ptl_as_script
[root@server2 ~]# qstat --version
pbs_version = 20.0.1_ptl_as_script

(I build my own versions, hence the non-standard pbs_version values.)

hey, thanks for getting back.

  • Changed the password for the datastore.
  • Output of database commands:
pbs_datastore=# select * from pbs.info ;
 pbs_schema_version
--------------------
 1.4.0
(1 row)
pbs_datastore=# select * from pbs.job ;
 ji_jobid | ji_state | ji_substate | ji_svrflags | ji_numattr | ji_ordering | ji_priority | ji_stime | ji_endtbdry | ji_queue | ji_destin | ji_un_type | ji_momaddr | ji_momport | ji_exitstat | ji_quetime | ji_rteretry | ji_fromsock | ji_fromaddr | ji_4jid | ji_4ash | ji_credtype | ji_qrank | ji_savetm | ji_creattm | attributes
----------+----------+-------------+-------------+------------+-------------+-------------+----------+-------------+----------+-----------+------------+------------+------------+-------------+------------+-------------+-------------+-------------+---------+---------+-------------+----------+-----------+------------+------------
(0 rows)
  • Output of shell commands:
root@pbshead1:~# cat /var/spool/pbs/pbs_version
20.0.1
root@pbshead1:~# qstat --version
pbs_version = 20.0.1
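
Comparing the two pbs.job headers side by side: the 1.5.0 table has a ji_jid column, while my 1.4.0 table has ji_4jid instead (plus several columns the newer one lacks). That fits the 'column "ji_jid" of relation "job" does not exist' error. Here is a quick sketch for checking a psql header line for a column name without eyeballing it; has_column is a hypothetical helper, not part of PBS.

```shell
#!/bin/sh
# has_column HEADER NAME
# HEADER is the first line psql prints for "select * from pbs.job ;"
# returns 0 if NAME is exactly one of the |-separated column names
has_column() {
    printf '%s\n' "$1" | tr '|' '\n' | tr -d ' ' | grep -qxF -- "$2"
}

# shortened header taken from the 1.4.0 datastore output above
old_header=' ji_fromsock | ji_fromaddr | ji_4jid | ji_4ash | ji_credtype '

has_column "$old_header" ji_jid  || echo "ji_jid missing: schema not upgraded"
has_column "$old_header" ji_4jid && echo "ji_4jid present: old layout"
```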

As I wrote earlier, this is OpenPBS compiled from source (downloaded from GitHub), following the instructions in the INSTALL file. Arguments to ./configure are:

  • PBS_VERSION=20.0.1 (for correct version number when compiling versions > 20.0.1)
  • --prefix=/opt/pbs
  • --with-database-user=pbsdata (new since earlier post in this thread)

Postgres version: psql (10.22 (Ubuntu 10.22-0ubuntu0.18.04.1))

Time to debug things one step at a time. This means running pbs_habitat manually, recording each step, to see if and where it has a problem. Assuming it behaves as in your earlier logs, this will be when it runs pbs_db_utility. So, we then run pbs_db_utility manually to see what problem it runs into, and so on.

Thus, as root, here is what I get. Yours should be similar, except you’ll probably get an error running pbs_db_utility.

[root@server2 ~]# systemctl stop pbs
[root@server2 ~]# script habitat.out
Script started, file is habitat.out
[root@server2 ~]# sh -x /opt/pbs/libexec/pbs_habitat 
+ '[' 0 -eq 1 -a '' = --version ']'
+ PBS_VERSION=20.0.1
+ INSTALL_DB=install_db
+ UPGRADE_DB=upgrade_db
+ conf=/etc/pbs.conf
++ uname
+ ostype=Linux
+ umask 022
+ echo '***'
***
+ . /etc/pbs.conf
++ PBS_HOME=/var/spool/pbs
++ PBS_EXEC=/opt/pbs
++ PBS_SERVER=server2
++ PBS_START_SCHED=1
++ PBS_START_COMM=1
++ PBS_START_SERVER=1
++ PBS_START_MOM=0
++ PBS_CORE_LIMIT=unlimited
++ PBS_SCP=/usr/bin/scp
++ PBS_LOG_HIGHRES_TIMESTAMP=1
+ '[' -z /opt/pbs ']'
+ '[' '!' -d /opt/pbs ']'
+ '[' -z /var/spool/pbs ']'
++ /bin/ls -A /var/spool/pbs
[stuff]
+ '[' -x /opt/pbs/bin/qstat ']'
++ /opt/pbs/bin/qstat --version
++ sed -e 's/^.* = //'
+ pbs_version=20.0.1
+ '[' -z 20.0.1 ']'
+ '[' 20.0.1 '!=' 20.0.1 ']'
++ get_server_hostname
++ shn=
++ '[' -z '' -o -z '' ']'
++ '[' -z '' ']'
++ shn=server2
++ echo server2
++ awk '{print tolower($0)}'
+ server_hostname=server2
+ '[' server2 = change_this_to_pbs_server_hostname ']'
+ '[' -z server2 ']'
+ check_hostname server2
+ getent hosts server2
+ return 0
+ '[' 0 -ne 0 ']'
++ echo server2
++ awk -F. '{print $1}'
+ server=server2
+ '[' 1 '!=' 0 ']'
+ '[' '!' -x /opt/pbs/libexec/pbs_db_utility ']'
+ . /opt/pbs/libexec/pbs_db_env
++ PGSQL_LIBSTR=
++ '[' -z /opt/pbs ']'
++ '[' -d /opt/pbs/pgsql ']'
+++ type psql
+++ cut '-d ' -f3
++ PGSQL_CMD=/bin/psql
++ '[' -z /bin/psql ']'
+++ type pg_config
+++ cut '-d ' -f3
++ PGSQL_CONF=/bin/pg_config
++ '[' -z /bin/pg_config ']'
+++ /bin/pg_config
+++ awk '/BINDIR/{ print $3 }'
++ PGSQL_BIN=/usr/bin
+++ dirname /usr/bin
++ PGSQL_DIR=/usr
++ '[' /usr = / ']'
++ export PGSQL_BIN=/usr/bin
++ PGSQL_BIN=/usr/bin
++ '[' -d /opt/pbs/lib ']'
++ LD_LIBRARY_PATH=/opt/pbs/lib:
++ export LD_LIBRARY_PATH
+ '[' 0 -ne 0 ']'
+ PBS_licensing_loc_file=PBS_licensing_loc
+ dbuser_fl=/var/spool/pbs/server_priv/db_user
++ get_db_user
++ '[' -f /var/spool/pbs/server_priv/db_user ']'
+++ cat /var/spool/pbs/server_priv/db_user
+++ tr -d '[:space:]'
++ dbuser_name=pbsdata
++ '[' -z pbsdata ']'
++ '[' '!' -f /var/spool/pbs/server_priv/db_user ']'
++ cat /var/spool/pbs/server_priv/db_user
++ return 0
+ PBS_DATA_SERVICE_USER=pbsdata
+ '[' 0 -ne 0 ']'
+ chk_dataservice_user pbsdata
+ chk_usr=pbsdata
++ id pbsdata
+ id='uid=993(pbsdata) gid=989(pbsdata) groups=989(pbsdata)'
+ '[' 0 -ne 0 ']'
++ echo 'uid=993(pbsdata)' 'gid=989(pbsdata)' 'groups=989(pbsdata)'
++ cut -c5-
++ cut -d '(' -f1
+ id=993
+ '[' 993 = 0 ']'
+ su - pbsdata -s /bin/sh -c cd
+ '[' 0 -ne 0 ']'
+ return 0
+ '[' 0 -ne 0 ']'
+ export PBS_DATA_SERVICE_USER
+ server_started=0
+ PBS_DATA_SERVICE_PORT=15007
+ export PBS_DATA_SERVICE_PORT
+ create_new_svr_data=1
++ /opt/pbs/libexec/pbs_db_utility install_db
+ resp=
+ ret=2
+ '[' 2 -eq 2 ']'
+ create_new_svr_data=0
+ export PBS_HOME
+ export PBS_EXEC
+ export PBS_SERVER
+ export PBS_ENVIRONMENT
+ '[' 0 -eq 0 ']'
+ /opt/pbs/libexec/pbs_db_utility upgrade_db
+ '[' 0 -eq 1 ']'
+ '[' -f /var/spool/pbs/server_priv/PBS_licensing_loc ']'
+ '[' 0 '!=' 0 ']'
+ '[' 0 -eq 1 ']'
+ '[' -d /var/spool/pbs/mom_priv/jobs ']'
+ upgrade_cmd=/opt/pbs/sbin/pbs_upgrade_job
+ '[' -x /opt/pbs/sbin/pbs_upgrade_job ']'
+ total=0
+ upgraded=0
+ for file in '${PBS_HOME}/mom_priv/jobs/*.JB'
+ '[' -f '/var/spool/pbs/mom_priv/jobs/*.JB' ']'
+ '[' 0 -gt 0 ']'
+ echo 20.0.1
+ echo '*** End of /opt/pbs/libexec/pbs_habitat'
*** End of /opt/pbs/libexec/pbs_habitat
+ exit 0
[root@server2 ~]# exit
exit
Script done, file is habitat.out
[root@server2 ~]# 

Assuming you do get an error in the pbs_db_utility step, run it manually. Note the extra commands to set up its environment.

[root@server2 ~]# script dbutil.out
Script started, file is dbutil.out
[root@server2 ~]# set -a
[root@server2 ~]# source /etc/pbs.conf
[root@server2 ~]# source /opt/pbs/libexec/pbs_db_env
[root@server2 ~]# export PBS_DATA_SERVICE_USER=pbsdata
[root@server2 ~]# export PBS_DATA_SERVICE_PORT=15007
[root@server2 ~]# set +a
[root@server2 ~]# sh -x /opt/pbs/libexec/pbs_db_utility upgrade_db
+ . /etc/pbs.conf
++ PBS_HOME=/var/spool/pbs
++ PBS_EXEC=/opt/pbs
++ PBS_SERVER=server2
++ PBS_START_SCHED=1
++ PBS_START_COMM=1
++ PBS_START_SERVER=1
++ PBS_START_MOM=0
++ PBS_CORE_LIMIT=unlimited
++ PBS_SCP=/usr/bin/scp
++ PBS_LOG_HIGHRES_TIMESTAMP=1
+ trap cleanup 1 2 3 15
++ dirname /opt/pbs/libexec/pbs_db_utility
+ dir=/opt/pbs/libexec
++ pwd
+ CWD=/root
+ upgrade=0
+ PBS_AES_SWITCH_VER=14.0
+ change_locale=0
+ opt_err=1
+ opt=upgrade_db
+ '[' upgrade_db = upgrade_db ']'
+ opt_err=0
+ '[' -f /var/spool/pbs/pbs_version ']'
++ cat /var/spool/pbs/pbs_version
+ old_pbs_version=20.0.1
+ data_dir=/var/spool/pbs/datastore
+ '[' '!' -f /var/spool/pbs/datastore/PG_VERSION ']'
++ awk 'NR==1 {print $NF}'
+++ /usr/bin/postgres -V
++ cut -d . -f 1,2
++ echo postgres '(PostgreSQL)' 9.2.24
+ sys_pgsql_ver=9.2
++ cat /var/spool/pbs/datastore/PG_VERSION
+ old_pgsql_ver=9.2
+ '[' 9.2 '!=' 9.2 ']'
+ set_db_trust_login /var/spool/pbs/datastore
+ datastore_dir=/var/spool/pbs/datastore
++ cp -p /var/spool/pbs/datastore/pg_hba.conf /var/spool/pbs/datastore/pg_hba.conf.orig
+ err=
+ '[' 0 -ne 0 ']'
++ chown pbsdata /var/spool/pbs/datastore/pg_hba.conf.orig
+ err=
+ '[' 0 -ne 0 ']'
++ sed s/md5/trust/g /var/spool/pbs/datastore/pg_hba.conf
+ err=
+ '[' 0 -ne 0 ']'
++ chown pbsdata /var/spool/pbs/datastore/pg_hba.conf.new
+ err=
+ '[' 0 -ne 0 ']'
++ mv /var/spool/pbs/datastore/pg_hba.conf.new /var/spool/pbs/datastore/pg_hba.conf
+ err=
+ '[' 0 -ne 0 ']'
+ /opt/pbs/libexec/pbs_schema_upgrade 15007 pbsdata
+ ret=0
+ '[' 0 -ne 0 ']'
+ '[' 20.0.1 '<' 14.0 ']'
+ revoke_db_trust_login /var/spool/pbs/datastore
+ datastore_dir=/var/spool/pbs/datastore
++ cp -p /var/spool/pbs/datastore/pg_hba.conf.orig /var/spool/pbs/datastore/pg_hba.conf
+ err=
+ '[' 0 -eq 0 ']'
+ rm -f /var/spool/pbs/datastore/pg_hba.conf.orig
+ '[' 0 -eq 1 ']'
[root@server2 ~]# exit
exit
Script done, file is dbutil.out
[root@server2 ~]# 

We’ll see where that fails and go from there.
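
One thing the pbs_db_utility trace makes visible is how the schema upgrade gets around the password prompt: pg_hba.conf is copied aside, every md5 is replaced by trust, pbs_schema_upgrade runs, and the original file is put back. The sketch below reproduces that pattern on a scratch directory, so it is safe to try anywhere. The function names are borrowed from the trace, but the bodies are reconstructed from the traced commands (minus the chown steps) and are not the actual PBS source.

```shell
#!/bin/sh
# Sketch reconstructed from the traced commands; not the real PBS code.

# swap md5 for trust, keeping the original as pg_hba.conf.orig
set_db_trust_login() {
    cp -p "$1/pg_hba.conf" "$1/pg_hba.conf.orig" || return 1
    sed 's/md5/trust/g' "$1/pg_hba.conf" > "$1/pg_hba.conf.new" || return 1
    mv "$1/pg_hba.conf.new" "$1/pg_hba.conf"
}

# put the original file back and remove the saved copy
revoke_db_trust_login() {
    cp -p "$1/pg_hba.conf.orig" "$1/pg_hba.conf" || return 1
    rm -f "$1/pg_hba.conf.orig"
}

# demo on a throwaway directory, never on a live datastore
demo_dir=$(mktemp -d)
echo 'local   all   all   md5' > "$demo_dir/pg_hba.conf"

set_db_trust_login "$demo_dir"
cat "$demo_dir/pg_hba.conf"      # the line now ends in trust
# (the schema upgrade would run at this point)
revoke_db_trust_login "$demo_dir"
cat "$demo_dir/pg_hba.conf"      # back to md5
rm -rf "$demo_dir"
```

This would also explain why calling pbs_schema_upgrade directly prompts for a password: the temporary trust window is opened by the caller, not by the upgrade script itself.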

Here’s the full output of pbs_habitat:

root@pbshead1:~# sh -x /opt/pbs/libexec/pbs_habitat
+ [ 0 -eq 1 -a  = --version ]
+ PBS_VERSION=20.0.1
+ PBS_AES_SWITCH_VER=14.0
+ conf=/etc/pbs.conf
+ uname
+ ostype=Linux
+ umask 022
+ echo ***
***
+ . /etc/pbs.conf
+ PBS_SERVER=pbshead1.dev.example.local
+ PBS_START_SERVER=1
+ PBS_START_SCHED=1
+ PBS_START_COMM=1
+ PBS_START_MOM=0
+ PBS_EXEC=/opt/pbs
+ PBS_HOME=/var/spool/pbs
+ PBS_CORE_LIMIT=unlimited
+ PBS_SCP=/usr/bin/scp
+ [ -z /opt/pbs ]
+ [ ! -d /opt/pbs ]
+ [ -z /var/spool/pbs ]
+ /bin/ls -A /var/spool/pbs
+ [ ! -d /var/spool/pbs -o ! aux
checkpoint
comm_logs
datastore
mom_logs
mom_priv
pbs_environment
pbs_version
sched_logs
sched_priv
server_logs
server_priv
spool
undelivered ]
+ [ -f /var/spool/pbs/pbs_version ]
+ cat /var/spool/pbs/pbs_version
+ old_pbs_version=20.0.1
+ [ -x /opt/pbs/bin/qstat ]
+ /opt/pbs/bin/qstat --version
+ sed -e s/^.* = //
+ pbs_version=20.0.1
+ [ -z 20.0.1 ]
+ [ 20.0.1 != 20.0.1 ]
+ get_server_hostname
+ shn=
+ [ -z  -o -z  ]
+ [ -z  ]
+ shn=pbshead1.dev.example.local
+ echo pbshead1.dev.example.local
+ awk {print tolower($0)}
+ server_hostname=pbshead1.dev.example.local
+ [ pbshead1.dev.example.local = change_this_to_pbs_server_hostname ]
+ [ -z pbshead1.dev.example.local ]
+ check_hostname pbshead1.dev.example.local
+ getent hosts pbshead1.dev.example.local
+ return 0
+ [ 0 -ne 0 ]
+ echo+ awk -F. {print $1}
 pbshead1.dev.example.local
+ server=pbshead1
+ [ -f /var/spool/pbs/mom_priv/config ]
+ cmd=cat "$PBS_HOME/mom_priv/config"
+ cmd=cat "$PBS_HOME/mom_priv/config" | sed -e 's/\t/ /g' -e 's/ \+/ /g' -e 's/^ //' -e 's/ $//'
+ cmd=cat "$PBS_HOME/mom_priv/config" | sed -e 's/\t/ /g' -e 's/ \+/ /g' -e 's/^ //' -e 's/ $//' | grep '\$clienthost' | cut -d' ' -f2
+ eval cat "$PBS_HOME/mom_priv/config" | sed -e 's/\t/ /g' -e 's/ \+/ /g' -e 's/^ //' -e 's/ $//' | grep '\$clienthost' | cut -d' ' -f2
+ cat /var/spool/pbs/mom_priv/config
+ cut -d  -f2
+ sed -e s/\t/ /g -e s/ \+/ /g+  -e s/^ // -e s/ $//
grep \$clienthost
+ [ -z pbshead1 -o pbshead1 = CHANGE_THIS_TO_PBS_SERVER_HOSTNAME ]
+ check_hostname pbshead1
+ getent hosts pbshead1
+ return 0
+ [ 1 != 0 ]
+ . /opt/pbs/libexec/pbs_pgsql_env.sh
+ PGSQL_LIBSTR=
+ [ -d /opt/pbs/pgsql ]
+ type psql
+ cut -d  -f3
+ PGSQL_CMD=/usr/bin/psql
+ [ -z /usr/bin/psql ]
+ type pg_config
+ cut -d  -f3
+ PGSQL_CONF=/usr/bin/pg_config
+ [ -z /usr/bin/pg_config ]
+ /usr/bin/pg_config
+ awk /BINDIR/{ print $3 }
+ PGSQL_BIN=/usr/lib/postgresql/10/bin
+ dirname /usr/lib/postgresql/10/bin
+ PGSQL_DIR=/usr/lib/postgresql/10
+ [ /usr/lib/postgresql/10 = / ]
+ PBS_licensing_loc_file=PBS_licensing_loc
+ export PGSQL_LIBSTR
+ dbuser_fl=/var/spool/pbs/server_priv/db_user
+ get_db_user
+ [ -f /var/spool/pbs/server_priv/db_user ]
+ cat+ tr -d [:space:]
 /var/spool/pbs/server_priv/db_user
+ dbuser_name=pbsdata
+ [ -z pbsdata ]
+ [ ! -f /var/spool/pbs/server_priv/db_user ]
+ cat /var/spool/pbs/server_priv/db_user
+ return 0
+ PBS_DATA_SERVICE_USER=pbsdata
+ [ 0 -ne 0 ]
+ chk_dataservice_user pbsdata
+ chk_usr=pbsdata
+ id pbsdata
+ id=uid=997(pbsdata) gid=997(pbsdata) groups=997(pbsdata),114(postgres)
+ [ 0 -ne 0 ]
+ echo uid=997(pbsdata) gid=997(pbsdata) groups=997(pbsdata),114(postgres)
+ cut -c5-
+ cut -d ( -f1
+ id=997
+ [ 997 = 0 ]
+ su - pbsdata -c cd
+ [ 0 -ne 0 ]
+ return 0
+ [ 0 -ne 0 ]
+ [ ! -f /var/spool/pbs/server_priv/db_user ]
+ server_started=0
+ PBS_DATA_SERVICE_PORT=15007
+ export PBS_DATA_SERVICE_PORT
+ create_new_svr_data=1
+ [ ! -x /opt/pbs/libexec/install_db ]
+ /opt/pbs/libexec/install_db
+ resp=
+ ret=2
+ [ 2 -eq 2 ]
+ create_new_svr_data=0
+ export PBS_HOME
+ export PBS_EXEC
+ export PBS_SERVER
+ export PBS_ENVIRONMENT
+ [ 0 -eq 0 ]
+ upgrade_pbs_database
+ user=pbsdata
+ inst_dir=/usr/lib/postgresql/10
+ data_dir=/var/spool/pbs/datastore
+ [ -d /var/spool/pbs/pgsql.old ]
+ [ -d /var/spool/pbs/pgsql.forupgrade ]
+ [ ! -f /var/spool/pbs/datastore/PG_VERSION ]
+ /usr/lib/postgresql/10/bin/postgres -V
+ awk -F[ .] { print $3"."$4 }
+ echo postgres (PostgreSQL) 10.22 (Ubuntu 10.22-0ubuntu0.18.04.1)
+ sys_pgsql_ver=10.22
+ cat /var/spool/pbs/datastore/PG_VERSION
+ old_pgsql_ver=10
+ [[ ! 10 =~ . ]]
/opt/pbs/libexec/pbs_habitat: 247: /opt/pbs/libexec/pbs_habitat: [[: not found
+ [ 10 -eq 10 ]
+ [ 22 > 10 ]
+ result=0
+ [ 0 -eq 0 ]
+ [ -d /opt/pbs/pgsql ]
+ return 2
+ ret=2
+ [ 2 -ne 0 ]
+ [ 2 -eq 2 ]
+ echo It appears that PostgreSQL has been upgraded independently of PBS.
It appears that PostgreSQL has been upgraded independently of PBS.
+ echo The PBS database must be manually upgraded. Please refer to the
The PBS database must be manually upgraded. Please refer to the
+ echo documentation/release notes for details.
documentation/release notes for details.
+ exit 2

This seems to fail because line 247 uses the bash-only [[ ... ]] test, which sh does not understand (hence the "[[: not found" message above), so I ran it again with bash:

root@pbshead1:~# bash -x /opt/pbs/libexec/pbs_habitat
+ '[' 0 -eq 1 -a '' = --version ']'
+ PBS_VERSION=20.0.1
+ PBS_AES_SWITCH_VER=14.0
+ conf=/etc/pbs.conf
++ uname
+ ostype=Linux
+ umask 022
+ echo '***'
***
+ . /etc/pbs.conf
++ PBS_SERVER=pbshead1.dev.example.local
++ PBS_START_SERVER=1
++ PBS_START_SCHED=1
++ PBS_START_COMM=1
++ PBS_START_MOM=0
++ PBS_EXEC=/opt/pbs
++ PBS_HOME=/var/spool/pbs
++ PBS_CORE_LIMIT=unlimited
++ PBS_SCP=/usr/bin/scp
+ '[' -z /opt/pbs ']'
+ '[' '!' -d /opt/pbs ']'
+ '[' -z /var/spool/pbs ']'
++ /bin/ls -A /var/spool/pbs
+ '[' '!' -d /var/spool/pbs -o '!' 'aux
checkpoint
comm_logs
datastore
mom_logs
mom_priv
pbs_environment
pbs_version
sched_logs
sched_priv
server_logs
server_priv
spool
undelivered' ']'
+ '[' -f /var/spool/pbs/pbs_version ']'
++ cat /var/spool/pbs/pbs_version
+ old_pbs_version=20.0.1
+ '[' -x /opt/pbs/bin/qstat ']'
++ /opt/pbs/bin/qstat --version
++ sed -e 's/^.* = //'
+ pbs_version=20.0.1
+ '[' -z 20.0.1 ']'
+ '[' 20.0.1 '!=' 20.0.1 ']'
++ get_server_hostname
++ shn=
++ '[' -z '' -o -z '' ']'
++ '[' -z '' ']'
++ shn=pbshead1.dev.example.local
++ echo pbshead1.dev.example.local
++ awk '{print tolower($0)}'
+ server_hostname=pbshead1.dev.example.local
+ '[' pbshead1.dev.example.local = change_this_to_pbs_server_hostname ']'
+ '[' -z pbshead1.dev.example.local ']'
+ check_hostname pbshead1.dev.example.local
+ getent hosts pbshead1.dev.example.local
+ return 0
+ '[' 0 -ne 0 ']'
++ echo pbshead1.dev.example.local
++ awk -F. '{print $1}'
+ server=pbshead1
+ '[' -f /var/spool/pbs/mom_priv/config ']'
+ cmd='cat "$PBS_HOME/mom_priv/config"'
+ cmd='cat "$PBS_HOME/mom_priv/config" | sed -e '\''s/\t/ /g'\'' -e '\''s/ \+/ /g'\'' -e '\''s/^ //'\'' -e '\''s/ $//'\'''
+ cmd='cat "$PBS_HOME/mom_priv/config" | sed -e '\''s/\t/ /g'\'' -e '\''s/ \+/ /g'\'' -e '\''s/^ //'\'' -e '\''s/ $//'\'' | grep '\''\$clienthost'\'' | cut -d'\'' '\'' -f2'
++ eval cat '"$PBS_HOME/mom_priv/config"' '|' sed -e ''\''s/\t/' '/g'\''' -e ''\''s/' '\+/' '/g'\''' -e ''\''s/^' '//'\''' -e ''\''s/' '$//'\''' '|' grep ''\''\$clienthost'\''' '|' cut '-d'\''' \' -f2
+++ cat /var/spool/pbs/mom_priv/config
+++ sed -e 's/\t/ /g' -e 's/ \+/ /g' -e 's/^ //' -e 's/ $//'
+++ grep '\$clienthost'
+++ cut '-d ' -f2
+ for host in `eval $cmd`
+ '[' -z pbshead1 -o pbshead1 = CHANGE_THIS_TO_PBS_SERVER_HOSTNAME ']'
+ check_hostname pbshead1
+ getent hosts pbshead1
+ return 0
+ '[' 1 '!=' 0 ']'
+ . /opt/pbs/libexec/pbs_pgsql_env.sh
++ PGSQL_LIBSTR=
++ '[' -d /opt/pbs/pgsql ']'
+++ type psql
+++ cut '-d ' -f3
++ PGSQL_CMD=/usr/bin/psql
++ '[' -z /usr/bin/psql ']'
+++ type pg_config
+++ cut '-d ' -f3
++ PGSQL_CONF=/usr/bin/pg_config
++ '[' -z /usr/bin/pg_config ']'
+++ /usr/bin/pg_config
+++ awk '/BINDIR/{ print $3 }'
++ PGSQL_BIN=/usr/lib/postgresql/10/bin
+++ dirname /usr/lib/postgresql/10/bin
++ PGSQL_DIR=/usr/lib/postgresql/10
++ '[' /usr/lib/postgresql/10 = / ']'
+ PBS_licensing_loc_file=PBS_licensing_loc
+ export PGSQL_LIBSTR
+ dbuser_fl=/var/spool/pbs/server_priv/db_user
++ get_db_user
++ '[' -f /var/spool/pbs/server_priv/db_user ']'
+++ cat /var/spool/pbs/server_priv/db_user
+++ tr -d '[:space:]'
++ dbuser_name=pbsdata
++ '[' -z pbsdata ']'
++ '[' '!' -f /var/spool/pbs/server_priv/db_user ']'
++ cat /var/spool/pbs/server_priv/db_user
++ return 0
+ PBS_DATA_SERVICE_USER=pbsdata
+ '[' 0 -ne 0 ']'
+ chk_dataservice_user pbsdata
+ chk_usr=pbsdata
++ id pbsdata
+ id='uid=997(pbsdata) gid=997(pbsdata) groups=997(pbsdata),114(postgres)'
+ '[' 0 -ne 0 ']'
++ echo 'uid=997(pbsdata)' 'gid=997(pbsdata)' 'groups=997(pbsdata),114(postgres)'
++ cut -c5-
++ cut -d '(' -f1
+ id=997
+ '[' 997 = 0 ']'
+ su - pbsdata -c cd
+ '[' 0 -ne 0 ']'
+ return 0
+ '[' 0 -ne 0 ']'
+ '[' '!' -f /var/spool/pbs/server_priv/db_user ']'
+ server_started=0
+ PBS_DATA_SERVICE_PORT=15007
+ export PBS_DATA_SERVICE_PORT
+ create_new_svr_data=1
+ '[' '!' -x /opt/pbs/libexec/install_db ']'
++ /opt/pbs/libexec/install_db
+ resp=
+ ret=2
+ '[' 2 -eq 2 ']'
+ create_new_svr_data=0
+ export PBS_HOME
+ export PBS_EXEC
+ export PBS_SERVER
+ export PBS_ENVIRONMENT
+ '[' 0 -eq 0 ']'
+ upgrade_pbs_database
+ user=pbsdata
+ inst_dir=/usr/lib/postgresql/10
+ data_dir=/var/spool/pbs/datastore
+ '[' -d /var/spool/pbs/pgsql.old ']'
+ '[' -d /var/spool/pbs/pgsql.forupgrade ']'
+ '[' '!' -f /var/spool/pbs/datastore/PG_VERSION ']'
++ awk '-F[ .]' '{ print $3"."$4 }'
+++ /usr/lib/postgresql/10/bin/postgres -V
++ echo postgres '(PostgreSQL)' 10.22 '(Ubuntu' '10.22-0ubuntu0.18.04.1)'
+ sys_pgsql_ver=10.22
++ cat /var/spool/pbs/datastore/PG_VERSION
+ old_pgsql_ver=10
+ [[ ! 10 =~ \. ]]
++ echo 10.22
++ cut -d . -f 1
+ sys_pgsql_ver=10
+ '[' 10 -eq 10 ']'
+ '[' 10 '>' 10 ']'
+ '[' 10 -gt 10 ']'
+ result=1
+ '[' 1 -eq 0 ']'
+ '[' 10 = 10 ']'
+ return 0
+ ret=0
+ '[' 0 -ne 0 ']'
+ '[' -d '' ']'
+ set_db_trust_login /var/spool/pbs/datastore
+ datastore_dir=/var/spool/pbs/datastore
++ cp -p /var/spool/pbs/datastore/pg_hba.conf /var/spool/pbs/datastore/pg_hba.conf.orig
+ err=
+ '[' 0 -ne 0 ']'
++ chown pbsdata /var/spool/pbs/datastore/pg_hba.conf.orig
+ err=
+ '[' 0 -ne 0 ']'
++ sed s/md5/trust/g /var/spool/pbs/datastore/pg_hba.conf
+ err=
+ '[' 0 -ne 0 ']'
++ chown pbsdata /var/spool/pbs/datastore/pg_hba.conf.new
+ err=
+ '[' 0 -ne 0 ']'
++ mv /var/spool/pbs/datastore/pg_hba.conf.new /var/spool/pbs/datastore/pg_hba.conf
+ err=
+ '[' 0 -ne 0 ']'
+ /opt/pbs/libexec/pbs_schema_upgrade 15007 pbsdata
+ ret=0
+ '[' 0 -ne 0 ']'
+ '[' 20.0.1 '<' 14.0 ']'
+ revoke_db_trust_login /var/spool/pbs/datastore
+ datastore_dir=/var/spool/pbs/datastore
++ cp -p /var/spool/pbs/datastore/pg_hba.conf.orig /var/spool/pbs/datastore/pg_hba.conf
+ err=
+ '[' 0 -eq 0 ']'
+ rm -f /var/spool/pbs/datastore/pg_hba.conf.orig
+ '[' 0 -eq 1 ']'
+ '[' -f /var/spool/pbs/server_priv/PBS_licensing_loc ']'
+ '[' 0 '!=' 0 ']'
+ '[' 0 -eq 1 ']'
+ '[' -d /var/spool/pbs/mom_priv/jobs ']'
+ upgrade_cmd=/opt/pbs/sbin/pbs_upgrade_job
+ '[' -x /opt/pbs/sbin/pbs_upgrade_job ']'
+ total=0
+ upgraded=0
+ for file in ${PBS_HOME}/mom_priv/jobs/*.JB
+ '[' -f '/var/spool/pbs/mom_priv/jobs/*.JB' ']'
+ '[' 0 -gt 0 ']'
+ echo 20.0.1
+ '[' '!' -d /var/spool/pbs/pgsql.forupgrade -a -d /opt/pbs/pgsql -a -d /var/spool/pbs ']'
+ echo '*** End of /opt/pbs/libexec/pbs_habitat'
*** End of /opt/pbs/libexec/pbs_habitat
+ exit 0

Now that I know what happens in that line (+ [[ ! 10 =~ \. ]]), I’ll just roll back my VM snapshot, remove that test and try again:

-        [[ ! $old_pgsql_ver =~ "." ]] && sys_pgsql_ver=$(echo $sys_pgsql_ver | cut -d '.' -f 1)
+        sys_pgsql_ver=$(echo $sys_pgsql_ver | cut -d '.' -f 1)

…and it succeeds:

root@pbshead1:~# sh -x /opt/pbs/libexec/pbs_habitat
+ [ 0 -eq 1 -a  = --version ]
+ PBS_VERSION=20.0.1
+ PBS_AES_SWITCH_VER=14.0
+ conf=/etc/pbs.conf
+ uname
+ ostype=Linux
+ umask 022
+ echo ***
***
+ . /etc/pbs.conf
+ PBS_SERVER=pbshead1.dev.example.local
+ PBS_START_SERVER=1
+ PBS_START_SCHED=1
+ PBS_START_COMM=1
+ PBS_START_MOM=0
+ PBS_EXEC=/opt/pbs
+ PBS_HOME=/var/spool/pbs
+ PBS_CORE_LIMIT=unlimited
+ PBS_SCP=/usr/bin/scp
+ [ -z /opt/pbs ]
+ [ ! -d /opt/pbs ]
+ [ -z /var/spool/pbs ]
+ /bin/ls -A /var/spool/pbs
+ [ ! -d /var/spool/pbs -o ! aux
checkpoint
comm_logs
datastore
mom_logs
mom_priv
pbs_environment
pbs_version
sched_logs
sched_priv
server_logs
server_priv
spool
undelivered ]
+ [ -f /var/spool/pbs/pbs_version ]
+ cat /var/spool/pbs/pbs_version
+ old_pbs_version=20.0.1
+ [ -x /opt/pbs/bin/qstat ]
+ /opt/pbs/bin/qstat --version
+ sed -e s/^.* = //
+ pbs_version=20.0.1
+ [ -z 20.0.1 ]
+ [ 20.0.1 != 20.0.1 ]
+ get_server_hostname
+ shn=
+ [ -z  -o -z  ]
+ [ -z  ]
+ shn=pbshead1.dev.example.local
+ echo pbshead1.dev.example.local
+ awk {print tolower($0)}
+ server_hostname=pbshead1.dev.example.local
+ [ pbshead1.dev.example.local = change_this_to_pbs_server_hostname ]
+ [ -z pbshead1.dev.example.local ]
+ check_hostname pbshead1.dev.example.local
+ getent hosts pbshead1.dev.example.local
+ return 0
+ [ 0 -ne 0 ]
+ echo pbshead1.dev.example.local
+ awk -F. {print $1}
+ server=pbshead1
+ [ -f /var/spool/pbs/mom_priv/config ]
+ cmd=cat "$PBS_HOME/mom_priv/config"
+ cmd=cat "$PBS_HOME/mom_priv/config" | sed -e 's/\t/ /g' -e 's/ \+/ /g' -e 's/^ //' -e 's/ $//'
+ cmd=cat "$PBS_HOME/mom_priv/config" | sed -e 's/\t/ /g' -e 's/ \+/ /g' -e 's/^ //' -e 's/ $//' | grep '\$clienthost' | cut -d' ' -f2
+ eval cat "$PBS_HOME/mom_priv/config" | sed -e 's/\t/ /g' -e 's/ \+/ /g' -e 's/^ //' -e 's/ $//' | grep '\$clienthost' | cut -d' ' -f2
+ cat /var/spool/pbs/mom_priv/config
+ sed -e s/\t/ /g -e s/ \+/ /g -e s/^ // -e s/ $//
+ grep \$clienthost
+ cut -d  -f2
+ [ -z pbshead1 -o pbshead1 = CHANGE_THIS_TO_PBS_SERVER_HOSTNAME ]
+ check_hostname pbshead1
+ getent hosts pbshead1
+ return 0
+ [ 1 != 0 ]
+ . /opt/pbs/libexec/pbs_pgsql_env.sh
+ PGSQL_LIBSTR=
+ [ -d /opt/pbs/pgsql ]
+ type psql
+ cut -d  -f3
+ PGSQL_CMD=/usr/bin/psql
+ [ -z /usr/bin/psql ]
+ type pg_config
+ cut -d  -f3
+ PGSQL_CONF=/usr/bin/pg_config
+ [ -z /usr/bin/pg_config ]
+ /usr/bin/pg_config
+ awk /BINDIR/{ print $3 }
+ PGSQL_BIN=/usr/lib/postgresql/10/bin
+ dirname /usr/lib/postgresql/10/bin
+ PGSQL_DIR=/usr/lib/postgresql/10
+ [ /usr/lib/postgresql/10 = / ]
+ PBS_licensing_loc_file=PBS_licensing_loc
+ export PGSQL_LIBSTR
+ dbuser_fl=/var/spool/pbs/server_priv/db_user
+ get_db_user
+ [ -f /var/spool/pbs/server_priv/db_user ]
+ cat /var/spool/pbs/server_priv/db_user
+ tr -d [:space:]
+ dbuser_name=pbsdata
+ [ -z pbsdata ]
+ [ ! -f /var/spool/pbs/server_priv/db_user ]
+ cat /var/spool/pbs/server_priv/db_user
+ return 0
+ PBS_DATA_SERVICE_USER=pbsdata
+ [ 0 -ne 0 ]
+ chk_dataservice_user pbsdata
+ chk_usr=pbsdata
+ id pbsdata
+ id=uid=997(pbsdata) gid=997(pbsdata) groups=997(pbsdata),114(postgres)
+ [ 0 -ne 0 ]
+ echo uid=997(pbsdata) gid=997(pbsdata) groups=997(pbsdata),114(postgres)
+ cut -c5-
+ cut -d ( -f1
+ id=997
+ [ 997 = 0 ]
+ su - pbsdata -c cd
+ [ 0 -ne 0 ]
+ return 0
+ [ 0 -ne 0 ]
+ [ ! -f /var/spool/pbs/server_priv/db_user ]
+ server_started=0
+ PBS_DATA_SERVICE_PORT=15007
+ export PBS_DATA_SERVICE_PORT
+ create_new_svr_data=1
+ [ ! -x /opt/pbs/libexec/install_db ]
+ /opt/pbs/libexec/install_db
+ resp=
+ ret=2
+ [ 2 -eq 2 ]
+ create_new_svr_data=0
+ export PBS_HOME
+ export PBS_EXEC
+ export PBS_SERVER
+ export PBS_ENVIRONMENT
+ [ 0 -eq 0 ]
+ upgrade_pbs_database
+ user=pbsdata
+ inst_dir=/usr/lib/postgresql/10
+ data_dir=/var/spool/pbs/datastore
+ [ -d /var/spool/pbs/pgsql.old ]
+ [ -d /var/spool/pbs/pgsql.forupgrade ]
+ [ ! -f /var/spool/pbs/datastore/PG_VERSION ]
+ /usr/lib/postgresql/10/bin/postgres -V
+ awk -F[ .] { print $3"."$4 }
+ echo postgres (PostgreSQL) 10.22 (Ubuntu 10.22-0ubuntu0.18.04.1)
+ sys_pgsql_ver=10.22
+ cat /var/spool/pbs/datastore/PG_VERSION
+ old_pgsql_ver=10
+ echo 10.22
+ cut -d . -f 1
+ sys_pgsql_ver=10
+ [ 10 -eq 10 ]
+ [ 10 > 10 ]
+ [ 10 -gt 10 ]
+ result=1
+ [ 1 -eq 0 ]
+ [ 10 = 10 ]
+ return 0
+ ret=0
+ [ 0 -ne 0 ]
+ [ -d  ]
+ set_db_trust_login /var/spool/pbs/datastore
+ datastore_dir=/var/spool/pbs/datastore
+ cp -p /var/spool/pbs/datastore/pg_hba.conf /var/spool/pbs/datastore/pg_hba.conf.orig
+ err=
+ [ 0 -ne 0 ]
+ chown pbsdata /var/spool/pbs/datastore/pg_hba.conf.orig
+ err=
+ [ 0 -ne 0 ]
+ sed s/md5/trust/g /var/spool/pbs/datastore/pg_hba.conf
+ err=
+ [ 0 -ne 0 ]
+ chown pbsdata /var/spool/pbs/datastore/pg_hba.conf.new
+ err=
+ [ 0 -ne 0 ]
+ mv /var/spool/pbs/datastore/pg_hba.conf.new /var/spool/pbs/datastore/pg_hba.conf
+ err=
+ [ 0 -ne 0 ]
+ /opt/pbs/libexec/pbs_schema_upgrade 15007 pbsdata
+ ret=0
+ [ 0 -ne 0 ]
+ [ 20.0.1 < 14.0 ]
+ revoke_db_trust_login /var/spool/pbs/datastore
+ datastore_dir=/var/spool/pbs/datastore
+ cp -p /var/spool/pbs/datastore/pg_hba.conf.orig /var/spool/pbs/datastore/pg_hba.conf
+ err=
+ [ 0 -eq 0 ]
+ rm -f /var/spool/pbs/datastore/pg_hba.conf.orig
+ [ 0 -eq 1 ]
+ [ -f /var/spool/pbs/server_priv/PBS_licensing_loc ]
+ [ 0 != 0 ]
+ [ 0 -eq 1 ]
+ [ -d /var/spool/pbs/mom_priv/jobs ]
+ upgrade_cmd=/opt/pbs/sbin/pbs_upgrade_job
+ [ -x /opt/pbs/sbin/pbs_upgrade_job ]
+ total=0
+ upgraded=0
+ [ -f /var/spool/pbs/mom_priv/jobs/*.JB ]
+ [ 0 -gt 0 ]
+ echo 20.0.1
+ [ ! -d /var/spool/pbs/pgsql.forupgrade -a -d /opt/pbs/pgsql -a -d /var/spool/pbs ]
+ echo *** End of /opt/pbs/libexec/pbs_habitat
*** End of /opt/pbs/libexec/pbs_habitat
+ exit 0
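
Instead of deleting the test outright, the same check could probably be kept in a form that plain sh understands. This is only a minimal sketch, assuming the intent of the original line is "if the old version has no dot in it, compare major versions only". The function name normalize_sys_ver is made up for illustration and is not part of pbs_habitat:

```shell
# Sketch only: a POSIX sh stand-in for the bash-only test
#   [[ ! $old_pgsql_ver =~ "." ]] && sys_pgsql_ver=$(... | cut -d '.' -f 1)
# normalize_sys_ver is a hypothetical helper name, not part of pbs_habitat.
normalize_sys_ver() {
    old_pgsql_ver=$1
    sys_pgsql_ver=$2
    case "$old_pgsql_ver" in
        *.*)
            # Old version contains a dot (e.g. 9.6): leave the system version alone.
            ;;
        *)
            # Old version is major-only (e.g. 10): reduce the system version to its major part.
            sys_pgsql_ver=$(echo "$sys_pgsql_ver" | cut -d '.' -f 1)
            ;;
    esac
    echo "$sys_pgsql_ver"
}

normalize_sys_ver 10 10.22
normalize_sys_ver 9.6 9.6
```

The case pattern *.* does the job of the =~ "." match, so the script would not need bash for this line.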

Further, I tried the following:

  • Followed the upgrade path to the point where OpenPBS 22.05.11 is installed, but not yet started.
  • Changed the shebang of /opt/pbs/libexec/pbs_db_utility to #!/bin/bash for compatibility with the [[ ]] and =~ syntax.
  • Added backup_pgsql as discussed before (I found out later that it does not exist in the code).
  • Ran sh -x /opt/pbs/libexec/pbs_habitat.

The error is still:

Data service directory from previous PBS installation not found,
Datastore upgrade cannot continue
Failed to upgrade PBS Datastore
Full output
root@pbshead1:~# grep -Hirn backup_pgsql /opt/pbs/
/opt/pbs/libexec/pbs_habitat:281:		backup_pgsql # HACK HACK HACK
root@pbshead1:~# /opt/pbs/libexec/pbs_init.d --version
pbs_version = 22.05.11
root@pbshead1:~# vi /opt/pbs/libexec/pbs_db_utility  # (changed the shebang here)
root@pbshead1:~# sh -x /opt/pbs/libexec/pbs_habitat
+ [ 0 -eq 1 -a  = --version ]
+ PBS_VERSION=22.05.11
+ INSTALL_DB=install_db
+ UPGRADE_DB=upgrade_db
+ conf=/etc/pbs.conf
+ uname
+ ostype=Linux
+ umask 022
+ echo ***
***
+ . /etc/pbs.conf
+ PBS_SERVER=pbshead1.dev.example.local
+ PBS_START_SERVER=1
+ PBS_START_SCHED=1
+ PBS_START_COMM=1
+ PBS_START_MOM=0
+ PBS_EXEC=/opt/pbs
+ PBS_HOME=/var/spool/pbs
+ PBS_CORE_LIMIT=unlimited
+ PBS_SCP=/usr/bin/scp
+ [ -z /opt/pbs ]
+ [ ! -d /opt/pbs ]
+ [ -z /var/spool/pbs ]
+ /bin/ls -A /var/spool/pbs
+ [ ! -d /var/spool/pbs -o ! aux
checkpoint
comm_logs
datastore
mom_logs
mom_priv
pbs_environment
pbs_version
sched_logs
sched_priv
server_logs
server_priv
spool
undelivered ]
+ [ -x /opt/pbs/bin/qstat ]
+ /opt/pbs/bin/qstat --version
+ sed -e s/^.* = //
+ pbs_version=22.05.11
+ [ -z 22.05.11 ]
+ [ 22.05.11 != 22.05.11 ]
+ get_server_hostname
+ shn=
+ [ -z  -o -z  ]
+ [ -z  ]
+ shn=pbshead1.dev.example.local
+ echo pbshead1.dev.example.local
+ awk {print tolower($0)}
+ server_hostname=pbshead1.dev.example.local
+ [ pbshead1.dev.example.local = change_this_to_pbs_server_hostname ]
+ [ -z pbshead1.dev.example.local ]
+ check_hostname pbshead1.dev.example.local
+ getent hosts pbshead1.dev.example.local
+ return 0
+ [ 0 -ne 0 ]
+ echo pbshead1.dev.example.local
+ awk -F. {print $1}
+ server=pbshead1
+ [ 1 != 0 ]
+ [ ! -x /opt/pbs/libexec/pbs_db_utility ]
+ . /opt/pbs/libexec/pbs_db_env
+ PGSQL_LIBSTR=
+ [ -z /opt/pbs ]
+ [ -d /opt/pbs/pgsql ]
+ type psql
+ cut -d  -f3
+ PGSQL_CMD=/usr/bin/psql
+ [ -z /usr/bin/psql ]
+ type pg_config
+ cut -d  -f3
+ PGSQL_CONF=/usr/bin/pg_config
+ [ -z /usr/bin/pg_config ]
+ /usr/bin/pg_config
+ awk /BINDIR/{ print $3 }
+ PGSQL_BIN=/usr/lib/postgresql/10/bin
+ dirname /usr/lib/postgresql/10/bin
+ PGSQL_DIR=/usr/lib/postgresql/10
+ [ /usr/lib/postgresql/10 = / ]
+ export PGSQL_BIN=/usr/lib/postgresql/10/bin
+ [ -d /opt/pbs/lib ]
+ LD_LIBRARY_PATH=/opt/pbs/lib:
+ export LD_LIBRARY_PATH
+ [ 0 -ne 0 ]
+ PBS_licensing_loc_file=PBS_licensing_loc
+ dbuser_fl=/var/spool/pbs/server_priv/db_user
+ get_db_user
+ [ -f /var/spool/pbs/server_priv/db_user ]
+ cat /var/spool/pbs/server_priv/db_user
+ tr -d [:space:]
+ dbuser_name=pbsdata
+ [ -z pbsdata ]
+ [ ! -f /var/spool/pbs/server_priv/db_user ]
+ cat /var/spool/pbs/server_priv/db_user
+ return 0
+ PBS_DATA_SERVICE_USER=pbsdata
+ [ 0 -ne 0 ]
+ chk_dataservice_user pbsdata
+ chk_usr=pbsdata
+ id pbsdata
+ id=uid=997(pbsdata) gid=997(pbsdata) groups=997(pbsdata),114(postgres)
+ [ 0 -ne 0 ]
+ echo uid=997(pbsdata) gid=997(pbsdata) groups=997(pbsdata),114(postgres)
+ cut -c5-
+ cut -d ( -f1
+ id=997
+ [ 997 = 0 ]
+ su - pbsdata -s /bin/sh -c cd
+ [ 0 -ne 0 ]
+ return 0
+ [ 0 -ne 0 ]
+ export PBS_DATA_SERVICE_USER
+ server_started=0
+ PBS_DATA_SERVICE_PORT=15007
+ export PBS_DATA_SERVICE_PORT
+ create_new_svr_data=1
+ /opt/pbs/libexec/pbs_db_utility install_db
+ resp=
+ ret=2
+ [ 2 -eq 2 ]
+ create_new_svr_data=0
+ export PBS_HOME
+ export PBS_EXEC
+ export PBS_SERVER
+ export PBS_ENVIRONMENT
+ [ 0 -eq 0 ]
+ backup_pgsql
/opt/pbs/libexec/pbs_habitat: 281: /opt/pbs/libexec/pbs_habitat: backup_pgsql: not found
+ /opt/pbs/libexec/pbs_db_utility upgrade_db
Data service directory from previous PBS installation not found,
Datastore upgrade cannot continue
Failed to upgrade PBS Datastore
+ [ 0 -eq 1 ]
+ [ -f /var/spool/pbs/server_priv/PBS_licensing_loc ]
+ [ 0 != 0 ]
+ [ 0 -eq 1 ]
+ [ -d /var/spool/pbs/mom_priv/jobs ]
+ upgrade_cmd=/opt/pbs/sbin/pbs_upgrade_job
+ [ -x /opt/pbs/sbin/pbs_upgrade_job ]
+ total=0
+ upgraded=0
+ [ -f /var/spool/pbs/mom_priv/jobs/*.JB ]
+ [ 0 -gt 0 ]
+ echo 22.05.11
+ echo *** End of /opt/pbs/libexec/pbs_habitat
*** End of /opt/pbs/libexec/pbs_habitat
+ exit 0

One of your outputs showed pbs_schema_upgrade running without error. Have you checked what the current schema version is, using the psql command select * from pbs.info?

Also, what are the contents of your /var/spool/pbs/datastore/PG_VERSION file?

That run used the modified line, i.e. with the bash-only test removed. I reproduced it that way and checked select * from pbs.info:

pbs_datastore=# select * from pbs.info;
 pbs_schema_version
--------------------
 1.4.0
(1 row)

/var/spool/pbs/datastore/PG_VERSION prints:

10
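
For anyone who wants to script the schema check, here is a rough sketch. The port (15007), user (pbsdata) and database name (pbs_datastore) are the ones that appear in this thread and may differ elsewhere, and the login only works if the data service is running and accepts it. extract_schema_version is a made-up helper name:

```shell
# Hypothetical helper: print the value row (third line) of psql's default
# table output, with surrounding spaces trimmed.
extract_schema_version() {
    sed -n '3s/^ *\(.*[^ ]\) *$/\1/p'
}

# Live usage (assumes the PBS data service is up and accepts this login):
#   psql -p 15007 -U pbsdata -d pbs_datastore -c 'select * from pbs.info;' | extract_schema_version

# Offline demonstration using the table output pasted above:
printf ' pbs_schema_version\n--------------------\n 1.4.0\n(1 row)\n' | extract_schema_version
```

The datastore's PostgreSQL version needs no parsing at all, since cat /var/spool/pbs/datastore/PG_VERSION prints it directly.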

I think pbs_habitat and pbs_db_utility are out of sync. That is, pbs_db_utility still thinks PBS comes with its own copy of pgsql. However, pbs_habitat now allows for using the system pgsql.

At this point, you should open an issue at the OpenPBS GitHub site that shows the “Data service directory from previous PBS installation not found” message when running pbs_habitat.

However, to hack around the problem for now, I think you can update pbs_habitat to the latest version in the repo (df4c3206). Then, add the backup_pgsql hack to it that I mentioned earlier. Also, add your /bin/bash change to pbs_db_utility.

I think this combination will finally let pbs_schema_upgrade run and update the database schema so the server can start.

If this still fails, you can run pbs_schema_upgrade directly.

# export PBS_DATA_SERVICE_PORT=15007
# export PBS_DATA_SERVICE_USER=pbsdata
# set -a
# source /etc/pbs.conf
# set +a
# sh -x /opt/pbs/libexec/pbs_schema_upgrade 
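
In case the set -a / set +a pair in the commands above looks odd: /etc/pbs.conf contains plain assignments, and without set -a they would stay local to the shell and not reach the script that is started afterwards. A small illustration with a throwaway file and a made-up variable name (DEMO_PBS_HOME) in place of the real config:

```shell
# Sketch: effect of wrapping "source <conf file>" in set -a / set +a.
# The temp file stands in for /etc/pbs.conf; DEMO_PBS_HOME is a made-up name.
conf=$(mktemp)
echo 'DEMO_PBS_HOME=/var/spool/pbs' > "$conf"

unset DEMO_PBS_HOME
. "$conf"                                   # plain source: shell variable only
without=$(sh -c 'echo "${DEMO_PBS_HOME:-unset}"')

unset DEMO_PBS_HOME
set -a                                      # auto-export every assignment from here on
. "$conf"
set +a                                      # back to normal behaviour
with=$(sh -c 'echo "${DEMO_PBS_HOME:-unset}"')

rm -f "$conf"
echo "child process without set -a sees: $without"
echo "child process with set -a sees:    $with"
```

The two PBS_DATA_SERVICE_* variables are exported explicitly for the same reason.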

Thank you, @dtalcott. I opened an issue on GitHub: "pbs_habitat and pbs_db_utility out of sync when upgrading 20.0.1 -> 22.05.11" (Issue #2564, openpbs/openpbs).

Yesss, the upgrade worked with the changes you described:

"journalctl -fu pbs", with date and hostname truncated for readability
systemd[1]: Starting Portable Batch System...
pbs_init.d[2140]: Starting PBS
pbs_init.d[2140]: PBS Home directory /var/spool/pbs needs updating.
pbs_init.d[2140]: Running /opt/pbs/libexec/pbs_habitat to update it.
pbs_init.d[2140]: ***
su[2228]: Successful su for pbsdata by root
su[2228]: + ??? root:pbsdata
su[2228]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2228]: pam_unix(su:session): session closed for user pbsdata
pbs_init.d[2140]: cp: cannot stat '/usr/pgsql-10.22/lib/*': No such file or directory
pbs_init.d[2140]: cp: cannot stat '/usr/pgsql-10.22/lib/*': No such file or directory
pbs_init.d[2140]: cp: cannot stat '/usr/pgsql-10.22/share/timezonesets/*': No such file or directory
pbs_init.d[2140]: cp: cannot stat '/usr/pgsql-10.22/share/timezonesets/*': No such file or directory
pbs_init.d[2140]: cp: cannot stat '/usr/lib/postgresql/10/bin/pg_resetxlog': No such file or directory
pbs_init.d[2140]: *** Backing up /var/spool/pbs/pgsql.forupgrade to /var/spool/pbs/pgsql.forupgrade.pre.
pbs_init.d[2140]: *** /var/spool/pbs/pgsql.forupgrade.pre. may need to be manually removed if you do not wish to downgrade PBS.
su[2363]: Successful su for pbsdata by root
su[2363]: + ??? root:pbsdata
su[2363]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2363]: pam_unix(su:session): session closed for user pbsdata
su[2392]: Successful su for pbsdata by root
su[2392]: + ??? root:pbsdata
su[2392]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2392]: pam_unix(su:session): session closed for user pbsdata
su[2449]: Successful su for pbsdata by root
su[2449]: + ??? root:pbsdata
su[2449]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2449]: pam_unix(su:session): session closed for user pbsdata
su[2478]: Successful su for pbsdata by root
su[2478]: + ??? root:pbsdata
su[2478]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2478]: pam_unix(su:session): session closed for user pbsdata
su[2509]: Successful su for pbsdata by root
su[2509]: + ??? root:pbsdata
su[2509]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2509]: pam_unix(su:session): session closed for user pbsdata
su[2537]: Successful su for pbsdata by root
su[2537]: + ??? root:pbsdata
su[2537]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2537]: pam_unix(su:session): session closed for user pbsdata
su[2596]: Successful su for pbsdata by root
su[2596]: + ??? root:pbsdata
su[2596]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2596]: pam_unix(su:session): session closed for user pbsdata
su[2625]: Successful su for pbsdata by root
su[2625]: + ??? root:pbsdata
su[2625]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2625]: pam_unix(su:session): session closed for user pbsdata
su[2661]: Successful su for pbsdata by root
su[2661]: + ??? root:pbsdata
su[2661]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2661]: pam_unix(su:session): session closed for user pbsdata
pbs_init.d[2140]: /opt/pbs/sbin/pbs_ds_systemd: 43: [: xrunning: unexpected operator
su[2740]: Successful su for pbsdata by root
su[2740]: + ??? root:pbsdata
su[2740]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2740]: pam_unix(su:session): session closed for user pbsdata
su[2756]: Successful su for pbsdata by root
su[2756]: + ??? root:pbsdata
su[2756]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2756]: pam_unix(su:session): session closed for user pbsdata
su[2801]: Successful su for pbsdata by root
su[2801]: + ??? root:pbsdata
su[2801]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2801]: pam_unix(su:session): session closed for user pbsdata
su[2817]: Successful su for pbsdata by root
su[2817]: + ??? root:pbsdata
su[2817]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2817]: pam_unix(su:session): session closed for user pbsdata
su[2862]: Successful su for pbsdata by root
su[2862]: + ??? root:pbsdata
su[2862]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2862]: pam_unix(su:session): session closed for user pbsdata
su[2878]: Successful su for pbsdata by root
su[2878]: + ??? root:pbsdata
su[2878]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2878]: pam_unix(su:session): session closed for user pbsdata
su[2896]: Successful su for pbsdata by root
su[2896]: + ??? root:pbsdata
su[2896]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2896]: pam_unix(su:session): session closed for user pbsdata
su[2912]: Successful su for pbsdata by root
su[2912]: + ??? root:pbsdata
su[2912]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[2912]: pam_unix(su:session): session closed for user pbsdata
pbs_init.d[2140]: cp: cannot stat '/usr/pgsql-10.22/lib/*': No such file or directory
pbs_init.d[2140]: cp: cannot stat '/usr/pgsql-10.22/lib/*': No such file or directory
pbs_init.d[2140]: cp: cannot stat '/usr/pgsql-10.22/share/timezonesets/*': No such file or directory
pbs_init.d[2140]: cp: cannot stat '/usr/pgsql-10.22/share/timezonesets/*': No such file or directory
pbs_init.d[2140]: cp: cannot stat '/usr/lib/postgresql/10/bin/pg_resetxlog': No such file or directory
pbs_init.d[2140]: *** End of /opt/pbs/libexec/pbs_habitat
pbs_init.d[2140]: Home directory /var/spool/pbs updated.
pbs_init.d[2140]: PBS comm
pbs_init.d[2140]: /opt/pbs/sbin/pbs_comm ready (pid=2981), Proxy Name:pbshead1.dev.example.local:17001, Threads:4
pbs_init.d[2140]: PBS sched
su[3017]: Successful su for pbsdata by root
su[3017]: + ??? root:pbsdata
su[3017]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[3017]: pam_unix(su:session): session closed for user pbsdata
su[3050]: Successful su for pbsdata by root
su[3050]: + ??? root:pbsdata
su[3050]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[3050]: pam_unix(su:session): session closed for user pbsdata
pbs_init.d[2140]: /opt/pbs/sbin/pbs_ds_systemd: 43: [: xrunning: unexpected operator
su[3099]: Successful su for pbsdata by root
su[3099]: + ??? root:pbsdata
su[3099]: pam_unix(su:session): session opened for user pbsdata by (uid=0)
su[3099]: pam_unix(su:session): session closed for user pbsdata
pbs_init.d[2140]: Connecting to PBS dataservice...connected to PBS dataservice@pbshead1.dev.example.local
pbs_init.d[2140]: Connecting to PBS dataservice...connected to PBS dataservice@pbshead1.dev.example.local
pbs_init.d[2140]: PBS server
systemd[1]: Started Portable Batch System.

There’s still another error appearing: pbs_init.d[2140]: /opt/pbs/sbin/pbs_ds_systemd: 43: [: xrunning: unexpected operator, but I think this is an unrelated problem.
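
A guess at where that message comes from, since line 43 of pbs_ds_systemd was not shown here: this is the kind of error a plain sh such as dash typically prints when [ is given an operator it does not know, for example the bash-style == for string comparison. If that is what the line does, a portable version would look roughly like this (is_running is a made-up name):

```shell
# Assumption: line 43 of pbs_ds_systemd compares strings with "==" inside [ ],
# which bash accepts but a strict POSIX sh may reject. The actual line was not
# shown in this thread. is_running is a hypothetical helper name.
is_running() {
    # POSIX string comparison uses a single "="
    [ "x$1" = "xrunning" ]
}

is_running running && echo "running"
is_running stopped || echo "not running"
```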

For anyone bumping into the same issue, we managed to upgrade our OpenPBS cluster successfully from 20.0.1 to 23.06.06:

  • Stop and disable PBS on ALL machines: systemctl disable pbs --now
  • Upgrade OS
  • Download and compile PBS, do not start pbs.service.
  • Add line backup_pgsql after line 313 in /opt/pbs/libexec/pbs_habitat
  • Replace line 140 in /opt/pbs/libexec/pbs_db_utility with sys_pgsql_ver=$(echo $sys_pgsql_ver | cut -d '.' -f 1)
  • Start and enable PBS: systemctl enable pbs --now
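
The two file edits from the list above could be scripted roughly as follows. This is a sketch under assumptions: the helper names are made up, and the line numbers 313 and 140 are the ones from our 23.06.06 build, so check them against your own files (and keep a backup) before applying anything.

```shell
# Hypothetical helpers for the two edits listed above.

# insert_after_line FILE N TEXT: add TEXT as a new line after line N of FILE.
insert_after_line() {
    tmp=$(mktemp) || return 1
    LINE_TEXT="$3" awk -v n="$2" '{ print } NR == n { print ENVIRON["LINE_TEXT"] }' "$1" > "$tmp" &&
        cat "$tmp" > "$1"    # overwrite in place so owner and mode are kept
    status=$?
    rm -f "$tmp"
    return $status
}

# replace_line FILE N TEXT: replace line N of FILE with TEXT.
replace_line() {
    tmp=$(mktemp) || return 1
    LINE_TEXT="$3" awk -v n="$2" 'NR == n { print ENVIRON["LINE_TEXT"]; next } { print }' "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}

# Intended use on the server (commented out on purpose):
#   insert_after_line /opt/pbs/libexec/pbs_habitat 313 'backup_pgsql'
#   replace_line /opt/pbs/libexec/pbs_db_utility 140 \
#       "sys_pgsql_ver=\$(echo \$sys_pgsql_ver | cut -d '.' -f 1)"
```

The text is passed through the environment rather than awk -v so that backslashes or quotes in the replacement line are not reinterpreted.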

Good luck.
