Database (in)compatibility: 20.0.1 <-> master?

Dear PBS developers,

I’d like to know to what extent the databases of the official release 20.0.1 and the current master branch are compatible. The background: I did a test installation of 20.0.1 on an openSUSE 15.3 Linux machine, updated it to the current master branch, and PBS then failed to start.

OK, in a bit more detail:

  • I downloaded the rpm package for 20.0.1 from the GitHub page.
  • I switched to root account via su.
  • I did an installation via “rpm -i”.
  • I started PBS via “systemctl start pbs”. Worked perfectly.
  • I stopped PBS.

# rpm -i openpbs-server-20.0.1-0.x86_64.rpm

# systemctl start pbs

Feb 12 16:24:55 dc2pbs4v systemd[1]: Started Portable Batch System.
# systemctl stop pbs

  • I switched back to a standard user account.
  • Now, I checked out the latest master branch using git (as of Feb 12th, 2021).
  • I changed the version number to “20.1.2” by editing configure.ac, nothing more.
  • I compiled the code and created rpms (./autogen.sh, ./configure --prefix=/opt/pbs --libexecdir=/opt/pbs/libexec, make dist, rpmbuild -bb openpbs.spec in a prepared SPECS folder).
  • I switched to root account via su.
  • I initiated an update by installing the created rpm.
  • PBS fails to start.

# rpm -U openpbs-server-20.1.2-0.x86_64.rpm
# systemctl start pbs
Job for pbs.service failed because the control process exited with error code.
See "systemctl status pbs.service" and "journalctl -xe" for details.

# systemctl status pbs.service
Feb 12 16:27:35 dc2pbs4v systemd[1]: pbs.service: Failed with result 'exit-code'.

# less /var/spool/pbs/server_logs/20210212
02/12/2021 16:27:08;0006;Server@dc2pbs4v;Fil;Server@dc2pbs4v;Version 20.1.2, started, initialization type = 1
02/12/2021 16:27:08;0002;Server@dc2pbs4v;Svr;Server@dc2pbs4v;pbs_status_db exit code 1
02/12/2021 16:27:08;0002;Server@dc2pbs4v;Svr;Server@dc2pbs4v;Starting PBS dataservice
02/12/2021 16:27:10;0002;Server@dc2pbs4v;Svr;Server@dc2pbs4v;Prepare of statement insert_job failed: ERROR: column "ji_jid" of relation "job" does not exist
LINE 1: ...at,ji_quetime,ji_rteretry,ji_fromsock,ji_fromaddr,ji_jid,ji_...
^ 42703

Do you have any clue? Are the databases not compatible?

Regards,

Michael

As an additional note: It looks like the database was not updated to 1.5.0 during rpm -U:

# psql -A -t -p 15007 -d pbs_datastore -U postgres -c "select pbs_schema_version from pbs.info"
1.4.0
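To make the check above scriptable, here is a minimal sketch of a version comparison. The helper name `schema_is_older` is hypothetical and not part of PBS, and it relies on `sort -V` (version sort), which is a GNU coreutils extension. The value 1.4.0 is what the query returned; 1.5.0 is the schema version I expected after the update.

```shell
# Hypothetical helper, not part of PBS: succeeds (exit 0) when the schema
# version found in the datastore is older than the expected version.
schema_is_older() {
    current="$1"
    expected="$2"
    # Equal versions are not "older"; otherwise the older one sorts first.
    [ "$current" != "$expected" ] &&
        [ "$(printf '%s\n%s\n' "$current" "$expected" | sort -V | head -n 1)" = "$current" ]
}

# In practice the first argument could come from the psql query shown above.
if schema_is_older "1.4.0" "1.5.0"; then
    echo "schema is still 1.4.0: the upgrade to 1.5.0 has not run"
fi
```

A plain string comparison would get versions such as 1.10.0 wrong, which is why the sketch uses a version-aware sort.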

journalctl says:

Feb 12 21:12:42 dc2pbs4v pbs_init.d[5246]: Starting PBS
Feb 12 21:12:42 dc2pbs4v pbs_init.d[5246]: PBS Home directory /var/spool/pbs needs updating.
Feb 12 21:12:42 dc2pbs4v pbs_init.d[5246]: Running /opt/pbs/libexec/pbs_habitat to update it.
Feb 12 21:12:42 dc2pbs4v pbs_init.d[5246]: ***
Feb 12 21:12:43 dc2pbs4v pbs_init.d[5246]: Data service directory from previous PBS installation not found,
Feb 12 21:12:43 dc2pbs4v pbs_init.d[5246]: Datastore upgrade cannot continue
Feb 12 21:12:43 dc2pbs4v pbs_init.d[5246]: Failed to upgrade PBS Datastore

Checking the version IDs of postgres gives me:

# cd /var/spool/pbs/datastore
# cat PG_VERSION
12
# echo $(${PGSQL_BIN}/postgres -V) | awk 'NR==1 {print $NF}' | cut -d '.' -f 1,2
12.5

# ${PGSQL_BIN}/postgres -V
postgres (PostgreSQL) 12.5

It looks like PBS gets confused by the “.5” in the version number: PG_VERSION contains only “12”, without the “.5”.
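To illustrate the mismatch: since PostgreSQL 10, the major version is the first number alone, and PG_VERSION in the data directory holds only that number, while older releases such as 9.6 used two components. A minimal sketch of a normalisation that would make both sides comparable follows. The function name `pg_datadir_version` is hypothetical; I am not claiming this is how PBS does or should do it.

```shell
# Hypothetical helper: reduce a PostgreSQL version string (e.g. "12.5")
# to the form that is stored in the PG_VERSION file of a data directory.
pg_datadir_version() {
    ver="$1"
    major="${ver%%.*}"              # everything before the first dot
    if [ "$major" -ge 10 ]; then
        printf '%s\n' "$major"      # PostgreSQL 10 and later: major only
    else
        printf '%s\n' "$ver" | cut -d '.' -f 1,2   # 9.x and older: two parts
    fi
}

installed=$(pg_datadir_version "12.5")
echo "installed server maps to PG_VERSION value: $installed"
```

With this, “12.5” from `postgres -V` reduces to the same “12” that is found in PG_VERSION, whereas `cut -d '.' -f 1,2` alone keeps “12.5”.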

I did the same process again, but before starting my self-compiled PBS, I modified PG_VERSION and changed the first line to “12.5”. Now, after “rpm -U” and “systemctl start pbs”, the database seems to get updated, but the server still fails to start. This time, the error message in the server log reads:

02/12/2021 21:37:41;0002;Server@dc2pbs4v;Svr;Server@dc2pbs4v;Prepare of statement insert_que failed: ERROR: column "qu_creattm" of relation "queue" does not exist
LINE 1: insert into pbs.queue(qu_name, qu_type, qu_creattm, qu_savet...
^ 42703

Weird…

In 20.0.1, the table pbs.queue is created via

CREATE TABLE pbs.queue (
qu_name TEXT NOT NULL,
qu_type INTEGER NOT NULL,
qu_ctime TIMESTAMP NOT NULL,
qu_mtime TIMESTAMP NOT NULL,
attributes hstore NOT NULL default '',
CONSTRAINT queue_pk PRIMARY KEY (qu_name)
);

In the current master branch, the table is created via

CREATE TABLE pbs.queue (
qu_name TEXT NOT NULL,
qu_type INTEGER NOT NULL,
qu_creattm TIMESTAMP NOT NULL,
qu_savetm TIMESTAMP NOT NULL,
attributes hstore NOT NULL default '',
CONSTRAINT queue_pk PRIMARY KEY (qu_name)
);
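The difference between the two definitions can be made explicit with a small sketch that extracts the column names from each CREATE TABLE statement and lists the ones missing on the other side. The helper names (`table_columns`, `old_schema`, `new_schema`) are hypothetical, and the parsing assumes one column definition per line, as in the two listings above.

```shell
# Print the column names of a CREATE TABLE statement read from stdin.
# Assumes one column definition per line; CONSTRAINT lines are skipped.
table_columns() {
    awk '
        /^CREATE TABLE/ { in_table = 1; next }
        /^[)];/         { in_table = 0 }
        in_table && $1 != "CONSTRAINT" { print $1 }
    '
}

old_schema() {   # pbs.queue as created by 20.0.1
cat <<'EOF'
CREATE TABLE pbs.queue (
qu_name TEXT NOT NULL,
qu_type INTEGER NOT NULL,
qu_ctime TIMESTAMP NOT NULL,
qu_mtime TIMESTAMP NOT NULL,
attributes hstore NOT NULL default '',
CONSTRAINT queue_pk PRIMARY KEY (qu_name)
);
EOF
}

new_schema() {   # pbs.queue as created by the current master branch
cat <<'EOF'
CREATE TABLE pbs.queue (
qu_name TEXT NOT NULL,
qu_type INTEGER NOT NULL,
qu_creattm TIMESTAMP NOT NULL,
qu_savetm TIMESTAMP NOT NULL,
attributes hstore NOT NULL default '',
CONSTRAINT queue_pk PRIMARY KEY (qu_name)
);
EOF
}

old_cols=$(old_schema | table_columns)
new_cols=$(new_schema | table_columns)

# Columns present on one side only (fixed-string, whole-line matching).
only_old=$(printf '%s\n' "$old_cols" | grep -Fxv -e "$new_cols")
only_new=$(printf '%s\n' "$new_cols" | grep -Fxv -e "$old_cols")

echo "only in 20.0.1:"; echo "$only_old"
echo "only in master:"; echo "$only_new"
```

The same comparison could be run against the other tables to see which of them an upgrade routine would have to touch.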

The file pbs_schema_upgrade, which should upgrade the database from 1.4.0 to 1.5.0 in its routine “upgrade_pbs_schema_from_v1_4_0”, does not contain any command to upgrade “pbs.queue”. A bug?

Just adding
ALTER TABLE pbs.queue RENAME COLUMN qu_ctime TO qu_creattm;
ALTER TABLE pbs.queue RENAME COLUMN qu_mtime TO qu_savetm;
to pbs_schema_upgrade does not help; the server start then ends up in

pbs_init.d[24246]: *** Error in `/opt/pbs/sbin/pbs_server.bin': double free or corruption (fasttop): 0x0000000002644780 ***

After deleting /var/spool/pbs and creating a fresh database, the self-compiled PBS runs fine.

I give up here. Perhaps one of you has a clue…

Been there, done that.

See the following bug reports.