I’d like to know to what extent the databases of the official release 20.0.1 and the current master branch are compatible. The background: I did a test installation of 20.0.1 on an openSUSE 15.3 Linux machine, updated to the current master branch, and PBS failed to restart.
OK, in a bit more detail:
I downloaded the rpm package for 20.0.1 from the GitHub page.
I switched to the root account via su.
I installed it via “rpm -i”.
I started PBS via “systemctl start pbs”. Worked perfectly.
I stopped PBS.
# rpm -i openpbs-server-20.0.1-0.x86_64.rpm
# systemctl start pbs
…
Feb 12 16:24:55 dc2pbs4v systemd[1]: Started Portable Batch System.
# systemctl stop pbs
I switched back to a standard user account.
Next, I checked out the latest master branch using git (as of February 12, 2021).
I changed the version number to “20.1.2” by editing configure.ac, nothing more.
I compiled the code and created rpms (./autogen.sh, ./configure --prefix=/opt/pbs --libexecdir=/opt/pbs/libexec, make dist, rpmbuild -bb openpbs.spec in a prepared SPECS folder).
I switched to root account via su.
I initiated an update by installing the created rpm.
PBS fails to start.
# rpm -U openpbs-server-20.1.2-0.x86_64.rpm
# systemctl start pbs
Job for pbs.service failed because the control process exited with error code.
See "systemctl status pbs.service" and "journalctl -xe" for details.
# systemctl status pbs.service
Feb 12 16:27:35 dc2pbs4v systemd[1]: pbs.service: Failed with result 'exit-code'.
# less /var/spool/pbs/server_logs/20210212
02/12/2021 16:27:08;0006;Server@dc2pbs4v;Fil;Server@dc2pbs4v;Version 20.1.2, started, initialization type = 1
02/12/2021 16:27:08;0002;Server@dc2pbs4v;Svr;Server@dc2pbs4v;pbs_status_db exit code 1
02/12/2021 16:27:08;0002;Server@dc2pbs4v;Svr;Server@dc2pbs4v;Starting PBS dataservice
02/12/2021 16:27:10;0002;Server@dc2pbs4v;Svr;Server@dc2pbs4v;Prepare of statement insert_job failed: ERROR: column "ji_jid" of relation "job" does not exist
LINE 1: ...at,ji_quetime,ji_rteretry,ji_fromsock,ji_fromaddr,ji_jid,ji_...
                                                             ^
42703
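To see at a glance which columns the server is missing, the log can be filtered for this kind of PostgreSQL error. This is only a small sketch; the function name missing_columns is made up for illustration and is not part of PBS:

```shell
# Print "relation.column" for every log line of the form
#   ... column "X" of relation "Y" does not exist ...
# Reads the files given as arguments, or standard input if there are none.
missing_columns() {
    sed -n 's/.*column "\([^"]*\)" of relation "\([^"]*\)" does not exist.*/\2.\1/p' "$@"
}
```

For example, `missing_columns /var/spool/pbs/server_logs/20210212` would list each mismatch that the server reported on that day.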
Do you have any clue? Are the databases not compatible?
Feb 12 21:12:42 dc2pbs4v pbs_init.d[5246]: Starting PBS
Feb 12 21:12:42 dc2pbs4v pbs_init.d[5246]: PBS Home directory /var/spool/pbs needs updating.
Feb 12 21:12:42 dc2pbs4v pbs_init.d[5246]: Running /opt/pbs/libexec/pbs_habitat to update it.
Feb 12 21:12:42 dc2pbs4v pbs_init.d[5246]: ***
Feb 12 21:12:43 dc2pbs4v pbs_init.d[5246]: Data service directory from previous PBS installation not found,
Feb 12 21:12:43 dc2pbs4v pbs_init.d[5246]: Datastore upgrade cannot continue
Feb 12 21:12:43 dc2pbs4v pbs_init.d[5246]: Failed to upgrade PBS Datastore
It looks like PBS gets confused by the “.5” in the PostgreSQL version number. It seems that the “.5” is not stored in “PG_VERSION”.
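For context: since PostgreSQL 10, the PG_VERSION file in a data directory holds only the major version (for example 12), while older releases store two components (for example 9.6). That would be consistent with the “.5” being absent. Below is a minimal sketch of reducing a full server version to the form found in PG_VERSION. The function name pg_data_version is hypothetical, and this is not taken from the PBS upgrade scripts:

```shell
# Reduce a full PostgreSQL version string to the form stored in PG_VERSION:
# "12.5" -> "12" (version 10 and later), "9.6.20" -> "9.6" (before 10).
pg_data_version() {
    ver="$1"
    major="${ver%%.*}"            # text before the first dot
    if [ "$major" -ge 10 ]; then
        printf '%s\n' "$major"
    else
        rest="${ver#*.}"          # text after the first dot
        minor="${rest%%.*}"       # second version component
        printf '%s.%s\n' "$major" "$minor"
    fi
}
```

A comparison such as `[ "$(pg_data_version 12.5)" = "$(head -n 1 PG_VERSION)" ]` would then match a data directory created by any 12.x server.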
I repeated the same process, but before starting my self-compiled PBS, I modified PG_VERSION and changed the first line to “12.5”. Now, after “rpm -U” and “systemctl start pbs”, the database seems to get updated, but the server still fails to start. This time, the error message in the server log reads:
02/12/2021 21:37:41;0002;Server@dc2pbs4v;Svr;Server@dc2pbs4v;Prepare of statement insert_que failed: ERROR: column "qu_creattm" of relation "queue" does not exist
LINE 1: insert into pbs.queue(qu_name, qu_type, qu_creattm, qu_savet...
                                                ^
42703
In 20.0.1, the table is created via:
CREATE TABLE pbs.queue (
    qu_name TEXT NOT NULL,
    qu_type INTEGER NOT NULL,
    qu_ctime TIMESTAMP NOT NULL,
    qu_mtime TIMESTAMP NOT NULL,
    attributes hstore NOT NULL default '',
    CONSTRAINT queue_pk PRIMARY KEY (qu_name)
);
In the current master branch, the table is created via:
CREATE TABLE pbs.queue (
    qu_name TEXT NOT NULL,
    qu_type INTEGER NOT NULL,
    qu_creattm TIMESTAMP NOT NULL,
    qu_savetm TIMESTAMP NOT NULL,
    attributes hstore NOT NULL default '',
    CONSTRAINT queue_pk PRIMARY KEY (qu_name)
);
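The difference between the two definitions above can be checked mechanically. The following is a rough sketch, assuming each statement is on a single line and has no commas inside types or defaults (true for these two). The helper names table_columns and only_in_first are made up for illustration:

```shell
# List the column names of a single-line CREATE TABLE statement, sorted.
table_columns() {
    printf '%s\n' "$1" |
        sed -e 's/^[^(]*(//' |                      # drop "CREATE TABLE name ("
        tr ',' '\n' |                               # one item per line
        awk 'NF && $1 != "CONSTRAINT" { print $1 }' |
        sort
}

# Print the names in list $1 that do not occur in list $2.
only_in_first() {
    printf '%s\n' "$1" | grep -vxF -e "$2"
}

old_def="CREATE TABLE pbs.queue ( qu_name TEXT NOT NULL, qu_type INTEGER NOT NULL, qu_ctime TIMESTAMP NOT NULL, qu_mtime TIMESTAMP NOT NULL, attributes hstore NOT NULL default '', CONSTRAINT queue_pk PRIMARY KEY (qu_name) );"
new_def="CREATE TABLE pbs.queue ( qu_name TEXT NOT NULL, qu_type INTEGER NOT NULL, qu_creattm TIMESTAMP NOT NULL, qu_savetm TIMESTAMP NOT NULL, attributes hstore NOT NULL default '', CONSTRAINT queue_pk PRIMARY KEY (qu_name) );"

old_cols=$(table_columns "$old_def")
new_cols=$(table_columns "$new_def")

echo "only in the old schema:"
only_in_first "$old_cols" "$new_cols"
echo "only in the new schema:"
only_in_first "$new_cols" "$old_cols"
```

Only the two timestamp columns differ, and only by name; the types and constraints are the same in both definitions.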
The file pbs_schema_upgrade, which should upgrade the database from 1.4.0 to 1.5.0, does not contain any command to upgrade “pbs.queue” in its routine “upgrade_pbs_schema_from_v1_4_0”. Is this a bug?
Just adding
ALTER TABLE pbs.queue RENAME COLUMN qu_ctime TO qu_creattm;
ALTER TABLE pbs.queue RENAME COLUMN qu_mtime TO qu_savetm;
to pbs_schema_upgrade does not help; it ends up in
pbs_init.d[24246]: *** Error in `/opt/pbs/sbin/pbs_server.bin': double free or corruption (fasttop): 0x0000000002644780 ***
After deleting /var/spool/pbs and creating a fresh database, the self-compiled PBS runs fine.
I give up here. Perhaps one of you has a clue…