Redhat 9 and PBS server reboot causing "next job id" to increase

I've been testing Red Hat 9 (we have loads of RHEL 8 systems that are not showing this issue), and it seems that every time the PBS server is rebooted, the next job submitted gets a job number rounded up to the next multiple of 1000:
0,1,2,…502, (reboot) 1000,1001…1300 (reboot) 2000,2001…
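
The pattern in that sequence is consistent with rounding the last used ID up to the next multiple of 1000. A minimal sketch of the arithmetic, purely as an illustration of the observed numbers (the function name is hypothetical, and this is not taken from the PBS source):

```shell
#!/bin/sh
# Hypothetical helper: models the observed behaviour only, not PBS internals.
# Given the last job ID used before the reboot, print the first job ID
# seen afterwards (the next multiple of 1000 above it).
next_id_after_reboot() {
    last=$1
    echo $(( (last / 1000 + 1) * 1000 ))
}

next_id_after_reboot 502    # prints 1000
next_id_after_reboot 1300   # prints 2000
```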

the most bizarre database corruption I ever saw if this is accidental…

has anyone else seen oddness on redhat 9? (9.5)

Not that it's really a problem; I can just go in and fix up sv_jobidnumber whenever the PBS server boots, now that I know about the strangeness.
It's just odd.

thanks
s

If there is an abrupt shutdown of the PBS server/datastore, the job ID count is incremented by X to avoid job corruption. You might have already checked all of these, but please check whether there is a core dump, a disk space or quota issue, or anything relevant in /var/log/messages; the postgres tunables might also help. I have not encountered this issue with your workflow, only with a wrong failover configuration.

I looked at the code (get_next_svr_sequence_id in src/server/req_quejob.c). My guess is that the rounding is an unintended effect of database refactoring done by commit ce0cb14d0 to support a site-settable max jobid.

So do I need to set a max jobid (which I don't think I do), or is this due to the previously suggested bad shutdown, where on restart it rounds up a lot instead of moving to the next unused jobid?

thanks
s

I couldn't find any core dumps from previous shutdowns, nor any space issues, but I was going to try shutting down PBS more gracefully before the server is rebooted. I just haven't yet.

thanks
s

If I shut down PBS as a service before rebooting, the next jobid looks fine again (no rounding up), so I will make sure I do that going forward.
Maybe RHEL 9 doesn't shut down PBS automagically when going down the way RHEL 8 does.

Also, I'm assuming that
set server max_job_sequence_id = 9999999
is the default max job ID, as I don't see it being set anywhere in our build process.

thanks
s

Thank you for the above information.

The maximum possible sequence ID is 12 digits: 999,999,999,999. Cluster administrators can limit the ID by setting the server-level attribute 'max_job_sequence_id'.

Please note that the sequence ID is reset back to 0 once max_job_sequence_id is reached.
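
As a rough illustration of the wrap-around just described (a sketch of the stated rule, not the server's actual code; the function name is hypothetical, and resetting exactly when the maximum is reached is an assumption based on the wording above):

```shell
#!/bin/sh
# next_sequence_id CURRENT MAX
# Hypothetical model: count up by one, wrap to 0 once MAX has been reached.
next_sequence_id() {
    cur=$1
    max=$2
    if [ "$cur" -ge "$max" ]; then
        echo 0
    else
        echo $(( cur + 1 ))
    fi
}

next_sequence_id 41 9999999        # prints 42
next_sequence_id 9999999 9999999   # prints 0
```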

It’s solved by adding a script that runs automatically before shutdown.

Add an executable file /usr/lib/systemd/system-shutdown/mystop.shutdown:

#!/usr/bin/sh
# Stop the PBS service and give it time to finish
# before the shutdown proceeds.


/usr/bin/systemctl stop pbs.service

/usr/bin/sleep 10

The /usr/lib/systemd/system-shutdown/mystop.shutdown approach does not work.

The root cause of this issue is:

The pbs_init.d startup script launches pbs_data_service and PostgreSQL as child processes that escape systemd’s control group (cgroup). During shutdown, systemd kills these processes in parallel with the main pbs.service, so the database can be terminated before pbs.service has finished stopping, which leads to an unsafe stop of pbs.service. The shutdown error shows up in the server log under `/var/spool/pbs/server_logs/date_numbers`, for example:

01/04/2026 09:15:56;0001;Server@mu;Svr;Server@mu;PBS server internal error (15011) in svr_save_db, Failed to save server Execution of Prepared statement update_svr failed: FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request. 57P01
01/04/2026 09:15:56;0001;Server@mu;Svr;Server@mu;panic_stop_db, Panic shutdown of Server on database error.  Please check PBS_HOME file system for no space condition.
01/04/2026 09:15:56;0002;Server@mu;Svr;Log;Log closed
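
To check whether earlier reboots hit the same failure, the server logs can be scanned for the panic_stop_db marker seen in the excerpt above. A minimal sketch (the function name is hypothetical; only the marker string and the log directory come from this thread):

```shell
#!/bin/sh
# count_panic_stops LOGFILE...
# Prints how many log lines record a panic shutdown of the server.
# awk is used instead of grep -c so that multiple files give one total
# and a count of zero is not reported as a failure.
count_panic_stops() {
    awk '/panic_stop_db/ { n++ } END { print n + 0 }' "$@"
}
```

For example, `count_panic_stops /var/spool/pbs/server_logs/*` on the server host; a non-zero count would suggest that the datastore went away before the server finished saving its state.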

One can create a safe-reboot script and use it to stop pbs.service before rebooting:

[root@mu ~]# cat /usr/local/bin/safe-reboot
#!/bin/bash



echo "stopping pbs..."
/opt/pbs/libexec/pbs_init.d stop

sleep 5


echo "running... reboot $*"

exec /sbin/reboot "$@"



The poweroff command can be wrapped in the same way as reboot above.

However, this approach only works when root shuts down or reboots through the wrapper command; it does not cover other kinds of shutdown.
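
One way to check the cgroup explanation given earlier in this post is to look at which systemd unit each datastore process actually belongs to. The sketch below assumes cgroup v2 (the RHEL 9 default), where /proc/PID/cgroup holds a single line of the form 0::/system.slice/pbs.service; the function name is hypothetical.

```shell
#!/bin/sh
# unit_of_cgroup_file FILE
# FILE is a path to (or a copy of) /proc/<pid>/cgroup.
# Prints the last path component of the cgroup v2 entry, e.g. "pbs.service".
unit_of_cgroup_file() {
    awk -F: '$1 == "0" { n = split($3, parts, "/"); print parts[n] }' "$1"
}

# Illustrative use on the PBS server host (run as root):
#   for pid in $(pgrep postgres); do
#       unit_of_cgroup_file "/proc/$pid/cgroup"
#   done
```

If the postgres processes report something other than pbs.service, that would be consistent with them sitting outside the unit that systemd is trying to stop.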