Redhat 9 and PBS server reboot causing "next job id" to increase

I've been testing Red Hat 9 (we have loads of RHEL 8 systems that are not showing this issue), and it seems that every time the PBS server is rebooted, the next job that gets submitted gets a job number rounded up to the next multiple of 1000:
0,1,2,…502, (reboot) 1000,1001…1300 (reboot) 2000,2001…
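To make the pattern concrete, here is a minimal sketch of the arithmetic that would reproduce those numbers. It is not the PBS code; the function name and the block size of 1000 are assumptions inferred only from the job IDs observed above.

```python
def next_id_after_reboot(last_id: int, block: int = 1000) -> int:
    """Return the start of the next `block`-sized range after last_id.

    Hypothetical helper: it only mimics the job IDs observed in this
    thread and says nothing about how the PBS server computes them.
    """
    if last_id < 0 or block <= 0:
        raise ValueError("last_id must be >= 0 and block must be > 0")
    # Integer-divide to find the current block, then jump to the next one.
    return (last_id // block + 1) * block


if __name__ == "__main__":
    for last in (502, 1300):
        print(last, "->", next_id_after_reboot(last))
```

With the two reboots described above, 502 maps to 1000 and 1300 maps to 2000.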

It would be the most bizarre database corruption I've ever seen, if this is accidental…

Has anyone else seen oddness on Red Hat 9 (9.5)?

Not that it's really a problem; I can just go in and fix up sv_jobidnumber whenever the PBS server boots, now that I know about the strangeness.
It's just odd.

thanks
s

If there is an abrupt shutdown of the PBS server/datastore, the job ID count is incremented by X to avoid job corruption. You might have already checked all of these, but please check whether there is a core dump, a space or quota issue, or anything relevant in /var/log/messages; the Postgres tunables might also help. I have not encountered this issue with your workflow, only with a wrong failover configuration.

I looked at the code (get_next_svr_sequence_id in src/server/req_quejob.c). My guess is that the rounding is an unintended effect of database refactoring done by commit ce0cb14d0 to support a site-settable max jobid.


So do I need to set a max job ID (which I don't think I do), or is this due to the previously suggested bad shutdown, where on coming back up it just happens to round up a lot instead of going to the next unused job ID?

thanks
s

I couldn't find any core dumps from previous shutdowns, nor any space issues, but I was going to try shutting down PBS more gracefully before the server is rebooted. I just haven't yet.

thanks
s

If I shut down PBS as a service before rebooting, the next job ID looks fine again, with no rounding up. So I will make sure I do that going forward.
Maybe RHEL 9 isn't shutting down PBS automagically when going down like RHEL 8 does.
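For anyone wanting to script that order of operations, a rough sketch follows. The systemd unit name "pbs" is an assumption (check what your install actually registers), and the function name is made up for illustration. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Assumption: PBS is registered with systemd under the unit name "pbs".
PBS_UNIT = "pbs"


def stop_pbs_then_reboot(dry_run: bool = True) -> list:
    """Stop the PBS service cleanly, then reboot the host.

    Hypothetical helper. With dry_run=True (the default) the commands
    are only printed, so the sketch is safe to try on a live server.
    """
    commands = [
        ["systemctl", "stop", PBS_UNIT],  # let the server/datastore shut down first
        ["systemctl", "reboot"],          # only then take the machine down
    ]
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            # check=True aborts before the reboot if stopping PBS fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    stop_pbs_then_reboot(dry_run=True)
```

Running it for real needs root and `dry_run=False`; if the stop step fails, the reboot is never issued.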

Also, I'm assuming that
set server max_job_sequence_id = 9999999
is the default max job ID, as I don't see it being set anywhere in our build process.

thanks
s


Thank you for the above information.

The maximum possible sequence ID is 12 digits: 999,999,999,999. Cluster administrators can limit the ID by setting the server-level attribute 'max_job_sequence_id'.

Please note that the sequence ID is reset back to 0 once max_job_sequence_id is reached.
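As an illustration of that wraparound, here is a small sketch. It is not taken from the PBS source: the function and constant names are hypothetical, the 9999999 value is the one quoted earlier in this thread, and whether the maximum value itself is handed out before the reset is an assumption.

```python
# Value quoted earlier in the thread for max_job_sequence_id.
QUOTED_MAX_ID = 9_999_999
# 12-digit ceiling mentioned above.
HARD_MAX_ID = 999_999_999_999


def next_sequence_id(current: int, max_id: int = QUOTED_MAX_ID) -> int:
    """Return the sequence ID following `current`, wrapping to 0 at max_id.

    Hypothetical model of the documented behaviour, assuming max_id
    itself is a valid ID and the ID after it is 0.
    """
    if not 0 < max_id <= HARD_MAX_ID:
        raise ValueError("max_id must be between 1 and 999,999,999,999")
    if not 0 <= current <= max_id:
        raise ValueError("current must be between 0 and max_id")
    return 0 if current == max_id else current + 1


if __name__ == "__main__":
    print(next_sequence_id(41))
    print(next_sequence_id(QUOTED_MAX_ID))
```

Under these assumptions an ordinary ID simply increments, while the ID after 9999999 is 0 again.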