OpenPBS database corrupted

Hi All,

After I rebooted the head node, new job submissions started again from job ID 0, 1, 2, ..., although a few older jobs with IDs such as 5017 are still present. You can see the command output below. Instead of continuing from 5018, new jobs started from 0.

Could anyone please guide me on what needs to be checked and which troubleshooting steps I should follow?

$ qstat
Job id             Name             User   Time Use  S  Queue
5017.ip-10-77-16*  Linux_DCV_Sessi  user1  03:36:46  R  dcv
0.ip-10-77-162-79  Custom_DCV_Sess  user2  00:06:16  R  dcv
5.ip-10-77-162-79  Custom_DCV_Sess  user3  00:04:08  R  dcv
6.ip-10-77-162-79  Custom_DCV_Sess  user4  00:01:09  R  dcv
7.ip-10-77-162-79  Custom_DCV_Sess  user5  00:00:17  R  dcv

Is there a way to update what PBS will use as the jobid for the next job that gets submitted?

Was the head node re-provisioned with a fresh configuration? That could have caused the job ID counter to restart from 0.

You can set the job ID counter to continue from 5018 by updating it in the PBS database. Connect to the PostgreSQL database and update it as follows:
1. Make sure the PBS service is stopped.
2. Connect to the PostgreSQL database as the service user.
3. Run the update below.
pbs_datastore=# UPDATE server SET sv_jobidnumber = 5018;
pbs_datastore=# \q
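
To work out which value to set, a rough sketch like the one below takes the highest numeric job ID visible in qstat output and adds one. The function name next_jobid is just an illustration, not a PBS command. It only sees jobs the server still lists, so treat the result as a lower bound if higher-numbered jobs have already finished, and note that array job IDs such as 12[] are skipped.

```shell
#!/bin/sh
# next_jobid (hypothetical helper): read qstat-style output on stdin and print
# the highest numeric job ID plus one. Lines whose first dot-separated field is
# not purely numeric (header, blank lines, array jobs) are ignored.
next_jobid() {
    awk -F. '
        $1 ~ /^[0-9]+$/ { if ($1 + 0 > max) max = $1 + 0; seen = 1 }
        END { if (seen) print max + 1; else print 0 }
    '
}
```

Usage would be something like `qstat | next_jobid`, which for the listing earlier in this thread gives 5018.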

Thanks, Adarsh, for the quick response. Unfortunately, it is not working. I followed the steps below.

$ /usr/pgsql-14/bin/psql -U postgres -d pbs_datastore -p 15007
Password for user postgres:
psql (14.16)
Type "help" for help.

pbs_datastore=# \dn
List of schemas
Name | Owner
--------+---------
pbs | postgres
public | postgres
(2 rows)

pbs_datastore=# UPDATE pbs.server SET sv_jobidnumber = 5018;
UPDATE 1
pbs_datastore=# \q

Observation: when I submitted a job, it got sequence number 17 instead of 5018.

$ ./Ansys-script-64c.sh
17.ip-10-77-162-79

Please find my procedure below:

[root@pbsserver ~]# qstat --version
pbs_version = 23.06.06

[root@pbsserver ~]# pbs_ds_password  (for this example, assume the password is set to: openpbs)
Enter the password:
Re-enter the password:

---> Updated user password
---> Success

[root@pbsserver ]# ps -ef | grep pbs_
root      126459       1  0 08:43 ?        00:00:00 /opt/pbs/sbin/pbs_comm
root      126474       1  0 08:43 ?        00:00:00 /opt/pbs/sbin/pbs_sched
root      126583       1  0 08:43 ?        00:00:00 /opt/pbs/sbin/pbs_ds_monitor monitor
postgres  126758  126657  0 08:43 ?        00:00:00 postgres: postgres pbs_datastore 192.168.64.168(46554) idle
root      126761       1  0 08:43 ?        00:00:00 /opt/pbs/sbin/pbs_server.bin
root      126786    3228  0 08:43 pts/0    00:00:00 grep --color=auto pbs_

[root@pbsserver ]# systemctl stop  pbs


[root@pbsserver ~]# source /etc/pbs.conf

[root@pbsserver ~]# /opt/pbs/sbin/pbs_dataservice start

[root@pbsserver ~]# psql -U postgres -p 15007 -d pbs_datastore
Password for user postgres: <enter the password openpbs>

pbs_datastore=#  UPDATE pbs.server SET sv_jobidnumber = 5018;
UPDATE 1
pbs_datastore=# \q

[root@pbsserver ~]# systemctl start pbs
[root@pbsserver ~]# su - pbsdata

[pbsdata@pbsserver ~]$ qsub -- /bin/sleep 10
5018.pbsserver
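
If you want to keep this as a script, here is a minimal sketch of the same sequence. It assumes the paths, port 15007, and database names shown in this thread, and the helper names run and reset_jobid_counter are made up for illustration. By default it only prints the commands (DRY_RUN=1); psql will still prompt for the data service password when run for real.

```shell
#!/bin/sh
# Sketch of the steps above. DRY_RUN=1 (the default) only prints each command;
# set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# run (hypothetical helper): print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# reset_jobid_counter NEXT_ID (hypothetical helper)
reset_jobid_counter() {
    next_id="$1"
    # Accept only a plain non-negative integer before building the SQL.
    case "$next_id" in
        ''|*[!0-9]*) echo "usage: reset_jobid_counter <number>" >&2; return 1 ;;
    esac
    run systemctl stop pbs || return 1                    # stop all PBS daemons
    run . /etc/pbs.conf || return 1                       # load PBS settings, as in the thread
    run /opt/pbs/sbin/pbs_dataservice start || return 1   # start only the datastore
    run psql -U postgres -p 15007 -d pbs_datastore \
        -c "UPDATE pbs.server SET sv_jobidnumber = ${next_id};" || return 1
    run systemctl start pbs || return 1                   # bring PBS back up
}
```

Calling `reset_jobid_counter 5018` in dry-run mode lists the five commands so you can review them before switching to DRY_RUN=0.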

Thank you very much, Adarsh, for the detailed step-by-step guidance. This issue is now resolved.
