PBS is requeuing hundreds of thousands of old jobs on start (takes over 30 min to start)

When I start or restart the PBS service on the server/scheduler host, it attempts to requeue hundreds of thousands of jobs, so the service takes over 30 minutes to start (currently over 500,000 jobs, 25 minutes in). It may be looping through the jobs and hitting each one more than once.

Here is an excerpt from the server_logs:

08/11/2022 07:54:44;0100;Server@cfe1;Job;143996.cfe1;enqueuing into normal, state 9 hop 1
08/11/2022 07:54:44;0086;Server@cfe1;Job;143996.cfe1;Requeueing job, substate: 92 Requeued in queue: normal
08/11/2022 07:54:44;0100;Server@cfe1;Job;143997.cfe1;enqueuing into normal, state 9 hop 1
08/11/2022 07:54:44;0086;Server@cfe1;Job;143997.cfe1;Requeueing job, substate: 92 Requeued in queue: normal
08/11/2022 07:54:44;0100;Server@cfe1;Job;143998.cfe1;enqueuing into normal, state 9 hop 1
08/11/2022 07:54:44;0086;Server@cfe1;Job;143998.cfe1;Requeueing job, substate: 92 Requeued in queue: normal
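One way to check whether each job is being hit more than once is to count the "Requeueing job" messages per job ID. Below is a minimal Python sketch, assuming every log line follows the semicolon-separated layout shown in the excerpt above. The function names count_requeues and summarize are made up for illustration and are not part of PBS.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
# timestamp;event code;server;object type;job id;message
REQUEUE_LINE = re.compile(r"^[^;]*;[^;]*;[^;]*;Job;([^;]+);Requeueing job")


def count_requeues(lines):
    """Return a Counter mapping job ID -> number of requeue messages."""
    counts = Counter()
    for line in lines:
        match = REQUEUE_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def summarize(counts):
    """Return (total requeue messages, distinct jobs, jobs requeued more than once)."""
    total = sum(counts.values())
    repeated = sum(1 for n in counts.values() if n > 1)
    return total, len(counts), repeated


# The six lines from the excerpt, used here as sample input.
sample = [
    "08/11/2022 07:54:44;0100;Server@cfe1;Job;143996.cfe1;enqueuing into normal, state 9 hop 1",
    "08/11/2022 07:54:44;0086;Server@cfe1;Job;143996.cfe1;Requeueing job, substate: 92 Requeued in queue: normal",
    "08/11/2022 07:54:44;0100;Server@cfe1;Job;143997.cfe1;enqueuing into normal, state 9 hop 1",
    "08/11/2022 07:54:44;0086;Server@cfe1;Job;143997.cfe1;Requeueing job, substate: 92 Requeued in queue: normal",
    "08/11/2022 07:54:44;0100;Server@cfe1;Job;143998.cfe1;enqueuing into normal, state 9 hop 1",
    "08/11/2022 07:54:44;0086;Server@cfe1;Job;143998.cfe1;Requeueing job, substate: 92 Requeued in queue: normal",
]

total, distinct, repeated = summarize(count_requeues(sample))
print(f"{total} requeue messages, {distinct} distinct jobs, {repeated} requeued more than once")
```

To run it against a real log, pass an open file object for one of the server_logs files to count_requeues instead of the sample list. If the number of jobs requeued more than once is well above zero, the server is probably visiting the same jobs repeatedly rather than working through them once.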

Any thoughts would be greatly appreciated. These are all old jobs and could be deleted if I knew where they were stored (they do not appear in qstat).

Regards,
Kevin

  1. When the server is up and running, you can unset the job history and set it again. The jobs might be in the history (qstat -xH).

  2. You can delete these jobs from the PBS datastore (PostgreSQL database).

Please state the version of PBS Pro you are using. Forum members might share their experiences if they had similar issues with that version, or know of a bug that has since been fixed.

Thank you for those suggestions.

I will try to unset and reset the history setting and then restart again (as soon as the server finishes starting…)

I am not familiar with interacting directly with the PBS datastore (although I am familiar with postgres). Any documentation that you could point me to would be helpful.

I am currently running OpenPBS 20.0.1 on Rocky Linux 8.6.

Disabling the job history seems to have worked. Thank you again for the help, adarsh.

Nice one, thank you @kguay.

@kguay Please share how you disabled the job history for the benefit of the community. Thanks.

qmgr: set server job_history_enable=false
or
qmgr: unset server job_history_enable