PBS is requeuing hundreds of thousands of old jobs on start (takes over 30 min to start)

kguay · August 11, 2022, 11:59am

When I start/restart the PBS service on the server/scheduler, it tries to requeue so many jobs that the service takes over 30 minutes to start (currently over 500,000 jobs, 25 minutes in). It may be looping through the jobs, hitting each job more than once.

Here is an excerpt from the server_logs:

08/11/2022 07:54:44;0100;Server@cfe1;Job;143996.cfe1;enqueuing into normal, state 9 hop 1
08/11/2022 07:54:44;0086;Server@cfe1;Job;143996.cfe1;Requeueing job, substate: 92 Requeued in queue: normal
08/11/2022 07:54:44;0100;Server@cfe1;Job;143997.cfe1;enqueuing into normal, state 9 hop 1
08/11/2022 07:54:44;0086;Server@cfe1;Job;143997.cfe1;Requeueing job, substate: 92 Requeued in queue: normal
08/11/2022 07:54:44;0100;Server@cfe1;Job;143998.cfe1;enqueuing into normal, state 9 hop 1
08/11/2022 07:54:44;0086;Server@cfe1;Job;143998.cfe1;Requeueing job, substate: 92 Requeued in queue: normal
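
One way to test the looping theory is to count how many times each job ID appears in a "Requeueing job" entry. This is a minimal sketch, assuming the semicolon-separated layout shown in the excerpt (date and time; event code; server; object type; object ID; message) and assuming the default log location /var/spool/pbs/server_logs/YYYYMMDD; adjust the path if your PBS_HOME differs. The function names are hypothetical.

```python
import sys
from collections import Counter


def count_requeues(lines):
    """Count 'Requeueing job' events per job ID in PBS server_log lines.

    Assumes the layout seen in the excerpt:
    date time;event code;server;object type;object id;message
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(";", 5)
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        _, _, _, obj_type, job_id, message = fields
        if obj_type == "Job" and message.startswith("Requeueing job"):
            counts[job_id] += 1
    return counts


def repeated_jobs(counts):
    """Return the job IDs requeued more than once (evidence of a loop)."""
    return {job: n for job, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Usage (path is an assumption): python count_requeues.py /var/spool/pbs/server_logs/20220811
    for path in sys.argv[1:]:
        with open(path) as log:
            counts = count_requeues(log)
        dupes = repeated_jobs(counts)
        print(f"{path}: {sum(counts.values())} requeue events, "
              f"{len(counts)} distinct jobs, {len(dupes)} requeued more than once")
```

If every job shows a count of 1, the server is working through a genuinely large backlog rather than revisiting the same jobs.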

Any thoughts would be greatly appreciated. These are all old jobs, and I would delete them if I knew where they were stored (they do not appear in qstat).

Regards,
Kevin

adarsh · August 11, 2022, 1:17pm

Once the server is up and running, you can unset the job history setting and then set it again. The jobs might be in the job history (qstat -xH).
Alternatively, you can delete these jobs from the PBS datastore (a PostgreSQL database).

Please state which version of PBS Pro you are using; forum members may be able to share their experiences if they had similar issues with that version or know of a relevant bug fix.

kguay · August 11, 2022, 1:21pm

Thank you for those suggestions.

I will try unsetting and resetting the history setting and then restarting again (as soon as the server finishes starting…).

I am not familiar with interacting directly with the PBS datastore (although I am familiar with postgres). Any documentation that you could point me to would be helpful.

I am currently running OpenPBS 20.0.1 on Rocky Linux 8.6.

kguay · August 11, 2022, 1:56pm

Disabling the job history seems to have worked. Thank you again for the help, adarsh.

adarsh · August 11, 2022, 7:48pm

Nice one, thank you @kguay

vamshi · August 17, 2022, 9:25am

@kguay Please share how you disabled the job history, for the benefit of the community. Thanks!

adarsh · August 17, 2022, 1:40pm

qmgr:  set server job_history_enable=false
or
qmgr: unset server job_history_enable
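
For scripted administration, qmgr also accepts a single directive through its -c option instead of the interactive prompt. This is a minimal sketch that wraps the two commands above; the helper name disable_job_history is hypothetical, the real call must run with PBS manager privileges (typically root) on the server host, and dry_run=True only prints the command.

```python
import shlex
import subprocess


def disable_job_history(unset=False, dry_run=True):
    """Turn off PBS job history via qmgr's non-interactive -c option.

    Hypothetical helper. With unset=True the attribute is unset instead of
    being set to false. With dry_run=True the command is only printed.
    """
    directive = ("unset server job_history_enable" if unset
                 else "set server job_history_enable=false")
    argv = ["qmgr", "-c", directive]
    if dry_run:
        print(shlex.join(argv))  # show the exact shell command
    else:
        # Requires PBS manager privileges on the server host.
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    disable_job_history(dry_run=True)
```

Passing the directive as a single argv element avoids shell quoting issues with the spaces inside it.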
