Can a PBS database get too large?

When I restart PBS, it seems to take forever. Can the database get too large? We never really look back more than 10 days. Here is partial output of qstat:

9610588.bonobo GWviirsG1 satuser 0 R medium
9610589.bonobo GWviirsVisG2 satuser 0 R medium
9610590.bonobo GWviirsStG2 satuser 0 R medium
9610591.bonobo GWviirsStG2 satuser 0 Q medium
9610592.bonobo GWviirsCiG2 satuser 0 Q medium

The size of the database depends on the following (see the example commands after the list):

  • job history duration
  • number of jobs queued/running
  • number of compute nodes
  • job script size for each job submission
  • node configuration/server configuration/scheduler configuration
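
To get a rough sense of the first few on your cluster, you can count what is currently there (illustrative commands only; the exact output format varies a little between PBS versions):

qstat | wc -l                   # approximate number of queued/running jobs (includes two header lines)
pbsnodes -a | grep -c "Mom ="   # approximate number of vnodes (compute nodes)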

It might also be that there are more job submissions per day than you realize.
Looking at your job ID count, for example: if you submit 1 million jobs per day, that is 10 million jobs in the database (assuming your cluster's job_history_duration is 10 days).

Please check (example qmgr commands for the first two are shown below):

  • job history duration (job_history_duration)
  • jobscript_max_size
  • the size of the full job history output: qstat -fx > qstat_fx.txt ; ls -lhr qstat_fx.txt
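
For example, the first two can be read back with qmgr (standard PBS Professional server attributes):

qmgr -c "print server" | grep job_history_duration
qmgr -c "print server" | grep jobscript_max_size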

So the job history setting removes jobs from the database?

set server job_history_duration = 336:00:00

Not set:

[root@bonobo tmp]# qmgr -c "p s" | grep jobscript_max_size
[root@bonobo tmp]#

-rwxr-xr-x. 1 root root 1.8G May 18 13:41 qstat_fx.txt

Yes. The longer the job_history_duration, the more finished jobs are kept in the database before they are purged. Your setting of 336:00:00 keeps 14 days of history, even though you only look back about 10 days.
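
If you don't need the extra retention, you could shorten it, for example to 10 days (240 hours). The value here is only an illustration; use whatever retention your site actually needs:

qmgr -c "set server job_history_duration = 240:00:00"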

Snippet from the PBS Professional 2020.1 Administrator’s Guide, AG-521
13.19 Managing Amount of Memory for Job Scripts
By default, starting with version 13.1, PBS limits the size of any single job script to 100MB. You can set a different limit using the jobscript_max_size server attribute. The format for this attribute is size, and the units default to bytes. You can specify the units. For example:
Qmgr: set server jobscript_max_size = 10mb
Job script size affects server memory footprint. If a job submitter wants to use a really big script, they can put it in shared storage and call it from a short script, or they can run a small job script that stages in the big script, then calls it.
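
If some of your submission scripts are large, one way to follow the guide's suggestion is a short wrapper script that simply calls the real script from shared storage (a minimal sketch; the path, script name, and queue name are placeholders):

#!/bin/bash
#PBS -N GWviirs_wrapper
#PBS -q medium
# Only this short wrapper ends up in the PBS database; the large script
# stays on shared storage and is executed from there.
/shared/scripts/big_processing_script.sh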

As you can see, about 1.8 GB of space is consumed by the jobs and their job descriptions, requests, environment variables, etc.
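
If you're curious how many job records that 1.8 GB represents, you could count the "Job Id:" header lines that qstat -f prints once per job (just an illustrative check):

grep -c "^Job Id:" qstat_fx.txt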