When I want to restart PBS, it seems to take forever. Can a database get too large? We don’t really ever look back more than 10 days. Here is a partial output of qstat:
```
9610588.bonobo  GWviirsG1     satuser  0  R  medium
9610589.bonobo  GWviirsVisG2  satuser  0  R  medium
9610590.bonobo  GWviirsStG2   satuser  0  R  medium
9610591.bonobo  GWviirsStG2   satuser  0  Q  medium
9610592.bonobo  GWviirsCiG2   satuser  0  Q  medium
```
The size of the database depends on the:
- job history duration
- number of jobs queued and running
- number of compute nodes
- job script size for each submission
- node/server/scheduler configuration
It might be that there are many job submissions per day. For example, looking at your job ID count: if you submit 1 million jobs per day, that would be 10 million jobs in the database (assuming 10 days is your cluster's job_history_duration).
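The arithmetic behind that estimate can be sketched as follows (a minimal illustration; the daily rate and retention window here are assumptions for the example, not measurements from your cluster):

```python
# Rough steady-state estimate of how many job records the PBS datastore
# retains: submission rate multiplied by the history retention window.
jobs_per_day = 1_000_000   # assumed submission rate
history_days = 10          # assumed job_history_duration, in days

retained_jobs = jobs_per_day * history_days
print(retained_jobs)       # -> 10000000 jobs held in the database
```

Each retained record also carries the job script, request attributes, and environment, so the on-disk size grows with the same factor.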
Please check:
- job_history_duration
- jobscript_max_size
- the size of the full job history output: `qstat -fx > qstat_fx.txt ; ls -lh qstat_fx.txt`
So the job history setting removes jobs from the database?
```
set server job_history_duration = 336:00:00
```
jobscript_max_size is not set:
```
[root@bonobo tmp]# qmgr -c "p s" | grep jobscript_max_size
[root@bonobo tmp]#
```
```
-rwxr-xr-x. 1 root root 1.8G May 18 13:41 qstat_fx.txt
```
Yes: the longer the job_history_duration, the more finished jobs are retained in the database.
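Since you said you never look back more than 10 days, one option is to shorten the retention window to match (a sketch only; pick the value that fits your own retention policy -- the attribute takes a duration in HHH:MM:SS format):

```
# 10 days = 10 * 24 = 240 hours
qmgr -c "set server job_history_duration = 240:00:00"
```

Finished jobs older than this window are purged from the datastore, which shrinks the database over time and should speed up server restarts.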
Snippet from the PBS Professional 2020.1 Administrator’s Guide, AG-521
13.19 Managing Amount of Memory for Job Scripts
By default, starting with version 13.1, PBS limits the size of any single job script to 100MB. You can set a different limit using the jobscript_max_size server attribute. The format for this attribute is size, and the units default to bytes. You can specify the units. For example:
```
Qmgr: set server jobscript_max_size = 10mb
```
Job script size affects server memory footprint. If a job submitter wants to use a really big script, they can put it in shared storage and call it from a short script, or they can run a small job script that stages in the big script, then calls it.
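The wrapper pattern the guide describes can be sketched like this (a hypothetical example; the script name, queue, and shared-storage path are placeholders, not from your site):

```
#!/bin/bash
#PBS -N big_job_wrapper
#PBS -q medium
# Keep the submitted script tiny: the server only stores these few lines,
# while the large script lives on shared storage and runs at job time.
/shared/scripts/big_analysis.sh "$@"
```

Only this short wrapper is stored in the PBS database, so a fleet of such jobs consumes far less server memory and datastore space than embedding the full script in every submission.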
As you can see, 1.8 GB is the space consumed by the jobs and their job descriptions/requests/environment variables/etc.