Jobs are killed because of "post job file processing error"

When I submit a dummy array of 1000 jobs to our PBS cluster (version 20.0.1), roughly 35% of the jobs end in a “post job file processing error”. These dummy jobs do nothing besides a single “echo” statement. Even so, 347/1000 of the jobs end with the following error:

From adm@address Fri Feb 3 09:27:51 2023
Return-Path: adm@address
X-Original-To: user@cm.cluster
Delivered-To: user@address
Received: by address (Postfix, from userid 0)
id 4887F160000CD; Fri, 3 Feb 2023 09:27:51 -0500 (EST)
To: user@address

Subject: PBS JOB 6809[973].cluster

Message-Id: 20230203142751.4887F160000CD@address
Date: Fri, 3 Feb 2023 09:27:51 -0500 (EST)
From: root adm@address
PBS Job Id: 6809[973].cluster
Job Name:

Post job file processing error; job 6809[973].cluster on host n01

What is causing this error and how do I stop it from happening in the future?

The test script I am using is as follows:

#PBS -S /bin/bash
#PBS -o /output
#PBS -e /output
#PBS -J 1-1000

echo "subjob $PBS_ARRAY_INDEX"

Are you really sending the job output to “/output”? Does the user have write access to the root directory?

Also, all of your jobs’ stdout and stderr streams go to the same two files. This can cause trouble when multiple subjobs try to write to them at the same time. Try removing your PBS -o and -e directives so each subjob uses distinct files by default, and see if that makes the problem go away.
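As a toy illustration of the hazard (this is plain Python, not what PBS does internally): when two writers open the same path in truncating write mode, as two subjobs sharing one output file effectively would, the second open wipes out the first writer's output.

```python
# Two "jobs" writing their stdout to the same file path.
import os

path = "shared.out"  # hypothetical shared output file

with open(path, "w") as job1:
    job1.write("output of job 1\n")
    job1.flush()
    # A second job opens the same path for writing,
    # which truncates the file and discards job 1's output.
    with open(path, "w") as job2:
        job2.write("output of job 2\n")

with open(path) as f:
    content = f.read()
print(content)  # only job 2's line survives
os.remove(path)
```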

(If you really want the stdout and stderr for a given job to go to the same file, take a look at the -j qsub option. E.g., -o foo.out -j oe.)
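A sketch of the two directive styles suggested above (the filename foo.out is just a placeholder; default subjob filenames follow PBS Pro's <jobname>.o<seqnum>.<index> convention):

```
# Default: each subjob gets its own stdout/stderr files,
# e.g. myjob.o6809.1, myjob.e6809.1, myjob.o6809.2, ...
#PBS -J 1-1000

# Alternative: merge stderr into stdout in one named file per job
#PBS -o foo.out
#PBS -j oe
```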

You might get more information about the exact failure by consulting the MoM logs on the execution hosts (n01 in the message above).
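For example, assuming the default PBS_HOME of /var/spool/pbs (adjust the path and date to your site), something along these lines on the execution host should show the relevant entries:

```
# On n01: MoM log files are named by date (YYYYMMDD)
grep '6809\[973\]' /var/spool/pbs/mom_logs/20230203
```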

All of the output is going to unique files; I just replaced the real file path with “/output” in the question.