When I submit a dummy array of 1000 jobs to our PBS cluster (version 20.0.1), roughly 35% of the jobs end in a “post job file processing error”. These dummy jobs do nothing besides run a single “echo” statement. Even so, 347 of the 1000 jobs end with the following error:
From adm@address Fri Feb 3 09:27:51 2023
Return-Path: adm@address
X-Original-To: user@cm.cluster
Delivered-To: user@address
Received: by address (Postfix, from userid 0)
id 4887F160000CD; Fri, 3 Feb 2023 09:27:51 -0500 (EST)
To: user@address
Subject: PBS JOB 6809[973].cluster
Message-Id: 20230203142751.4887F160000CD@address
Date: Fri, 3 Feb 2023 09:27:51 -0500 (EST)
From: root adm@address
PBS Job Id: 6809[973].cluster
Job Name: test.sh
Post job file processing error; job 6809[973].cluster on host n01
What is causing this error and how do I stop it from happening in the future?
The test script I am using is as follows:
#!/bin/bash
#PBS -S /bin/bash
#PBS -o /output
#PBS -e /output
#PBS -J 1-1000
echo "$PBS_JOBID"
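To check whether the failures are tied to particular nodes, the notification mails can be tallied per host. This is only a rough sketch: it assumes the mails are saved in a plain-text mailbox file and that every error line has the same form as the one quoted above, ending in the host name. The function name count_errors_by_host is made up for illustration and is not part of PBS.

```shell
#!/bin/bash
# Hypothetical helper: count "Post job file processing error" mails per host.
# Reads the files given as arguments, or stdin when no file is given.
count_errors_by_host() {
    # Keep only the error lines (-h suppresses file name prefixes).
    grep -h 'Post job file processing error' "$@" |
        # Assumption: the host name is the last whitespace-separated field.
        awk '{ print $NF }' |
        # Count occurrences per host, most frequent first.
        sort | uniq -c | sort -rn
}
```

It could be called as count_errors_by_host followed by the path to the mailbox file (the path depends on the local mail setup). If nearly all of the errors come from one or two hosts, that would point at those nodes rather than at the job script.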