Hi,
I’m submitting large job arrays (on the order of 10^3 to 10^4 subjobs). At some point scheduling stalls, and the scheduler log shows the following messages:
05/23/2017 23:29:08;0040;pbs_sched;Job;28[8453].ip-172-16-255-9;Job run
05/23/2017 23:29:08;0080;pbs_sched;Job;28[].ip-172-16-255-9;Considering job to run
05/23/2017 23:29:08;0040;pbs_sched;Job;28[8454].ip-172-16-255-9;Job run
05/23/2017 23:29:08;0080;pbs_sched;Job;28[].ip-172-16-255-9;Considering job to run
05/23/2017 23:29:08;0040;pbs_sched;Job;28[8455].ip-172-16-255-9;Job run
05/23/2017 23:29:08;0080;pbs_sched;Job;28[].ip-172-16-255-9;Considering job to run
05/23/2017 23:29:08;0040;pbs_sched;Job;28[].ip-172-16-255-9;Failed to run: Request invalid for state of job (15016)
05/23/2017 23:29:08;0080;pbs_sched;Job;28[].ip-172-16-255-9;Considering job to run
05/23/2017 23:29:08;0040;pbs_sched;Job;28[].ip-172-16-255-9;Failed to run: Request invalid for state of job (15016)
05/23/2017 23:29:08;0080;pbs_sched;Job;28[].ip-172-16-255-9;Considering job to run
The “Failed to run: Request invalid for state of job (15016)” messages go on …
Has anyone seen this before?