Some of my jobs finish with qstat reporting an exit status of 0 even though a child process, run inside the Singularity container that provides the job environment, returned an exit code of 1 or higher. My users are asking me to make qstat recognize when a child process fails.
I have some experimentation in mind I can try. Before I do I figured I’d ask this community for any experiences with this situation. Please share your thoughts.
Here are some links. The sync option is either deprecated or belongs to other schedulers forked from PBS. I also did not see any mention of it in recent or current PBS documentation.
I got back to this and tested the directive #PBS -W block=true. A job whose child process in a Singularity container exited with code 1 was still reported by qstat as having finished successfully.
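One thing worth checking is whether the job script itself ends with the container's exit status. A shell script exits with the status of the last command it ran, so any command after the container call (a cleanup step, an echo) can replace a failure with 0. Below is a minimal sketch of capturing and passing the status along, assuming the scheduler takes the job's exit status from the job script. The helper name run_step, the image name, and the program name are hypothetical and not from the jobs described above.

```shell
#!/bin/bash
# Hypothetical helper: run a command, log its exit code, and return that
# same code to the caller so later commands cannot mask it.
run_step() {
    "$@"
    step_rc=$?
    echo "step '$1' finished with exit code $step_rc" >&2
    return "$step_rc"
}

# Sketch of use in a job script (image and program names are made up):
#
#   run_step singularity exec my_image.sif ./my_program
#   rc=$?
#   echo "cleanup or other commands can go here"
#   exit "$rc"    # end the job script with the container's exit code
```

If the job script already ends with the container call and qstat still shows 0, the status is probably being lost somewhere else, for example inside a wrapper script in the container that does not pass on its child's exit code.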