My job script submits other jobs. How to wait till those jobs finish?

For example, my main job script launches a bunch of other jobs. I want to wait until all of those jobs are done before my main job terminates. The reason is that I want my main job script to email the user when everything is done, and that email should be sent only after all the jobs it submitted have finished.

I could probably brute-force it with a while loop that checks the status of every job every 30 seconds, but I think this can be solved with job dependencies?

Hey,
You can do this with dependencies, but not in the way you think. When a job has an after* dependency, it stays on hold until its dependencies are satisfied. To do this via dependencies, have your first job submit the new jobs and then submit one final email job with an afterok dependency on all the jobs that were just submitted. The initial job then ends. When the submitted jobs end OK, the hold on the email job is released.

Keep in mind that when the hold is released, the job won’t necessarily run immediately. If eligible_time is used, the job will have accumulated 0 eligible_time while held, so it won’t bubble up in priority. Of course, if calendaring is in use, a 1-minute, 1-CPU email job should be scheduled quickly as a filler job.

Bhroam

See section 6.2, “Using Job Dependencies”, on page UG-107 in the PBS Professional User’s Guide.

Very clever, that sounds like the right way to do it!

Here is an example I created that seemed to work ok:

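# Collect the submitted job IDs as ":id1:id2" (no trailing colon)
deps=''
for x in $(seq 1 2)
do
    job=$(qsub "g$x.sh")
    deps+=":$job"
    echo "Submitted job $job"
done

echo "$deps"

# post.sh runs only after every job above has finished OK
qsub -W depend="afterok$deps" post.sh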
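
If the job IDs are collected in separate variables rather than in a loop, the dependency argument can be assembled by a small helper. The following is only a sketch: build_depend is a made-up name (not a PBS command), and it assumes the IDs are exactly what qsub printed at submission time.

```shell
# Join a dependency type and job IDs into "type:id1:id2:...".
# build_depend is a hypothetical helper name, not part of PBS.
build_depend() {
    dep=$1          # dependency type, e.g. afterok
    shift
    for id in "$@"; do
        dep="$dep:$id"
    done
    printf '%s\n' "$dep"
}

# Usage sketch (job1 and job2 hold IDs printed by earlier qsub calls):
#   qsub -W depend="$(build_depend afterok "$job1" "$job2")" post.sh
```

Passing the dependency type as the first argument keeps the helper usable with other after* types without changing the joining logic.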

Another way to do this, provided the number of jobs is at most a few dozen, is to use the qsub -W block=true option. With this option, qsub submits the job and then blocks until the job finishes.

Thus, the top level job script does something like:

 qsub ... -W block=true job1 &
 qsub ... -W block=true job2 &
 ...
 wait
 mail -s "Jobs are done" some_user < /dev/null

That is, it submits each job with -W block=true and puts that qsub in the background (with &). Each qsub waits for its job to complete, and the script uses wait to pause until all the qsubs finish. Finally, it sends the mail.

As written, the job submission script itself hangs until all the jobs are done. You could start it in the background if you don’t want to tie up your terminal.
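
If the email should say whether anything failed, the script can wait on each background qsub separately and count non-zero exits. This is a rough sketch, not a tested PBS recipe: run_and_wait is a made-up helper name, and it assumes that a blocking qsub exits with the job's exit value, which is worth checking against the qsub documentation for your PBS version.

```shell
# Submit each script with a blocking qsub in the background, then wait
# for every one of them. The return value is the number of qsub
# processes that exited non-zero (0 means all succeeded).
# run_and_wait is a hypothetical helper name, not part of PBS.
run_and_wait() {
    pids=""
    for script in "$@"; do
        qsub -W block=true "$script" &
        pids="$pids $!"      # remember the PID of this background qsub
    done
    failed=0
    for pid in $pids; do
        # wait PID returns that process's exit status
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}

# Usage sketch:
#   if run_and_wait job1 job2; then subject="Jobs are done"
#   else subject="Some jobs failed"; fi
#   mail -s "$subject" some_user < /dev/null
```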