How can I configure PBS so that afterok will wait for the stageout operation of the parent job to complete before staging in the child job?

mark.gesing · June 29, 2018, 1:32pm

Hi everyone,

I have a series of jobs that have to run in a specific order. Each job needs to be able to read the results generated by the previous job in order to start properly.

My cluster is configured so that each compute node has an SSD dedicated to hosting files for the jobs currently running on that node, so I’m using stagein and stageout to put files on, and get files from, the compute node that PBS assigns to the job.

The problem I’m running into is that when I use:

#PBS -W depend=afterok:[parent job number]

the child job will start its stagein operation before the parent job has completed its stageout operation, and when the child job tries to execute, it will be missing files and fail to run.

I have been looking through the various manuals for a configuration option that makes jobs using afterok wait for the parent job's stageout to complete. However, I haven’t found any such option so far.

How can I configure PBS, or re-write my submission script so that the child jobs will have the files they need?

Thanks,

Mark.

P.S. Here is a sample PBS script for one of my child jobs:

#PBS -N apply_current
#PBS -j oe
#PBS -o apply_current.out
#PBS -W sandbox=PRIVATE
#PBS -l select=1:ncpus=24:mpiprocs=24
#PBS -l abaqus_tokens=19
#PBS -l abaqus_count=19
#PBS -l walltime=10:00:00
#PBS -W stagein=".@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current/uamp.o,.@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current/apply_current.inp,.@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current/apply_current.com"
#PBS -W stageout="apply_current.abq@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.dat@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.mdl@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.msg@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.odb@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.pac@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.prt@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.res@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.sel@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.size@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.sta@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.stt@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.sim@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.use@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current"
#PBS -W depend=afterok:5671
date
abaqus python apply_current.com
date

adarsh · August 10, 2018, 1:45pm

You can use a runjob hook or an execjob_begin hook to make sure all the data required to run this job exists (or to copy it from another location). If it does not, reject the job, which will put it back in the queue. That way your dependent job will not start running while its data is unavailable.
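
As a rough illustration of the hook idea, here is a minimal sketch of a hook script that rejects a job while any of its stage-in source files is missing. It makes several assumptions that are not from this thread: that the hook is registered for the runjob event, that the job's stagein attribute converts with str() to the same comma-separated local@host:remote form used with #PBS -W stagein, and that the remote paths are visible from the host where the hook runs (the host part of each entry is ignored). The helper names parse_stage_list and missing_remote_files are hypothetical.

```python
import os

try:
    import pbs  # only importable inside the PBS hook environment
except ImportError:
    pbs = None  # lets the helpers below be exercised outside PBS


def parse_stage_list(stage_spec):
    """Split a stagein/stageout value into (local, host, remote) tuples.

    Assumes entries look like local_path@hostname:remote_path and are
    separated by commas, as in the #PBS -W stagein directive.
    """
    entries = []
    for item in str(stage_spec).split(","):
        item = item.strip().strip('"')
        if not item:
            continue
        local, _, rest = item.partition("@")
        host, _, remote = rest.partition(":")
        entries.append((local, host, remote))
    return entries


def missing_remote_files(stage_spec, exists=os.path.exists):
    """Return the remote paths in stage_spec that do not exist yet.

    Assumption: the remote paths are readable from the host running the
    hook, so the hostname part is not used.
    """
    return [remote for _, _, remote in parse_stage_list(stage_spec)
            if not exists(remote)]


if pbs is not None:
    event = pbs.event()
    stagein = event.job.stagein
    missing = missing_remote_files(stagein) if stagein is not None else []
    if missing:
        # Rejecting a runjob event leaves the job queued for a later attempt.
        event.reject("stage-in sources not ready: " + ", ".join(missing))
    event.accept()
```

For the sample script above, this would only check the three files listed in stagein. If the child actually depends on other output from the parent job, those paths would have to be added to the check by hand.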
