Hi everyone,
I have a series of jobs that have to run in a specific order. Each job needs to be able to read the results generated by the previous job in order to start properly.
My cluster is configured so that each compute node has an SSD dedicated to hosting files for the jobs currently running on that node, so I’m using stagein and stageout to copy files to, and retrieve files from, the compute node that PBS assigns to the job.
The problem I’m running into is that when I use:
#PBS -W depend=afterok:[parent job number]
the child job starts its stagein operation before the parent job has completed its stageout operation, so when the child job tries to execute, it is missing files and fails to run.
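For context, here is a minimal sketch of how a chain like this could be submitted from the shell. The script names parent.pbs and child.pbs and the helper functions depend_attr and submit_chain are hypothetical, and the sketch assumes qsub prints the new job's ID on standard output:

```shell
#!/bin/sh
# Sketch only: submit a parent job, then a child job that depends on it.
# parent.pbs and child.pbs are hypothetical script names.

# Build the dependency attribute for a given parent job ID.
depend_attr() {
    printf 'depend=afterok:%s\n' "$1"
}

# Submit the parent, capture its job ID, then submit the child with
# an afterok dependency on that ID.
submit_chain() {
    parent_id=$(qsub "$1") || return 1
    qsub -W "$(depend_attr "$parent_id")" "$2"
}

# Usage (on a host with PBS installed):
#   submit_chain parent.pbs child.pbs
```

This only reproduces the dependency setup described above. It does not change when stagein and stageout run.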
I have been looking through the manuals for a configuration option that would make jobs using afterok wait for the parent job’s stageout to complete, but I haven’t found one so far.
How can I configure PBS, or rewrite my submission script, so that the child jobs will have the files they need?
Thanks,
Mark.
P.S. Here is a sample PBS script for one of my child jobs:
#PBS -N apply_current
#PBS -j oe
#PBS -o apply_current.out
#PBS -W sandbox=PRIVATE
#PBS -l select=1:ncpus=24:mpiprocs=24
#PBS -l abaqus_tokens=19
#PBS -l abaqus_count=19
#PBS -l walltime=10:00:00
#PBS -W stagein=".@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current/uamp.o,.@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current/apply_current.inp,.@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current/apply_current.com"
#PBS -W stageout="apply_current.abq@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.dat@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.mdl@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.msg@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.odb@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.pac@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.prt@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.res@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.sel@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.size@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.sta@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.stt@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.sim@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current,apply_current.use@rice:/home/mgesing/Documents/sandbox/tribometer/step_2-apply_current"
#PBS -W depend=afterok:5671
date
abaqus python apply_current.com
date