PBS to stage all data to the node

How can I configure PBS to stage data to the node?


You can do this from within the job script, or with a hook:

  1. In the job script itself, for example:
    #PBS directives
    < main application batch command line >
    < scp, cp, or rsync the files from the job directory to your source location >
    < clean up the files in the job directory after the copy command succeeds, or keep both >

  2. Use an execjob_epilogue hook that runs scp, cp, rsync, or any other remote file copy to the source location and then deletes the files from the sandbox / job directory (or keeps both).

  3. Write a hook that alters the job's stageout attribute so that it redirects to the source directory.
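
Option 1 could be sketched roughly as the job script below. This is only an illustration, not a tested production script: DEST_DIR and ./my_application are hypothetical placeholders, and it assumes the job runs with sandbox=PRIVATE so that PBS_JOBDIR points at a per-job directory that is safe to clean up.

```shell
#!/bin/sh
#PBS -N stage_back_example
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -W sandbox=PRIVATE

# Hypothetical destination for the results; adjust to your site.
DEST_DIR="${DEST_DIR:-$HOME/results}"

# Copy the contents of the job directory to the destination and
# delete the job-directory copies only if the copy succeeded.
stage_back() {
    src="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    if cp -r "$src"/. "$dest"/; then
        rm -rf "${src:?}"/*
    else
        echo "copy failed; leaving files in $src" >&2
        return 1
    fi
}

# Only run the workload when PBS has started the script as a job.
if [ -n "${PBS_JOBID:-}" ]; then
    cd "${PBS_JOBDIR:?PBS_JOBDIR is not set}" || exit 1
    ./my_application            # placeholder for the main application command
    stage_back "$PBS_JOBDIR" "$DEST_DIR"
fi
```

To copy to another host, the cp line would be replaced with scp or rsync, which needs passwordless ssh from the compute node. To keep both copies, drop the rm line.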

Thank you

PBS does have a built-in staging mechanism, triggered with the stagein and stageout options to the -W (additional job attributes) directive. If you consult the qsub manpage and search for ‘stagein’, you will find the documentation. In my opinion, the documentation for this feature has always been rather confusing, and it does not provide an example, so here is one:

#PBS -W stagein=/path/to/local/dir@remote.host:/path/to/remote/dir,stageout=/path/to/local/dir@remote.host:/path/to/remote/dir

Two important points that the manpage also does not mention:

  1. The value of PBS_SCP in /etc/pbs.conf is the command used to perform the copy. I think the default is /bin/scp, but it could be rcp, so I set it explicitly in pbs.conf. If you use scp, you need to have passwordless ssh/scp configured.
  2. The PBS_SCP command is run on the first node in the job (the mother superior) for both the stagein and stageout. Thus the stagein is a pull done by the compute node and stageout is a push done by the compute node.
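
A quick way to check both points on a compute node might look like the sketch below. The function names are made up for illustration; the only assumptions are that pbs.conf uses plain KEY=value lines and that the ssh client supports the BatchMode option, which makes ssh fail instead of prompting for a password.

```shell
# Print the PBS_SCP value from a pbs.conf-style file (KEY=value lines).
# Uses /etc/pbs.conf when no file is given; prints nothing if PBS_SCP is unset.
pbs_scp_setting() {
    conf="${1:-/etc/pbs.conf}"
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 1
    fi
    sed -n 's/^PBS_SCP=//p' "$conf" | tail -n 1
}

# Return success only if ssh to the given host works without a password prompt.
check_passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

Run `pbs_scp_setting` on the first node of the job, then `check_passwordless_ssh remote.host` as the job owner, since that is the account the copy has to work for.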

If you need more sophisticated staging, then I recommend either writing your own script and making it PBS_SCP, or using the scenarios that @adarsh mentioned.


Hopefully this will help.

The file transfer mechanism depends on what you have configured in /etc/pbs.conf on the PBS server for stagein (server to compute nodes) and in $PBS_HOME/mom_priv/config on the PBS MoMs for stageout. If $usecp does not exist in mom_priv/config, the MoM follows what is configured in its own /etc/pbs.conf.

Keyword to search for in the PBS Pro admin guide: $usecp
The default copy mechanism is rcp; scp and cp are used instead if they are configured in pbs.conf / mom_priv/config.

#PBS -W stagein=<execution_path>@<hostname>:<storage_path>
#PBS -W stageout=<execution_path>@<hostname>:<storage_path>

stagein: location of input files (copies input files to the execution directory or job directory)
stageout: location of output files (copies results from the job directory or execution directory back to your intended location)
execution_path: path in the execution directory on the compute node
storage_path: file name on the host hostname
The ‘@’ character separates the execution path specification from the storage path specification

The @ character is just a separator; it does not represent a username@hostname kind of specification.
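
To make the role of the separators concrete, here is a small sketch that splits one staging entry of the form execution_path@hostname:storage_path into its three parts. The function name parse_stage_spec is invented for this example; it uses only standard shell parameter expansion and has nothing to do with how PBS parses the value internally.

```shell
# Split one staging entry: execution_path@hostname:storage_path
parse_stage_spec() {
    spec="$1"
    exec_path="${spec%%@*}"        # everything before the first '@'
    remote="${spec#*@}"            # everything after the first '@'
    host="${remote%%:*}"           # up to the first ':'
    storage_path="${remote#*:}"    # everything after the first ':'
    printf 'execution_path=%s\nhostname=%s\nstorage_path=%s\n' \
        "$exec_path" "$host" "$storage_path"
}

parse_stage_spec 'box.fem@headnode:/home/pbsdata/optistruct/box.fem'
# execution_path=box.fem
# hostname=headnode
# storage_path=/home/pbsdata/optistruct/box.fem
```

The part before the @ is a path on the execution side, never a user name.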

#PBS -N pbsproapplication
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -W sandbox=PRIVATE
# copy box.fem from headnode:/home/pbsdata/optistruct into the job's execution directory with the same file name, box.fem
#PBS -W stagein=box.fem@headnode:/home/pbsdata/optistruct/box.fem
# copy all the results from the sandbox or jobdir to headnode:/home/pbsdata/output
#PBS -W stageout=*@headnode:/home/pbsdata/output

#PBS -N pbsproapplication
#PBS -l select=1:ncpus=1:mem=1gb
#PBS -W sandbox=PRIVATE
# copy box.fem and box.inc from headnode:/home/pbsdata/optistruct into the job's execution directory with the same file names
#PBS -W stagein=box.fem@headnode:/home/pbsdata/optistruct/box.fem,box.inc@headnode:/home/pbsdata/optistruct/box.inc
# copy the *.out and *.log result files from the sandbox or jobdir to headnode:/home/pbsdata/output
#PBS -W stageout=*.out@headnode:/home/pbsdata/output,*.log@headnode:/home/pbsdata/output

Please see the PBS Pro User Guide, section 3.2, “Input/Output File Staging”, page UG-33.