Stagein/stageout misinterpreting path: a slash is deleted

Hello,
I am trying to run a job with the following path for stagein/stageout:
#PBS -W stagein=".@xx.xx.xx.1:/srv/dev_workspace/test_pbs/*"
#PBS -W stageout="*@xx.xx.xx.1:/srv/dev_workspace/test_pbs/"

Two CIFS shares are mounted under /srv:
/srv/dev_workspace/
/srv/dev_storage/

When using /srv/dev_storage/, the jobs work fine and both stagein and stageout succeed:
#PBS -W stagein='.@xx.xx.xx.1:/srv/dev_storage/Public/BDE/PBS/test4/*'
#PBS -W stageout='*@xx.xx.xx.1:/srv/dev_storage/Public/BDE/PBS/test4/'

When using

#PBS -W stagein=".@xx.xx.xx.1:/srv/dev_workspace/test_pbs/*"
#PBS -W stageout="*@xx.xx.xx.1:/srv/dev_workspace/test_pbs/"

the stagein fails, with this in the mom_log:

‘/srv/dev_workspacetest_pbs/FeaJob.py.o31’: No such file or directory

PBS is stitching "dev_workspace" and "test_pbs" together for some reason.
I found a workaround using a double slash (//):

#PBS -W stagein=".@xx.xx.xx.1:/srv/dev_workspace//test_pbs/*"
#PBS -W stageout="*@xx.xx.xx.1:/srv/dev_workspace//test_pbs/"

The stagein succeeds and the job runs properly, but the stageout fails:

…;34.ip-123456;copy file request received
…;sys_copy;command: /usr/bin/pbs_cp -rp 34.ip-123456.OU /srv/dev_workspacetest_pbs/FeaJob.py.o34 status=1, try=1
…;sys_copy;command: /usr/bin/pbs_cp -rp 34.ip-123456.OU /srv/dev_workspacetest_pbs/FeaJob.py.o34 status=1, try=2
…;sys_copy;command: /usr/bin/pbs_cp -rp 34.ip-123456.OU /srv/dev_workspacetest_pbs/FeaJob.py.o34 status=1, try=3
…;sys_copy;command: /usr/bin/pbs_cp -rp 34.ip-123456.OU /srv/dev_workspacetest_pbs/FeaJob.py.o34 status=1, try=4
…;copy_file;Job 34.ip-123456: sys_copy failed, return value=1
…;34.ip-123456.OU;Unable to copy file 34.ip-123456.OU to ip-0a582e85:/srv/dev_workspace/test_pbs/FeaJob.py.o34
…;34.ip-123456.OU;/usr/bin/cp: cannot create regular file ‘/srv/dev_workspacetest_pbs/FeaJob.py.o34’: No such file or directory
…;stage_file;Job 34.ip-123456: no wildcards:local stageout failed for saf112092 from 34.ip-123456.OU to ip-123455:/srv/dev_workspace/test_pbs/FeaJob.py.o34
…;34.ip-123456;Job files not copied:---->>>>
…;34.ip-123456;Unable to copy file 34.ip-123456.OU to ip-0a582e85:/srv/dev_workspace/test_pbs/FeaJob.py.o34
…;34.ip-123456;>>> error from copy
…;34.ip-123456;/usr/bin/cp: cannot create regular file ‘/srv/dev_workspacetest_pbs/FeaJob.py.o34’: No such file or directory
…;34.ip-123456;>>> end error output

Again I get the same problem, even with the double slash:
/usr/bin/cp: cannot create regular file ‘/srv/dev_workspacetest_pbs/FeaJob.py.o34’: No such file or directory

I also tried quotes inside quotes, as explained in the User Guide, section 3.2.5.1:

#PBS -W stagein=“.@xx.xx.xx.1:/srv/dev_workspace/test_pbs/
#PBS -W stageout=“*@xx.xx.xx.1:/srv/dev_workspace/test_pbs/

This time, stagein works fine with one slash but I still get the same error with a path where one slash has been removed.

To check the user permissions, I copied the file manually, and that works.

Could you advise? The behavior is strange because "/" is not an escape character like "\". All the systems run Linux.

Regards
Bert

The multiple @ characters might be causing the issue.
FYI:

#PBS -W stagein=<execution_path>@<hostname>:<storage_path>
#PBS -W stageout=<execution_path>@<hostname>:<storage_path>

stagein: location of input files (copies input files to the execution directory or job directory)
stageout: location of output files (copies results from the job directory or execution directory back to your intended location)
execution_path: execution directory on the compute node
storage_path: filename on host hostname
The '@' character separates the execution path specification from the storage path specification.

The @ character is just a separator; it does not represent a username@hostname kind of specification.
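
To make the three parts of the specification concrete, here is a minimal Python sketch that splits a stagein/stageout value at the first '@' and then at the first ':'. It is only an illustration of the format described above, not PBS's own parser; the names `StageSpec` and `parse_stage_spec` are made up for this example.

```python
from typing import NamedTuple


class StageSpec(NamedTuple):
    """Hypothetical container for the three parts of a staging value."""
    execution_path: str
    hostname: str
    storage_path: str


def parse_stage_spec(spec: str) -> StageSpec:
    """Split 'execution_path@hostname:storage_path' into its parts.

    Assumption: the first '@' ends the execution path and the first ':'
    after it ends the hostname. This mirrors the format shown in the
    thread; it is not taken from the PBS source code.
    """
    execution_path, at, remainder = spec.partition("@")
    if not at:
        raise ValueError(f"missing '@' separator in {spec!r}")
    hostname, colon, storage_path = remainder.partition(":")
    if not colon:
        raise ValueError(f"missing ':' after the hostname in {spec!r}")
    return StageSpec(execution_path, hostname, storage_path)


print(parse_stage_spec("*@xx.xx.xx.1:/srv/dev_workspace/test_pbs/"))
```

Read this way, the stageout value from the first post has "*" as the execution path, "xx.xx.xx.1" as the hostname, and "/srv/dev_workspace/test_pbs/" as the storage path, so the '@' in it is only the separator.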

Thanks for your answer.

There are not multiple '@' characters. When I saved my first post, some wrong strings were introduced. I have cleaned up the first post, so it should be easier to understand.

I do follow the format given in UG-33, section 3.2.

To summarize, I write the stageout like this:
#PBS -W stageout="*@xx.xx.xx.1:/srv/dev_workspace/test_pbs/"

and I get an error message showing that PBS is misinterpreting the stageout path (one / is missing):
/usr/bin/cp: cannot create regular file ‘/srv/dev_workspacetest_pbs/FeaJob.py.o34’: No such file or directory

Is $usecp configured in the mom_priv/config file? Please share its contents if you can.

  • Are /srv/dev_workspace and /srv/dev_storage mounted on all the compute nodes?

Would it be possible to disable $usecp and restart the PBS services on the compute node?
Then try again; PBS will switch to using scp.

/etc/pbs.conf should have these lines, in this order:

PBS_RCP=/bin/false # so that it does not fall back to rcp
PBS_SCP=/usr/bin/scp # or what shows in which scp
PBS_RSHCOMMAND=/bin/ssh

Hi Adarsh,

I found the mistake thanks to your advice. It was coming from the mom_priv/config file:

$usecp *:/srv/dev_storage/ /srv/dev_storage/
$jobdir_root /mnt/PBS/home/spool
$usecp *:/srv/jupyter/ /srv/jupyter/
$usecp *:/srv/dev_workspace/ /srv/dev_workspace <-- Missing / at the end
$restrict_user_maxsysid 999
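
A small Python sketch of why this produces the path seen in the mom_log. It assumes that $usecp does a plain prefix substitution (the matched source prefix is swapped for the destination prefix and the rest of the path is appended unchanged). That is inferred from the log output in this thread rather than from the PBS source, and `apply_usecp` is a hypothetical helper written for this example.

```python
def apply_usecp(path: str, source_prefix: str, destination_prefix: str) -> str:
    """Swap a matching source prefix for the destination prefix.

    Assumption: plain string substitution with no path normalization,
    which is what the mom_log output suggests.
    """
    if not path.startswith(source_prefix):
        return path  # the rule does not apply to this path
    return destination_prefix + path[len(source_prefix):]


target = "/srv/dev_workspace/test_pbs/FeaJob.py.o34"

# Destination prefix without the trailing slash, as in the config above.
print(apply_usecp(target, "/srv/dev_workspace/", "/srv/dev_workspace"))

# Destination prefix with the trailing slash.
print(apply_usecp(target, "/srv/dev_workspace/", "/srv/dev_workspace/"))
```

Under that assumption the first call gives /srv/dev_workspacetest_pbs/FeaJob.py.o34, the same path that cp complains about, and the second gives the intended path. It would also explain why a double slash in the stagein directive helped: the extra slash survives the substitution.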

I forgot one slash. This solves my issue:

$usecp *:/srv/dev_storage/ /srv/dev_storage/
$jobdir_root /mnt/PBS/home/spool
$usecp *:/srv/jupyter/ /srv/jupyter/
$usecp *:/srv/dev_workspace/ /srv/dev_workspace/  <-- Adding / at the end
$restrict_user_maxsysid 999

Everything works now.
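
In case it helps someone else, here is a rough Python sketch that flags $usecp lines where only one of the two prefixes ends with a slash. It assumes each rule has the form `$usecp <host>:<source_prefix> <destination_prefix>`, as in the config above; `find_slash_mismatches` is a made-up helper, not a PBS tool.

```python
from typing import List


def find_slash_mismatches(config_text: str) -> List[str]:
    """Return $usecp lines whose two prefixes disagree on the trailing slash."""
    mismatches = []
    for line in config_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "$usecp":
            continue  # not a $usecp rule
        _host, colon, source_prefix = fields[1].partition(":")
        if not colon:
            continue  # unexpected form; skip rather than guess
        destination_prefix = fields[2]
        if source_prefix.endswith("/") != destination_prefix.endswith("/"):
            mismatches.append(line.strip())
    return mismatches


# Sample text copied from the broken config in this thread.
sample_config = """\
$usecp *:/srv/dev_storage/ /srv/dev_storage/
$jobdir_root /mnt/PBS/home/spool
$usecp *:/srv/jupyter/ /srv/jupyter/
$usecp *:/srv/dev_workspace/ /srv/dev_workspace
$restrict_user_maxsysid 999
"""

for bad_line in find_slash_mismatches(sample_config):
    print("trailing slash mismatch:", bad_line)
```

On the sample text only the /srv/dev_workspace rule is reported. To check a real node, the file contents would be read from the mom_priv/config file under that installation's PBS home directory.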

Thank you!

Nice one! Thank you, Bert.