It looks like the job ran, but failed to return the output files. Ensure the user is able to ssh/scp between the submission host and the execution host.
In the filter on the left select PBS Professional as the product and 2020.1 as the version. If you are using shared filesystems, check out the $usecp directive in the Administration Guide.
Check out section 14.6 in the Administration Guide for details on file transfer.
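If /home is shared over NFS, a $usecp entry in the MoM configuration file might be added as sketched below. This assumes /home is mounted at the same path on every node and that PBS_HOME is the common OpenPBS default /var/spool/pbs (check /etc/pbs.conf). The host name `headnode` and the helper `add_usecp` are hypothetical. The host part of the entry should match the host shown in the job's Output_Path (`qstat -f <jobid>`).

```shell
#!/bin/sh
# Sketch: add a $usecp entry to the MoM config so PBS uses cp instead of scp
# for output paths under the NFS-shared /home.
# MOM_CONFIG defaults to the usual OpenPBS location; adjust it if your
# PBS_HOME (see /etc/pbs.conf) differs. "headnode" is a hypothetical name
# and must match the host in the job's Output_Path.

add_usecp() {
    config="${MOM_CONFIG:-/var/spool/pbs/mom_priv/config}"
    # Single quotes keep the shell from expanding $usecp.
    entry='$usecp headnode:/home/ /home/'

    # Append only if the exact line is not already present, so running
    # this again on the same node does not duplicate the entry.
    if ! grep -qxF "$entry" "$config" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$config"
    fi
}

# Usage, as root on each execution host:
#   add_usecp
# then restart pbs_mom (or send it SIGHUP) so it rereads its config.
```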
I rechecked ssh. The keys generated with ssh-keygen have been copied, and there is no issue there. I removed PBS completely and reinstalled version 22.
But I still get the same issue. I use NFS to share the /home directory from the head node to the compute nodes.
I can manually touch/write any file from any node; I checked it. So NFS does not seem to have any issue.
So basically it is a local copy for MOM. Below is my /etc/pbs.conf
The $usecp directive in the mom configuration file may be used to indicate which filesystems are shared. For example, if /home/user is mounted via NFS then you may use $usecp to indicate this. In that case, cp will be used instead of scp to copy the output file. There is also a qsub parameter (-k oe) that you may use to tell PBS to just leave the output where it is without attempting to copy back to the submission host. Please verify:
/usr/bin/scp exists on the execution hosts
You are able to scp a file from the execution host back to the submission host without a password
It may be that the first time you use scp from the execution host, it prompts you to accept the fingerprint of the remote host; after that, it no longer prompts. You can pass the -o StrictHostKeyChecking=no option to ssh/scp to disable strict host key checking and avoid the prompt.
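Both checks can be run non-interactively, as the job owner on each execution host, roughly as sketched below. This assumes a POSIX shell and OpenSSH; the helper name `check_stageout_path`, the `SCP_PATH` override, and the host name `headnode` are hypothetical. `BatchMode=yes` makes ssh fail instead of prompting for a password or a host-key confirmation, which approximates how an unattended copy-back behaves.

```shell
#!/bin/sh
# Sketch: approximate the non-interactive copy-back from an execution host.

check_stageout_path() {
    target="$1"                           # submission host, e.g. headnode
    scp_path="${SCP_PATH:-/usr/bin/scp}"  # path from the checklist above

    # 1. scp must exist and be executable on the execution host.
    if [ ! -x "$scp_path" ]; then
        echo "FAIL: $scp_path not found"
        return 1
    fi

    # 2. BatchMode=yes turns any password or host-key prompt into an
    #    immediate failure instead of waiting for input.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true; then
        echo "OK: passwordless ssh to $target"
    else
        echo "FAIL: non-interactive ssh to $target failed (password prompt, unknown host key, or unreachable)"
        return 1
    fi
}

# Usage:
#   check_stageout_path headnode
```

Running `ssh-keyscan headnode >> ~/.ssh/known_hosts` once is another way to record the host key without turning off host key checking entirely.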
It seems that during stage-out PBS is unable to preserve the permissions of the files being copied, or the user does not have permission to write to that destination.
Can you please add the following directive and test again: #PBS -W suppress_email=-1
I hope not; it is probably a special character inserted by copy and paste. Did you get a chance to type the hyphen (-) yourself instead of pasting it? If you have done that and it still fails, then that feature is not exposed in the version of OpenPBS you are using.
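A quick way to catch this kind of copy-paste problem is to search the job script for bytes outside printable ASCII, such as an en dash (–) pasted in place of a hyphen. The sketch below assumes a POSIX grep; the helper name `find_non_ascii` and the file name `job.sh` are hypothetical.

```shell
#!/bin/sh
# Sketch: report lines that contain anything other than printable ASCII
# or whitespace, e.g. an en dash pasted in place of "-".
find_non_ascii() {
    # In the C locale every byte above 0x7F falls outside [:print:] and
    # [:space:], so this class matches non-ASCII bytes as well as control
    # characters that are not whitespace. -n prefixes each hit with its
    # line number. grep exits with status 1 when nothing is found.
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}

# Usage:
#   find_non_ascii job.sh
```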
Hi Adarsh,
Sorry, it was the copy-paste issue; I typed it manually.
@mkaro and @adarsh, thank you very much for your support. I tried running the MPI jobs manually, and it turned out to be an Open MPI library issue.
After creating soft links for the libraries that were not found
ln -s /usr/lib/x86_64-linux-gnu/libmpi_cxx.so /usr/lib/x86_64-linux-gnu/libmpi_cxx.so.1
ln -s /usr/lib/x86_64-linux-gnu/libmpi.so /usr/lib/x86_64-linux-gnu/libmpi.so.12
on all nodes.
Open MPI started working. It was not a PBS issue. Thanks a lot, guys.
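To find which shared libraries are missing in the first place, one option is to run ldd on the MPI program on each node and filter for "not found" entries, as in the sketch below. The helper name `missing_libs` and the program name `./mpi_app` are hypothetical.

```shell
#!/bin/sh
# Sketch: list shared libraries the dynamic loader cannot resolve.
# Reads ldd output on stdin and prints one missing library name per line.
missing_libs() {
    # ldd reports unresolved libraries as "libfoo.so.1 => not found";
    # the first whitespace-separated field is the library name.
    awk '/=> not found/ { print $1 }'
}

# Usage on each node:
#   ldd ./mpi_app | missing_libs
```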