It is recommended to:
- use a static IP address and hostname on every node
- configure host-based passwordless SSH
- keep /etc/hosts identical on the head node and the compute nodes, with entries for every node and hostname
- open ports 15001 to 15007 in the firewall across the PBS cluster
- disable SELinux and reboot the system
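As a rough way to verify the /etc/hosts point, here is a minimal sketch of a helper that reports which node names have no entry in a hosts file. The function name hosts_missing and the node names in the usage line are made up for illustration; substitute your own.

```shell
#!/bin/sh
# Sketch only: hosts_missing is a hypothetical helper, not part of PBS.

# hosts_missing HOSTS_FILE NODE...
# Prints each node name that has no entry in HOSTS_FILE.
# Returns 1 if at least one name is missing, 0 otherwise.
hosts_missing() {
    hosts_file=$1
    shift
    rc=0
    for node in "$@"; do
        # Strip comments, then look for the name as a whole field
        # in any column after the IP address.
        if ! sed 's/#.*//' "$hosts_file" | awk -v n="$node" '
                { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit !found }'; then
            echo "$node"
            rc=1
        fi
    done
    return $rc
}

# Example (hypothetical node names):
#   hosts_missing /etc/hosts headnode node01 node02
```

Running it on the head node and on each compute node with the same list of names should print nothing everywhere. Comparing the output of md5sum /etc/hosts across the nodes is a quick extra check that the files really are identical.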
Please refer to this guide:
This happens because passwordless scp is not working for that user. As a result, stage-out (copying the result files from the compute node back to the PBS server node) fails, and the job stays in the E state.
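One way to confirm this is to try the copy by hand, as the job owner, from the compute node to the server node. The sketch below assumes OpenSSH scp; the function name check_stageout_copy and the host name in the usage line are hypothetical. BatchMode=yes makes scp fail instead of prompting for a password, which is what happens to a batch job that has no terminal.

```shell
#!/bin/sh
# Sketch only: check_stageout_copy is a hypothetical helper, not part of PBS.

# check_stageout_copy HOST
# Copies a small probe file to HOST:/tmp/ without allowing any password prompt.
# The SCP variable can override the scp binary (useful for a dry run).
check_stageout_copy() {
    host=$1
    probe=$(mktemp) || return 2
    echo "pbs stageout probe" > "$probe"
    if ${SCP:-scp} -o BatchMode=yes -o ConnectTimeout=10 "$probe" "$host:/tmp/"; then
        echo "passwordless scp to $host works"
        rc=0
    else
        echo "passwordless scp to $host FAILED"
        rc=1
    fi
    rm -f "$probe"   # removes the local probe only; the remote copy stays in /tmp
    return $rc
}

# Example, run on the compute node as the user who submitted the job
# (pbsserver is a placeholder for your server host name):
#   check_stageout_copy pbsserver
```

If this reports a failure, fix the passwordless setup for that user first; the job's output cannot be copied back until this copy succeeds without a prompt.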
You can check the MoM logs on the compute node where the job ran:
source /etc/pbs.conf
cd $PBS_HOME/mom_logs
less YYYYMMDD # log file for the day the job ran; it should show the reason for the E state, most likely a failed copy back
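To avoid reading the whole log, you can filter it by job ID. This is a minimal sketch that assumes the MoM log for a given day is the file $PBS_HOME/mom_logs/YYYYMMDD; the function name mom_log_lines and the job ID in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch only: mom_log_lines is a hypothetical helper, not part of PBS.

# mom_log_lines JOBID [YYYYMMDD]
# Prints the MoM log lines that mention JOBID for the given day (default: today).
mom_log_lines() {
    jobid=$1
    day=${2:-$(date +%Y%m%d)}
    # /etc/pbs.conf defines PBS_HOME; source it only if PBS_HOME is not set yet.
    if [ -z "${PBS_HOME:-}" ] && [ -r /etc/pbs.conf ]; then
        . /etc/pbs.conf
    fi
    if [ -z "${PBS_HOME:-}" ]; then
        echo "PBS_HOME is not set" >&2
        return 2
    fi
    # -F treats the job ID as a literal string, so the dot is not a wildcard.
    grep -F -- "$jobid" "$PBS_HOME/mom_logs/$day"
}

# Example (hypothetical job ID and date):
#   mom_log_lines 1234.headnode 20240115
```

Run it on the compute node where the job executed, with the date the job finished. Lines about the copy back to the server node are the ones to look at.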
Hope this helps