Creating the user accounts on Linux that can execute jobs on the cluster
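PBS runs each job on the compute node as the submitting user, so the account has to exist on every node, not only on the head node. Keeping the same UID everywhere is common practice, especially with shared home directories. Below is a minimal per-node check using a hypothetical helper named `user_uid`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the numeric UID of an account on this node,
# or fail with a message if the account does not exist here.
# `id -u` goes through NSS, so local, NIS and LDAP accounts all work.
user_uid() {
    local uid
    if ! uid=$(id -u "$1" 2>/dev/null); then
        echo "no such user on ${HOSTNAME:-this node}: $1" >&2
        return 1
    fi
    printf '%s\n' "$uid"
}
```

Run it on each node for the same account (for example `user_uid alice`, where `alice` is a placeholder) and compare the results.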

It is recommended to have:

  • a static IP address and hostname on every node
  • host-based passwordless SSH configured
  • an /etc/hosts file that lists the IP addresses and hostnames of all nodes, identical on the head node and all compute nodes
  • the firewall opened for ports 15001 to 15007 on all nodes of the PBS cluster
  • SELinux disabled, and the system rebooted afterwards
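Parts of the checklist above can be scripted. This is a minimal sketch with hypothetical helper names (`check_hosts_entries`, `open_pbs_ports`, `selinux_disabled`), assuming a firewalld-based distribution such as RHEL/CentOS and assuming the PBS ports only need TCP; adjust it to your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the checklist above.
# Assumes firewalld and the SELinux userland tools (RHEL/CentOS style).

# check_hosts_entries FILE HOST...
# Succeeds only if every HOST appears as a hostname or alias (not as the
# IP address column) on a non-comment line of FILE, normally /etc/hosts.
check_hosts_entries() {
    local file=$1 missing=0 host
    shift
    for host in "$@"; do
        if ! awk -v h="$host" '
            /^[[:space:]]*#/ { next }          # skip comment lines
            {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^#/) break       # rest of line is a comment
                    if ($i == h) found = 1
                }
            }
            END { exit !found }' "$file"; then
            echo "missing from $file: $host" >&2
            missing=1
        fi
    done
    return "$missing"
}

# open_pbs_ports: permanently open TCP ports 15001-15007 (run as root).
open_pbs_ports() {
    firewall-cmd --permanent --add-port=15001-15007/tcp &&
        firewall-cmd --reload
}

# selinux_disabled: succeeds if getenforce reports SELinux as Disabled.
selinux_disabled() {
    [ "$(getenforce 2>/dev/null)" = "Disabled" ]
}
```

For example, run `check_hosts_entries /etc/hosts headnode node01 node02` on every node (the hostnames are placeholders), and run `open_pbs_ports` as root on each node.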

Please refer to this guide:

This happens because passwordless scp is not working for that user. As a result, stageout (copying the result files from the compute node back to the PBS server node) fails, which leaves the job in the E state.
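To confirm the diagnosis, you can test passwordless scp as the affected user, from the compute node to the host the files are copied back to. This is a minimal sketch with a hypothetical helper named `check_passwordless_scp`; `BatchMode=yes` and `ConnectTimeout` are standard OpenSSH options that make scp fail instead of prompting for a password.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run as the job owner on the compute node to check
# that a copy to HOST can authenticate without any password prompt.
check_passwordless_scp() {
    local host=$1 tmp rc
    tmp=$(mktemp) || return 1
    echo "pbs stageout test" > "$tmp"
    # BatchMode=yes: fail instead of asking for a password or passphrase.
    scp -q -o BatchMode=yes -o ConnectTimeout=5 "$tmp" "${host}:${tmp}"
    rc=$?
    if [ "$rc" -eq 0 ]; then
        # Remove the test file from the remote side again.
        ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" rm -f "$tmp"
        echo "OK: passwordless scp to $host works for $(id -un)"
    else
        echo "FAIL: passwordless scp to $host failed for $(id -un) (exit $rc)" >&2
    fi
    rm -f "$tmp"
    return "$rc"
}
```

For example, log in to the compute node, switch to the affected user (`su - username`), define the function and run `check_passwordless_scp headnode`, where `headnode` stands for your PBS server host.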
You can check the MoM logs on the compute node where the job ran:
source /etc/pbs.conf
cd $PBS_HOME/mom_logs
less YYYYMMDD    # the log for the day the job ran; it shows the reason for the E state, e.g. a failed copy back
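When several jobs ran on the node that day, filtering the log by job ID makes the relevant lines easier to find. This is a minimal sketch with a hypothetical helper named `mom_log_for_job`, assuming the daily MoM log files are named YYYYMMDD and that the relevant lines contain the job ID.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the MoM log lines that mention one job.
# Usage: mom_log_for_job JOBID [YYYYMMDD]   (the date defaults to today)
mom_log_for_job() {
    local jobid=$1 day=${2:-$(date +%Y%m%d)} log
    # /etc/pbs.conf defines PBS_HOME; read it only if PBS_HOME is unset.
    if [ -z "${PBS_HOME:-}" ] && [ -r /etc/pbs.conf ]; then
        . /etc/pbs.conf
    fi
    if [ -z "${PBS_HOME:-}" ]; then
        echo "PBS_HOME is not set" >&2
        return 1
    fi
    log="$PBS_HOME/mom_logs/$day"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # -F matches the job ID literally (its dots are not regex wildcards).
    grep -F -- "$jobid" "$log"
}
```

For example, `mom_log_for_job 1234.headnode 20240115` (the job ID and date are placeholders).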

Hope this helps
