Warewulf cluster connection to OpenPBS failure

To make sure I understand the situation correctly:

  • The compute nodes are booted (PXE boot).
  • The pbs_mom service is running on each node. If it is not, start it manually,
    or make sure pbs_mom is started last, once all other system services are up and running.
  • pbsnodes -av # shows that all nodes are connected to the PBS server
  • A job is then submitted. Does this job run fine, or does it fail?
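
The checks above could be scripted roughly as follows. This is only a sketch: `unhealthy_nodes` is a hypothetical helper name, and it assumes the usual `pbsnodes -av` layout, where each node name sits on an unindented line and is followed by indented `state = ...` lines.

```shell
# Print the names of nodes whose state contains down, offline or unknown.
# Reads `pbsnodes -av` style output on stdin (assumed layout, see above).
unhealthy_nodes() {
    awk '
        /^[^ \t]/ { node = $1 }              # unindented line: node name
        $1 == "state" && $2 == "=" {
            if ($3 ~ /down|offline|unknown/) print node
        }
    '
}

# On a compute node: is the pbs_mom daemon running at all?
if pgrep -x pbs_mom >/dev/null 2>&1; then
    echo "pbs_mom is running"
else
    echo "pbs_mom is not running"
fi

# On the server: list the nodes that are not usable (only if PBS is installed).
if command -v pbsnodes >/dev/null 2>&1; then
    pbsnodes -av | unhealthy_nodes
fi
```

If the last command prints nothing, every node reported a state other than down, offline or unknown.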

Could you please share the job script? Also, did you try running a simple test job like the ones below?

qsub -- /bin/hostname
qsub -- /bin/sleep 10
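
To see what happened to such a test job, something like the sketch below may help. `job_seq` is a hypothetical helper, and the `<jobname>.o<sequence number>` output file name is the usual default, so treat it as an assumption about your setup.

```shell
# Sequence number of a PBS job ID, e.g. "123.server" -> "123".
job_seq() {
    printf '%s\n' "${1%%.*}"
}

# Only attempt a submission where the PBS client commands are installed.
if command -v qsub >/dev/null 2>&1; then
    jobid=$(qsub -- /bin/hostname)
    echo "submitted $jobid"
    # Full job record (state, comment) while the server still knows the job.
    qstat -f "$jobid"
    # After the job ends, its stdout is normally copied back to the
    # submission directory under a name ending in .o<sequence number>.
    echo "look for an output file ending in .o$(job_seq "$jobid")"
fi
```

If the job finishes but no output file appears, that points at the copy-back step rather than at the job itself.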

Make sure that no permissions set on /var on the compute nodes are interfering with this.
In the screenshot above, the system copy (cp or scp, depending on the configuration) failed because the file did not exist.
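
A quick way to look at the permissions in question is sketched below. `perm_report` is a hypothetical helper, `stat -c` is the GNU coreutils form, and `/var/spool/pbs` is only the common default for PBS_HOME, so adjust the path if your /etc/pbs.conf says otherwise.

```shell
# Print mode, owner:group and path for each argument, or flag it as missing.
perm_report() {
    for d in "$@"; do
        if [ -e "$d" ]; then
            stat -c '%a %U:%G %n' "$d"
        else
            echo "missing $d"
        fi
    done
}

# Assumed default location; override by exporting PBS_HOME first.
PBS_HOME_DIR=${PBS_HOME:-/var/spool/pbs}

perm_report /var /var/spool "$PBS_HOME_DIR" \
    "$PBS_HOME_DIR/spool" "$PBS_HOME_DIR/undelivered" "$PBS_HOME_DIR/mom_priv"
```

Run it on a compute node and compare the result with a node (or a fresh install) where jobs are known to work. A "missing" line for the spool directory would fit with a copy that fails because the file does not exist.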

Please also check that the ports are not blocked; see "PBS Implementation on AWS - #4 by adarsh".
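
A rough way to test the ports from the command line is sketched below. `check_port` is a hypothetical helper, it assumes an `nc` that supports `-z` and `-w`, and `pbs-server` and `node01` are placeholder host names. The port numbers are the PBS defaults (15001 server, 15002 and 15003 MoM, 15004 scheduler, 17001 pbs_comm); your site may have changed them in /etc/pbs.conf.

```shell
# Report whether a TCP connection to host $1, port $2 succeeds within 3 seconds.
check_port() {
    if nc -z -w 3 "$1" "$2" 2>/dev/null; then
        echo "$1:$2 reachable"
    else
        echo "$1:$2 unreachable"
    fi
}

# From a compute node towards the server (placeholder host name):
#   for p in 15001 17001; do check_port pbs-server "$p"; done
#
# From the server towards a compute node (placeholder host name):
#   for p in 15002 15003; do check_port node01 "$p"; done
```

An "unreachable" result can mean a firewall rule, a daemon that is not listening, or a missing `nc`, so it is a starting point rather than proof that a port is blocked.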