Hi,
I have an issue that is blocking further progress. I installed OpenPBS on a Warewulf cluster and connected some of the cluster nodes to the OpenPBS server, and then a problem occurred: I cannot submit any job to it because of the error shown in the attachments.
“
titan-n1
Mom = titan-n1.ece.local
Port = 15002
pbs_version = 23.06.06
ntype = PBS
state = free
pcpus = 32
resources_available.arch = linux
resources_available.host = titan-n1
…
”
The pbs_mom service is running. If it is not up and running, start the service manually
[or make sure pbs_mom is started last, once all the other system services are up and running].
pbsnodes -av # shows all nodes are connected to the PBS Server
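If the node list is long, a small filter can reduce it to one line per node. This is only a sketch: `node_states` is a hypothetical helper, not part of OpenPBS, and it assumes the layout shown in the listing above (a bare node name followed by `key = value` lines).

```shell
#!/bin/sh
# Print "<node> <state>" for each node in pbsnodes-style output read from stdin.
# Assumption: a line holding a single word and no " = " starts a node record.
node_states() {
    awk '
        NF == 1 && $0 !~ / = / { node = $1; next }
        $1 == "state" && $2 == "=" {
            $1 = ""; $2 = ""          # drop "state" and "=", keep the value
            sub(/^ +/, "")            # trim the blanks left behind
            print node, $0
        }
    '
}

# Only call the real command when it is installed.
if command -v pbsnodes >/dev/null 2>&1; then
    pbsnodes -av | node_states
fi
```

Any node whose state is not `free` is worth checking before submitting a test job.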
Now a job is submitted. Does this job run fine, or does it fail?
Could you please share the job script? Or did you try running a simple test job like the ones below?
qsub -- /bin/hostname
qsub -- /bin/sleep 10
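To see whether such a test job actually leaves the queue, something like the following could be used. It is a rough sketch: `run_test_job`, `MAX_TRIES` and `POLL_SECONDS` are hypothetical names, and it assumes that `qstat <jobid>` exits non-zero once the server no longer lists the job.

```shell
#!/bin/sh
# Submit a trivial job and poll until qstat no longer lists it.
# run_test_job is a hypothetical helper, not part of OpenPBS.
run_test_job() {
    jobid=$(qsub -- /bin/hostname) || { echo "qsub failed" >&2; return 1; }
    echo "submitted $jobid"
    tries=0
    # Assumption: qstat returns non-zero for a job that has already finished.
    while qstat "$jobid" >/dev/null 2>&1; do
        tries=$((tries + 1))
        if [ "$tries" -ge "${MAX_TRIES:-30}" ]; then
            echo "still queued or running: $jobid"
            return 2
        fi
        sleep "${POLL_SECONDS:-2}"
    done
    echo "finished $jobid"
}

# Only submit when qsub is installed on this host.
if command -v qsub >/dev/null 2>&1; then
    run_test_job
fi
```

Once the job has finished, check whether its output and error files arrived in the directory you submitted from.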
Make sure that, on the compute nodes, /var does not have any permissions set that affect this.
In the above screenshot, the system copy (cp or scp, depending on the configuration) failed because the file did not exist.
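A quick way to look at those permissions on a compute node is sketched below. `show_perms` is a hypothetical helper, GNU `stat` is assumed, and `/var/spool/pbs` is only an assumed fallback for `PBS_HOME` when `/etc/pbs.conf` does not define it. To my understanding the `spool` and `undelivered` directories are normally world-writable with the sticky bit set (mode 1777), so please compare against a known-good node.

```shell
#!/bin/sh
# Print octal mode, owner and path for each directory given (GNU stat assumed).
show_perms() {
    for d in "$@"; do
        if [ -d "$d" ]; then
            stat -c '%a %U %n' "$d"
        else
            echo "missing $d"
        fi
    done
}

# PBS_HOME is normally set in /etc/pbs.conf; the fallback path is an assumption.
if [ -r /etc/pbs.conf ]; then
    . /etc/pbs.conf
fi
PBS_HOME=${PBS_HOME:-/var/spool/pbs}

show_perms /var /var/spool "$PBS_HOME" "$PBS_HOME/spool" "$PBS_HOME/undelivered"
```

Run it on the node that executed the failing job and post the output here.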