Hi, I have a new cluster set up with PBS Pro CE 19.1.3. I am running a simple script, but the job fails with the errors below. What could have gone wrong?
vamshi@master01:~$ pbsnodes -aSj
vnode  state  njobs  run  susp  mem f/t  ncpus f/t  nmics f/t  ngpus f/t  jobs
#!/bin/bash
#PBS -l select=3:ncpus=1
echo -n "I am on: "
hostname
echo "finding ssh-accessible nodes:"
# PBS_NODEFILE lists the host assigned to each chunk of this job
for node in $(cat "${PBS_NODEFILE}"); do
    echo -n "running on: "
    /usr/bin/ssh "$node" hostname
done
test1.sh.o1006:
I am on: node01
finding ssh-accessible nodes:
running on: running on: running on:
Hi,
I am not sure, but it looks like you have a problem with host-based authentication.
Did you try connecting via ssh from the command line from one host (e.g. node01) to another? Were you able to log in without entering a password?
Please edit /etc/ssh/ssh_config and set StrictHostKeyChecking to no (the line is StrictHostKeyChecking no; please check the exact spelling and capitalization of the option name). Set the same on the server and on all compute nodes.
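A quick way to check this is to run ssh in batch mode, which makes ssh fail instead of stopping at a password or host-key prompt; a PBS job is in the same non-interactive situation. This is a minimal sketch: the helper name check_ssh and the 5-second connect timeout are illustrative choices, not part of PBS.

```sh
#!/bin/sh
# Hypothetical helper: report whether each host accepts non-interactive ssh.
# BatchMode=yes disables password and host-key prompts, so a host that would
# need one is reported as FAILED instead of hanging the job.
check_ssh() {
    for node in "$@"; do
        # -n: read stdin from /dev/null; "true" just tests that login works
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
            echo "$node ok"
        else
            echo "$node FAILED"
        fi
    done
}
```

For example, running check_ssh node02 node03 on node01 should print "ok" for both nodes once the ssh configuration is in place on every host; inside a job, check_ssh $(sort -u "$PBS_NODEFILE") checks every host assigned to it.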
Yes, I had added that line only on the master node. I have now added it on the compute nodes too, and it's working. My mistake. Thank you.
One more thing. With the line #PBS -l select=3 I intend to run the job on three nodes, so the output should look something like this:
I am on: node01
finding ssh-accessible nodes:
running on: node01
running on: node02
running on: node03
But it always runs on only one node, like this:
I am on: node01
finding ssh-accessible nodes:
running on: node01
running on: node01
running on: node01
That works. Thank you.
Why did #PBS -l select=3:ncpus=1 alone not work?
Also, if I specify ngpus=2, it shows ngpus available = 0. How do I configure ngpus?
By default, PBS uses either free or pack placement, so chunks are packed onto one node until it is full, and only then does PBS move on to the next node.
Please see Table 4-3: Placement Modifiers on page UG-65 of the PBS Professional 2021.1 User's Guide.
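A minimal sketch of the test job with scatter placement, assuming the goal is one chunk per host: place=scatter, one of the modifiers in that table, tells PBS to take at most one chunk from any host, so the three chunks land on three different nodes. The unique_hosts helper is a hypothetical addition that skips repeated entries in $PBS_NODEFILE, so a job that does get packed onto one node does not ssh to the same host several times.

```sh
#!/bin/bash
#PBS -l select=3:ncpus=1
#PBS -l place=scatter
# place=scatter: at most one chunk per host, so 3 chunks need 3 hosts.

# Print each distinct host in a PBS node file once, in first-seen order.
unique_hosts() {
    awk '!seen[$0]++' "$1"
}

# PBS sets PBS_NODEFILE inside a job; outside a job, only define the helper.
if [ -n "${PBS_NODEFILE:-}" ]; then
    echo -n "I am on: "
    hostname
    echo "finding ssh-accessible nodes:"
    for node in $(unique_hosts "$PBS_NODEFILE"); do
        echo -n "running on: "
        /usr/bin/ssh "$node" hostname
    done
fi
```

The same placement can also be requested at submission time instead of in the script, e.g. qsub -l select=3:ncpus=1 -l place=scatter test1.sh.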
qmgr -c "create resource ngpus type=long,flag=nh"
Add ngpus to the resources: line in $PBS_HOME/sched_priv/sched_config
kill -HUP <PID of pbs_sched> (so the scheduler rereads sched_config)
qmgr -c "set node NODENAME resources_available.ngpus=2"
where NODENAME is the hostname of the node and 2 is the number of GPUs available on that node.
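The steps above can be collected into a small script. This is a sketch under assumptions: it runs as root on the PBS server host, PBS_HOME falls back to /var/spool/pbs if it is not already set (the usual location, but check /etc/pbs.conf), and the function names are hypothetical. On releases where ngpus is already a built-in resource, the create step may simply report that the resource exists. Only the sched_config edit uses plain text tools, and sed keeps a .bak copy of the original file.

```sh
#!/bin/sh
# Assumption: default PBS_HOME; the real value is set in /etc/pbs.conf.
PBS_HOME="${PBS_HOME:-/var/spool/pbs}"

# Append ngpus to the quoted list on the "resources:" line of a sched_config
# file, unless it is already listed. sed keeps the original as <file>.bak.
add_ngpus_to_sched_config() {
    cfg="$1"
    if grep -q '^resources:.*ngpus' "$cfg"; then
        return 0
    fi
    sed -i.bak 's/^\(resources:[[:space:]]*"[^"]*\)"/\1, ngpus"/' "$cfg"
}

# Hypothetical wrapper for the whole procedure: configure_ngpus NODENAME COUNT
configure_ngpus() {
    node="$1"
    count="$2"
    # Define ngpus as a consumable, host-level integer resource.
    qmgr -c "create resource ngpus type=long,flag=nh"
    add_ngpus_to_sched_config "${PBS_HOME}/sched_priv/sched_config"
    # SIGHUP makes the running scheduler reread sched_config.
    pkill -HUP -x pbs_sched
    qmgr -c "set node ${node} resources_available.ngpus=${count}"
}
```

For example, configure_ngpus node01 2 would advertise two GPUs on node01; repeat it for each GPU node (the create and sched_config steps are only needed once).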
Thank you very much @adarsh @boboshaq
You have been really helpful. The issues are resolved.
I am glad to have this community. I hope I can contribute back.