Cannot run job on multiple nodes

Hi,

I have set up a test setup. When I run a job on one node, it runs, but when I run a job on two nodes, it does not run.

Could someone help? Do I need to configure anything to make a job run on multiple nodes?

Please let me know what details I need to share.

Hi,

  1. PBS MoM (the execution component) should be deployed on both nodes.
  2. Passwordless SSH should be configured for all users (StrictHostKeyChecking should be “no” in /etc/ssh/ssh_config on the systems in the PBS cluster):
    • head node to compute node(s)
    • compute node(s) to head node
    • compute node to compute node(s)
  3. There should be a common share that users can access from all systems in the PBS cluster.
    For example: export /workingdirectory from the head node and mount it on all the compute nodes, and make sure /workingdirectory is readable and writable by all users from all the compute nodes.
  4. Make sure the application or script you are trying to run supports MPI (Intel MPI, Open MPI, or others) and is located in /workingdirectory (along with its input files). Submit the job script with qsub -l select=2:ncpus=2:mpiprocs=2 -l place=scatter, followed by the name of the job script.
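As a rough illustration of step 4, the same resource request can be placed inside the job script as #PBS directives. This is only a sketch: the file name mpi_job.sh, the job name mpi_test, and the binary ./my_mpi_app are hypothetical placeholders, and the exact mpirun options depend on which MPI implementation is installed.

```shell
# Write a hypothetical PBS job script, mpi_job.sh, into the current directory.
cat > mpi_job.sh <<'EOF'
#!/bin/bash
#PBS -N mpi_test
#PBS -l select=2:ncpus=2:mpiprocs=2
#PBS -l place=scatter
#PBS -j oe

# Run from the directory the job was submitted from (the shared /workingdirectory).
cd "$PBS_O_WORKDIR" || exit 1

# PBS_NODEFILE lists one host per MPI rank: 2 chunks x 2 mpiprocs = 4 lines.
NP=$(wc -l < "$PBS_NODEFILE")

# ./my_mpi_app is a placeholder; launcher flags vary between MPI implementations.
mpirun -np "$NP" -machinefile "$PBS_NODEFILE" ./my_mpi_app
EOF

# Submit from the shared directory on a host with PBS installed:
#   cd /workingdirectory && qsub mpi_job.sh
```

With the directives in the script, the -l options do not need to be repeated on the qsub command line.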

Please share more information on the configuration of your setup, the application or job script that you are trying to run, and any issues or logs you have come across.

Hi Adarsh,

Thank you for the support.

My issue was that I did not have enough resources. I identified this when I ran qstat -f.

Could you share the procedure for setting up passwordless SSH between nodes?

I usually add the keys between nodes using ssh-copy-id user@node01, but this would be tiresome for huge clusters. I am creating the users on an NFS shared location that is accessible by all nodes. Is there any simpler way?
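For anyone hitting the same problem, a quick way to compare a request against what the cluster offers is to total the ncpus that the nodes report. The helper below is a small sketch, not a PBS tool: total_ncpus is a made-up function name, and it only assumes that pbsnodes -a prints lines of the form "resources_available.ncpus = N".

```shell
# Sum the resources_available.ncpus values found in `pbsnodes -a` output (stdin).
total_ncpus() {
    awk -F' = ' '/resources_available\.ncpus/ { sum += $2 } END { print sum + 0 }'
}

# On a host with PBS installed (not executed here):
#   pbsnodes -a | total_ncpus                # CPUs the nodes advertise in total
#   qstat -f <jobid> | grep -i comment       # the reason recorded for a waiting job, if any
```

A request such as select=2:ncpus=2 with place=scatter needs at least two free CPUs on each of two different nodes, so the total alone is a necessary check rather than a sufficient one.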

Sincerely,

Aniesh Mathew

Thank you Aniesh,

It is better to configure host-based passwordless SSH; details can be found at the link below:
https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Host-based_Authentication
In this case, you do not have to create per-user keys; you only need to create keys for the hosts.
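As a rough outline of what host-based authentication involves, the settings below are the standard OpenSSH options for it. The hostnames headnode, node01, and node02 are placeholders, and the linked cookbook remains the reference for the full procedure (including restarting sshd after the change).

```text
# /etc/ssh/sshd_config on every node that accepts logins
HostbasedAuthentication yes

# /etc/ssh/ssh_config on every node that initiates logins
HostbasedAuthentication yes
EnableSSHKeysign yes

# /etc/ssh/shosts.equiv on every node: one trusted client hostname per line
headnode
node01
node02

# /etc/ssh/ssh_known_hosts on every node: the public host keys of all of the
# above, which can be collected with: ssh-keyscan headnode node01 node02
```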

If you have a common home directory for the users across the cluster, then it would be easy to maintain passwordless SSH access.
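With a shared home directory, one key pair per user is enough, because the same ~/.ssh/authorized_keys file is visible on every node. The sketch below assumes exactly that; authorize_key is a hypothetical helper name, and the key type and file names in the usage comment are only one possible choice.

```shell
# Append a public key to authorized_keys inside the given .ssh directory,
# skipping the append if that exact line is already present.
authorize_key() {
    pubkey_file=$1
    ssh_dir=$2
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    grep -qxF -f "$pubkey_file" "$ssh_dir/authorized_keys" ||
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Typical use, run once per user on any node (not executed here):
#   ssh-keygen -t ed25519 -N '' -f "$HOME/.ssh/id_ed25519"
#   authorize_key "$HOME/.ssh/id_ed25519.pub" "$HOME/.ssh"
```

Because the home directory is shared over NFS, no per-node ssh-copy-id step is needed after this.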

Hi Adarsh,

Since I had the home directory on NFS shared with the execution nodes, the passwordless login issue is taken care of.

I have installed it on CentOS 7.2. Can we use the community edition on RHEL 7.4 using the latest CentOS package?

Yes, you can use it on RHEL 7.4, or you can compile it from source on RHEL 7.4 and deploy it.
It should work without any issues.

Thank you