When I run a program, I find that the time taken for distributed computing on multiple nodes with 6 CPUs is longer than that taken on a single node with 4 CPUs.
How can I improve the time?
- network speed/bandwidth between the compute nodes
- I/O performance, e.g. using SSDs or NVMe drives
- using profiling tools to find out the bottlenecks within your application
- check whether the application is scalable (SMP or MPP); some applications hit a limit beyond which adding more nodes (cores) gives no further performance gain
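
To make the scalability point concrete, here is a rough sketch using Amdahl's law with an added communication term. The parallel fraction (0.9) and the per-worker overhead (0.03) are made-up numbers for illustration, not measurements of your program, and the linear overhead model is only an assumption. It shows how 6 CPUs spread over nodes can end up slower than 4 CPUs on one node once network cost is counted.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's law, ignoring communication cost."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


def speedup_with_overhead(parallel_fraction: float, workers: int,
                          overhead_per_worker: float) -> float:
    """Amdahl's law plus a hypothetical communication cost.

    The cost is assumed to grow linearly with the number of extra workers
    and is expressed as a fraction of the single-worker runtime.
    """
    serial = 1.0 - parallel_fraction
    comm = overhead_per_worker * (workers - 1)
    return 1.0 / (serial + parallel_fraction / workers + comm)


if __name__ == "__main__":
    p = 0.9          # assumed fraction of the runtime that parallelizes
    overhead = 0.03  # assumed inter-node cost per extra worker
    print("single node, 4 CPUs:", round(amdahl_speedup(p, 4), 2))
    print("multi node, 6 CPUs :", round(speedup_with_overhead(p, 6, overhead), 2))
```

If a profiler shows that most of the wall time goes to communication or waiting rather than computation, the real numbers probably look like the second case, and faster interconnect or fewer, larger messages will help more than extra cores.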
Thanks for your advice.