How can I improve the run time of a distributed job?

When I run a program, I find that the time taken for distributed computing on multiple nodes with 6 CPUs is longer than that taken on a single node with 4 CPUs.
How can I improve the time?

  • Check the network speed and bandwidth between the compute nodes.
  • Check I/O performance; SSDs or NVMe drives can help if storage is the bottleneck.
  • Use profiling tools to find the bottlenecks within your application.
  • Check whether the application is scalable (SMP or MPP). Some applications stop gaining performance beyond a certain point, even when you add more nodes (cores).
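
To illustrate the scalability point, here is a minimal sketch based on Amdahl's law, which gives the best-case speedup when only part of a program can run in parallel. The function name `amdahl_speedup` and the 90% parallel fraction are assumptions chosen for illustration, not measurements of your application. The model also ignores network and I/O costs, which on multiple nodes can push the real run time above the single-node time.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Best-case speedup from Amdahl's law (communication cost ignored).

    parallel_fraction: share of the run time that can be parallelized (0..1).
    workers: number of CPUs/cores working on the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed value for illustration: 90% of the work is parallelizable.
    assumed_parallel_fraction = 0.9
    for workers in (1, 4, 6, 16, 64, 1024):
        speedup = amdahl_speedup(assumed_parallel_fraction, workers)
        print(f"{workers:5d} workers -> at most {speedup:.2f}x faster")
```

With a 10% serial part, the speedup can never reach 10x no matter how many cores are added, so profiling to shrink the serial and communication parts often pays off more than adding nodes.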

Thanks for your advice.