I have a question. If it’s within the scope of the forum, could you kindly help me?
The cluster I use has a queue with 12 cores. To optimize my tasks, I would like to run 12 instances of a program, each using 1 of the 12 cores. Is this possible? How could I do it?
I thought of the job below, but running the software (psi4, a software for quantum chemistry calculations) is quite inefficient.
When I run psi4 on 1 core on my personal computer, a standard input takes 10 min. However, when I run 12 inputs with one core each on the cluster (as in the first script example on the PBS system), each input takes 1 h. In the latter case, I would expect all 12 inputs to finish in about 10 min.
I may be talking nonsense, but I don’t think each of the 12 inputs is running on a separate core in the PBS queuing system (in the case of the first script example on the PBS system).
What I’m looking for is whether it’s possible to allocate each of the 12 software calls to a separate core by submitting a single PBS script for 12 cores.
Please let me know if I need to explain anything further.
It is possible to “pin” processes to a specific CPU using cgroups. However, I don’t think that’s the problem here. We need to find out more about the psi4 application. Is the application multithreaded? In other words, when it starts up, does it look at how many CPUs are on the system and launch that number of threads to perform the processing in parallel? If that’s the case, twelve instances running at the same time would each start one thread per CPU, which would simply overwhelm the CPUs and lead to the run times you are seeing.
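If threading does turn out to be the cause, one way to test the idea is to force each instance to a single thread and restrict it to its own core. Here is a minimal Python sketch of that, not a definitive recipe. The helper names (`single_thread_env`, `build_psi4_commands`, `launch_pinned`) are made up for illustration, and I am assuming that psi4 accepts `-n` for the thread count and `-i`/`-o` for the input and output files, so please check `psi4 --help` on your cluster. The pinning uses `os.sched_setaffinity`, which only exists on Linux.

```python
import os
import subprocess
from functools import partial


def single_thread_env(base_env=None):
    """Copy the environment and ask common threading runtimes for one thread."""
    env = dict(os.environ if base_env is None else base_env)
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        env[var] = "1"
    return env


def build_psi4_commands(input_files):
    """Build one single-threaded psi4 command per input file.

    Assumption: psi4 takes -n (threads), -i (input) and -o (output).
    """
    return [["psi4", "-n", "1", "-i", f, "-o", f + ".out"] for f in input_files]


def launch_pinned(commands):
    """Start all commands at once, one core each where possible; return exit codes."""
    # Cores this job may use (Linux only). Without this call, skip pinning.
    if hasattr(os, "sched_getaffinity"):
        cores = sorted(os.sched_getaffinity(0))
    else:
        cores = []
    procs = []
    for i, cmd in enumerate(commands):
        pin = None
        if cores:
            # Runs in the child just before exec: restrict it to a single core.
            # Cores are reused if there are more commands than cores.
            pin = partial(os.sched_setaffinity, 0, {cores[i % len(cores)]})
        procs.append(subprocess.Popen(cmd, env=single_thread_env(), preexec_fn=pin))
    return [p.wait() for p in procs]
```

Inside a 12-core PBS job you would call something like `launch_pinned(build_psi4_commands(["a.dat", "b.dat"]))` with your own input file names. If the run times drop back toward 10 min, oversubscription was the culprit.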
It’s also possible that raw CPU power is not the scarce resource here. Other factors like memory bandwidth, I/O bandwidth, virtual memory swapping, network bandwidth, network file systems, etc. might lead to the results you’re seeing. Try examining the system performance with two instances of the application running on your PC to see which resources are being used most heavily. It may be that your application is not suited to running more than one (or at most a few) instances on any given system. Also check whether there are parameters you can supply to the application that affect how it runs.
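A quick way to run that experiment is to time one instance, then two at once, and compare. The sketch below is only a rough illustration. The function names are made up, and `cmd` stands for whatever command line you normally use to start one calculation. It measures wall-clock time only, so you would still watch a tool such as `top` while it runs to see which resource is saturated.

```python
import subprocess
import time


def time_concurrent(cmd, copies):
    """Run `copies` instances of cmd at once; return seconds until all finish."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    for p in procs:
        p.wait()
    return time.perf_counter() - start


def slowdown_ratios(cmd, counts=(1, 2, 4)):
    """Return {copies: wall time relative to the first entry in counts}.

    Ratios near 1.0 mean the copies do not compete for a shared resource.
    A ratio near the number of copies means they are effectively serialized.
    """
    times = {n: time_concurrent(cmd, n) for n in counts}
    baseline = times[counts[0]]
    return {n: t / baseline for n, t in times.items()}
```

For example, `slowdown_ratios(cmd, counts=(1, 2))` returning a value close to 2 for two copies would suggest that the instances are fighting over something, whether that is cores, memory bandwidth, or disk.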
After you get the inefficiency issue resolved, and if it still makes sense to run multiple tasks at once, take a look at the GNU parallel program to see if it can simplify your work: