Run multiple serial tasks in one job

Dear friends,

I’m very new to the PBS queuing system.

I have a question, if it’s within the scope of the forum, could you kindly help me?

In the cluster that I use there is a queue with 12 cores. To optimize my tasks, I would like to run 12 instances of a program, each using 1 of the 12 cores. Is it possible? How could I do this?

I thought of the job below, but the software (psi4, a program for quantum chemistry calculations) runs quite inefficiently this way.

#!/bin/bash
#PBS -N job
#PBS -e job.err
#PBS -o job.out
#PBS -q par12
#PBS -l nodes=1:ppn=12

cd $PBS_O_WORKDIR

# load libraries for psi4 software

(psi4 -i input_1.dat -o output_1.out -n 1) &
(psi4 -i input_2.dat -o output_2.out -n 1) &
(psi4 -i input_3.dat -o output_3.out -n 1) &
(psi4 -i input_4.dat -o output_4.out -n 1) &
(psi4 -i input_5.dat -o output_5.out -n 1) &
(psi4 -i input_6.dat -o output_6.out -n 1) &
(psi4 -i input_7.dat -o output_7.out -n 1) &
(psi4 -i input_8.dat -o output_8.out -n 1) &
(psi4 -i input_9.dat -o output_9.out -n 1) &
(psi4 -i input_10.dat -o output_10.out -n 1) &
(psi4 -i input_11.dat -o output_11.out -n 1) &
(psi4 -i input_12.dat -o output_12.out -n 1) &

wait

Thanks in advance for your time.

Create 12 scripts like the one below, each with the matching index in the input and output file names, and submit them using qsub.

#!/bin/bash
#PBS -N job
#PBS -e job.err
#PBS -o job.out
#PBS -q par12
#PBS -l select=1:ncpus=1

cd $PBS_O_WORKDIR
# load libraries for psi4 software
# run in the foreground so the job does not exit before psi4 finishes
psi4 -i input_1.dat -o output_1.out -n 1

Alternatively, you can run the work as a job array, using the array index as the suffix of the input and output files.
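A minimal sketch of the job array approach, assuming PBS Professional syntax (`#PBS -J 1-12` and the `PBS_ARRAY_INDEX` variable). On Torque the equivalents would be `-t 1-12` and `PBS_ARRAYID`. The queue name and file names are taken from the scripts in this thread; the fallback to index 1 and the check for psi4 are only there so the script can be tried by hand outside the scheduler.

```shell
#!/bin/bash
#PBS -N job
#PBS -q par12
#PBS -l select=1:ncpus=1
#PBS -J 1-12

# PBS_ARRAY_INDEX is set by PBS Professional for each subjob (1..12 here).
# Assumption: default to 1 when the script is run outside the scheduler.
i="${PBS_ARRAY_INDEX:-1}"
input="input_${i}.dat"
output="output_${i}.out"

# PBS_O_WORKDIR is the directory qsub was called from.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# load libraries for psi4 software

if command -v psi4 >/dev/null 2>&1; then
    # Foreground call: the subjob ends when psi4 ends.
    psi4 -i "$input" -o "$output" -n 1
else
    echo "psi4 not found; would run: psi4 -i $input -o $output -n 1" >&2
fi
```

Submitting this one script with qsub creates 12 subjobs, so there is no need to maintain 12 nearly identical files.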

Thanks for the comment. But that’s not what I’m looking for.

In the cluster I use, there is no serial queue (with 1 core). The minimum number of cores I can select is 12.

So, to make better use of the 12 cores, I would like to run 12 separate software calls in a single job (single pbs script), each one using a core.

Please explain what you mean by “inefficient”. What prevents the processes from running efficiently when they are running in parallel?

Thanks for your reply.

When I run the psi4 software on 1 core on my personal computer, a standard input takes 10 min. However, when I run 12 inputs with one core each in the cluster (as in the first script example on the PBS system), each input takes 1 h. In the latter case, I would expect all 12 inputs to finish in about 10 min in the cluster.

I may be talking nonsense, but I don’t think each of the 12 inputs is running on a separate core in the PBS queuing system (in the case of the first script example on the PBS system).

What I’m looking for is whether it’s possible to allocate each of the 12 software calls to a separate core by submitting a single PBS script for 12 cores.

Please let me know if I need to explain anything further.

It is possible to “pin” processes to a specific CPU using cgroups. However, I don’t think that’s the problem here. We need to find out more about the psi4 application. Is the application multithreaded? In other words, when it starts up, does it look at how many CPUs are on the system and launch that number of threads to perform the processing in parallel? If that’s the case, twelve instances of the application running at the same time would simply overwhelm the CPUs and lead to the run times you are seeing.
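One way to check the multithreading question on a Linux node is to count the threads of a running instance and to cap the usual threading runtimes at one thread. The sketch below is only an illustration: `thread_count` and `run_pinned` are hypothetical helper names, it is an assumption that psi4 or its math libraries honor `OMP_NUM_THREADS` and `MKL_NUM_THREADS`, and pinning is done with `taskset` (CPU affinity) rather than with cgroups.

```shell
#!/bin/bash
# Assumption: psi4, or a library it links against, honors these variables.
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1

# Print how many threads a running process has, read from Linux /proc.
# Usage: thread_count PID
thread_count() {
    awk '/^Threads:/ {print $2}' "/proc/$1/status"
}

# Launch one command restricted to a single core with taskset (util-linux).
# Usage: run_pinned CORE COMMAND [ARGS...]
run_pinned() {
    local core="$1"
    shift
    taskset -c "$core" "$@"
}

# Inside the 12-core job, the original script could then become:
#   for i in $(seq 1 12); do
#       run_pinned $((i - 1)) psi4 -i "input_${i}.dat" -o "output_${i}.out" -n 1 &
#   done
#   wait
# Cores 0-11 are an assumption; "taskset -cp $$" lists the cores the job may use.
```

If `thread_count` reports many threads for one psi4 process even with `-n 1`, oversubscription is the likely cause. For example, 12 instances that each start 12 threads would put 144 threads on 12 cores.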

It’s also possible that the raw CPU power is not the scarce resource here. Other factors like memory bandwidth, I/O bandwidth, virtual memory swapping, network bandwidth, network file systems, etc. might lead to the results you’re seeing. Try examining the system performance with two instances of the application running on your PC to see which resources are being most heavily utilized. It may be that your application is not suited to running more than one (or at most a few) instances on any given system. Also check to see if there are parameters you can supply to the application that impact how it runs.

After you get the inefficiency issue resolved, and if it still makes sense to run multiple tasks at once, take a look at the GNU parallel program to see if it can simplify your work:

https://www.gnu.org/software/parallel/
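As a rough sketch of how GNU parallel could replace the twelve hand-written lines: when no command is given on its command line, it runs each line of standard input as a command, with `-j` limiting how many run at once. `task_list` is a hypothetical helper, and the file names follow the pattern used earlier in this thread.

```shell
#!/bin/bash
# Print one psi4 command line per input file, for indexes 1..N.
# Usage: task_list N
task_list() {
    local i
    for i in $(seq 1 "$1"); do
        echo "psi4 -i input_${i}.dat -o output_${i}.out -n 1"
    done
}

if command -v parallel >/dev/null 2>&1 && command -v psi4 >/dev/null 2>&1; then
    # Run at most 12 of the generated commands at the same time.
    task_list 12 | parallel -j 12
else
    # Dry run: only show the commands that would be executed.
    task_list 12
fi
```

With more inputs than cores, the same script keeps 12 tasks running and starts the next one as soon as a core frees up, which a fixed list of `&` lines followed by `wait` does not do.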

Please check whether this setting helps (if it’s related to your issue):
https://psicode.org/psi4manual/master/api/psi4.core.set_num_threads.html


It wasn’t really what I was looking for but it did the job @adarsh.

The software calculations started to run faster in the cluster with this tip!

Many thanks,

Best regards!

Many thanks for the comments @adarsh, @mkaro, and @dtalcott.

@adarsh’s last comment wasn’t necessarily what I was looking for, but there have been significant improvements in software performance.

Best regards!
