Jobs take longer when run in an array

Hello community!
I have a problem when running my jobs as an array: subjobs complete in a shorter time when submitted as single jobs. For instance, there is one job which, submitted as a single job, takes 43 seconds:

    resources_used.cpupercent = 298
    resources_used.cput = 00:11:04
    resources_used.mem = 1767396kb
    resources_used.ncpus = 1
    resources_used.vmem = 11767892kb
    resources_used.walltime = 00:00:43
    job_state = F

But when submitted in an array with 9 other subjobs, it takes over 18 minutes:

    resources_used.cpupercent = 624
    resources_used.cput = 01:39:57
    resources_used.mem = 1732828kb
    resources_used.ncpus = 1
    resources_used.vmem = 11766056kb
    resources_used.walltime = 00:18:02
    job_state = F

When I shorten the array to 5 subjobs, I get:

    resources_used.cpupercent = 144
    resources_used.cput = 00:18:07
    resources_used.mem = 1691420kb
    resources_used.ncpus = 1
    resources_used.vmem = 11720892kb
    resources_used.walltime = 00:13:03
    job_state = F

Could this be caused by some admin PBS settings?
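
For reference, this is roughly how the two cases are submitted (job.sh is a placeholder for my actual wrapper script):

    # Single job:
    $ qsub -l select=1:ncpus=1 job.sh

    # Array of 10 subjobs (indices 1-10):
    $ qsub -J 1-10 -l select=1:ncpus=1 job.sh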

Did you get a chance to try running the same batch command line

  • as a single run, without using PBS Pro
  • as multiple parallel runs of the subjob command line, without using PBS Pro

to make sure the results are the same (or vary) when PBS Pro is not involved? One way to do the parallel test is sketched below.
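
A minimal bash sketch for the parallel test (run.sh is a placeholder for your actual command line):

    # Start 10 copies in parallel, wait for all of them, and time the whole run
    time (
        for i in $(seq 1 10); do
            ./run.sh "$i" &
        done
        wait
    )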

It looks like your program is using more than one CPU (i.e., it is multi-threaded). In your first example, you used 00:11:04 (664 s) of CPU time in 00:00:43 of walltime, which comes out to about 15–16 CPUs' worth.

However, you requested only ncpus=1. The result is that PBS overcommitted the CPUs when starting multiple array subjobs at once. See if you get the expected results by setting ncpus=16 on your qsub request.
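
For example (sketch only; job.sh is a placeholder):

    $ qsub -J 1-10 -l select=1:ncpus=16 job.sh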

Hi Adarsh, dtalcott,
I made tests without using PBS.
Single run of the job in question took 34 sec.
When run in parallel with 9 other subjobs, it took 2 min 42 sec.
When run in parallel with 5 other subjobs, it took 2 min 33 sec.

dtalcott, yes, you must be right. I did not mention that I also made one 10-subjob array test requesting 20 cpus, and the job in question took about 1 min.
Using 16 cpus:

    resources_used.cpupercent = 232
    resources_used.cput = 00:11:29
    resources_used.mem = 1668180kb
    resources_used.ncpus = 16
    resources_used.vmem = 11606204kb
    resources_used.walltime = 00:01:19
    job_state = F

The program I run is a MATLAB application running with the Runtime (RTE) in a Singularity container.
So the call is:

$ singularity run container params ...
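
As far as I understand, Singularity passes the host environment into the container by default, so a thread cap could in principle be injected from outside, e.g. (sketch only; whether the MATLAB Runtime actually honors OMP_NUM_THREADS is a separate question):

    $ export SINGULARITYENV_OMP_NUM_THREADS=1
    $ singularity run container params ...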

But still, I don't quite get the overcommitting you described:

"The result is that PBS overcommitted the CPUs when starting multiple array subjobs at once."

Could it be that, when running the single job, PBS gave it as many CPUs as it needed (ignoring my ncpus=1 request), which is how 11 min 04 sec of CPU time fit into 43 seconds of walltime, but when running the array job PBS did not do this allocation by itself?

Not exactly. Unless you are running on a cpuset host or using the cgroups hook, PBS does not restrict which CPUs your job can use. It assumes you are telling the truth with ncpus. So when your job says it needs just one CPU, PBS figures it can run as many array subjobs at once as the node has CPUs. The jobs actually need more than one CPU each, so the operating system has to share the CPUs among them, slowing everything down. For example, on a hypothetical 40-core node, ten ncpus=1 subjobs fit easily on paper, but at roughly 16 threads each they are really asking for about 160 CPUs' worth of work.

Note: You can have PBS check for excessive CPU use while a job is running with the $enforce cpuaverage MoM configuration option. But that just kills the offending job, wasting the work done so far.
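
If memory serves, that is configured in MoM's mom_priv/config roughly like this (the numbers are the documented defaults, shown only for illustration; check the Admin Guide for your PBS version):

    $enforce cpuaverage
    $enforce average_trialperiod 120
    $enforce average_percent_over 50
    $enforce average_cpufactor 1.025

followed by a HUP of pbs_mom so it re-reads the file.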

Also, the timing numbers still don’t come out right. It looks like, even with ncpus=16, your jobs are trying to use more CPUs than requested. Could you try something like the following on your qsub:

-l select=1:ncpus=16:mpiprocs=1:ompthreads=16
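
Putting it together, a minimal job-script sketch (names and paths are placeholders; PBS should set OMP_NUM_THREADS in the job environment from the ompthreads request, and Singularity passes it through to the container unless --cleanenv is used):

    #!/bin/bash
    #PBS -J 1-10
    #PBS -l select=1:ncpus=16:mpiprocs=1:ompthreads=16

    # OMP_NUM_THREADS=16 is set by PBS from ompthreads=16 above
    singularity run container params ...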

After applying these options I get:

    resources_used.cpupercent = 235
    resources_used.cput = 00:11:23
    resources_used.mem = 1321052kb
    resources_used.ncpus = 16
    resources_used.vmem = 11832264kb
    resources_used.walltime = 00:01:19
    job_state = F
    ...
    Resource_List.mpiprocs = 1
    Resource_List.ncpus = 16
    Resource_List.nodect = 1
    Resource_List.place = free
    Resource_List.select = 1:ncpus=16:mpiprocs=1:ompthreads=16

And there is another subjob that uses even more cput:

    resources_used.cpupercent = 321
    resources_used.cput = 00:16:30
    resources_used.mem = 1600400kb
    resources_used.ncpus = 16
    resources_used.vmem = 11671992kb
    resources_used.walltime = 00:01:38
    job_state = F

It is running at quite a good speed now, but we still need to profile our MATLAB software to know what resources we actually need.