Hello all, new PBS user here.
I have an application that makes use of multi-threading. If I run the application, the process will create multiple threads and consume multiple cores. The application also utilizes OpenMPI so that we can run multiple instances of the application on separate machines in order to increase the application’s throughput. We would like to be able to use PBS to schedule these jobs. Typically, each process will consume 12 cores and we run one process per node. However, if a machine has a large number of cores, we could run multiple processes on one node.
As is, I can submit a job to PBS and PBS will limit the number of running processes to the number of cores available on each of my nodes. This works fine if a process is single-threaded. However, because my application is multi-threaded, each process uses more cores than PBS accounts for, which creates excessive CPU contention.
Is there any way to tell PBS (via qsub or some other mechanism) that each executing process consumes a certain number of cores and have PBS limit the number of processes that can run on a node accordingly? So if my process uses 12 threads and a node has 24 cores, a maximum of 2 processes would run on that node.
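To make the arithmetic concrete, here is a small Python sketch of the placement rule I'm hoping for. This is not PBS code, just an illustration; the function name and the example core counts are my own assumptions.

```python
def max_processes_per_node(node_cores: int, threads_per_process: int) -> int:
    """Number of multi-threaded processes that fit on a node
    without using more threads than there are cores."""
    if threads_per_process <= 0:
        raise ValueError("threads_per_process must be positive")
    # Integer division: leftover cores that cannot hold a full process stay idle.
    return node_cores // threads_per_process


# Hypothetical node sizes, with each process using 12 threads.
for cores in (12, 24, 32):
    print(cores, "cores ->", max_processes_per_node(cores, 12), "process(es)")
```

So a 24-core node would take 2 processes, and a 32-core node would still take only 2, with 8 cores left unused.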
If this is not possible, is there a way to limit each node to only a single process? So if I have 8 nodes, I could submit 2 jobs that each use 4 nodes and have them distributed between the nodes. If I were to submit another 4-node job, that job would be queued until 4 nodes became available.
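The fallback behaviour I have in mind can be sketched as a toy model in Python. It only illustrates the outcome I'd like (one process per node, jobs started in submission order) and is not a claim about how PBS actually schedules; the `schedule` function is hypothetical.

```python
from collections import deque


def schedule(total_nodes, job_sizes):
    """Start jobs in submission order while enough free nodes remain.
    Once a job has to wait, every later job waits behind it (strict FIFO)."""
    free = total_nodes
    running, queued = [], deque()
    for job_id, size in enumerate(job_sizes):
        if not queued and size <= free:
            free -= size          # each node runs exactly one process
            running.append(job_id)
        else:
            queued.append(job_id)
    return running, list(queued)


# 8 nodes, three 4-node jobs: the first two run, the third waits.
running, queued = schedule(8, [4, 4, 4])
print("running:", running, "queued:", queued)
```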
Thanks