I’m running into a lot of errors when trying to run my application (a hybrid parallel app using both OpenMP and MPI) on a server with hyper-threading, scheduled through PBS.
I get two kinds of errors:
- The job stays queued (i.e. in Q status) even though there are free cores on the server. This happens mainly when only the last 200 cores or so are left on the server, which has a total of 672 cores with hyper-threading (336 physical cores).
- The job starts to run but fails with the following error:
OMP: Error #34 System unable to allocate necessary resources for OMP thread:
OMP: System Error #11: Resource temporarily unavailable
OMP: Hint: Try decreasing the value of OMP_NUM_THREADS.
This happens intermittently, no matter how many cores I assign to the job. OMP_NUM_THREADS is usually set to ‘3,28’, that is, a total of 84 threads required for the job.
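
To make the thread count above concrete, here is a minimal sketch of the arithmetic in Python. It assumes the comma-separated value is read as one thread count per nesting level (3 outer threads, each spawning 28 inner threads) and that nested parallelism is actually active; the function and variable names are hypothetical, and the count is per process, so several MPI ranks on the same node would multiply it further.

```python
import math


def threads_per_process(omp_num_threads: str) -> int:
    """Multiply the per-level thread counts of a comma-separated
    OMP_NUM_THREADS value (assumes every nesting level is active)."""
    levels = [int(part) for part in omp_num_threads.split(",") if part.strip()]
    return math.prod(levels)


omp_setting = "3,28"    # value quoted in the question
logical_cores = 672     # cores with hyper-threading, from the question
physical_cores = 336    # physical cores, from the question

total = threads_per_process(omp_setting)
print(f"threads per process: {total}")
print(f"such processes fitting on logical cores:  {logical_cores // total}")
print(f"such processes fitting on physical cores: {physical_cores // total}")
```

With these numbers, one such process accounts for 84 threads, so 8 of them would fill the 672 logical cores and 4 would fill the 336 physical cores.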
Any help in discovering why this is happening is very much appreciated.