This worked a couple of days ago, but now I only get 4 lines of output rather than 16. My coworker says nothing changed, but something is clearly different.
Now, although 16 ncpus are set aside (the tracejob output below confirms it), OMP_NUM_THREADS seems mis-set.
What dumb thing could I be missing, or where should I look? This first example is also the only one that actually uses 4 nodes, even though select=4 is set in the other examples as well.
I used two different scripts to verify: OpenMPI 4 with Python and mpi4py, and a dirt-simple C++ program I wrote a while ago to test the same thing.
They are called from within the job script like this:
mpiexec --mca btl '^openib' python python-mpi-hello.py
mpiexec --mca btl '^openib' mpi-hello
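For reference, python-mpi-hello.py is just the usual mpi4py hello world. A minimal sketch of it (my file may differ in detail, but the output format matches what is shown below):

from mpi4py import MPI

comm = MPI.COMM_WORLD
# Print this rank, the total rank count, and the node it landed on
print("process rank %d, of %d, running on %s"
      % (comm.Get_rank(), comm.Get_size(), MPI.Get_processor_name()))

The C++ test does the same thing via MPI_Comm_rank, MPI_Comm_size, and MPI_Get_processor_name, so both should print one line per rank.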
The server's default placement is scatter.
Directives
#PBS -l nodes=4:ncpus=4
#PBS -l mem=10gb
Env Value: OMP_NUM_THREADS = 4 (16 CPUs set aside)
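If I am reading the allocation right, PBS is translating that nodes request into four 4-cpu chunks, i.e. something equivalent to this (my assumption, but it matches the per-node breakdown in the tracejob output below):

#PBS -l select=4:ncpus=4:mem=2621440kb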
tracejob
hpc-compute01:ncpus=4:mem=2621440kb
hpc-compute02:ncpus=4:mem=2621440kb
hpc-compute03:ncpus=4:mem=2621440kb
hpc-compute04:ncpus=4:mem=2621440kb
Job Output
process rank 0, of 4, running on hpc-compute01
process rank 2, of 4, running on hpc-compute03
process rank 1, of 4, running on hpc-compute02
process rank 3, of 4, running on hpc-compute04
For giggles, I tried other select statements and tests as well. In each case, 16 CPUs are set aside by PBS, but OMP_NUM_THREADS is off; the one case where it was right used only 1 CPU (ugh). I avoided cluttering the post with that output, hoping this might be enough to point me in the right direction.
Thanks