Hi all:
I am using OpenPBS v20.0.1 on our computing cluster, and I am also a heavy user of molecular dynamics software such as GROMACS and LAMMPS. Recently, I noticed that job performance is much lower when a job is submitted through PBS than when it is run directly on the same compute node through an ssh login.
The performance of LAMMPS through PBS is: 5.363 ns/day, 4.475 hours/ns, 62.068 timesteps/s, 91.0% CPU use with 24 MPI tasks x 1 OpenMP threads
The Device Time Information through PBS is:
Data Transfer: 7.1626 s.
Neighbor copy: 0.0257 s.
Neighbor build: 0.0926 s.
Force calc: 10.1331 s.
Device Overhead: 2.5805 s.
Average split: 1.0000.
Lanes / atom: 4.
Vector width: 32.
Max Mem / Proc: 33.78 MB.
CPU Neighbor: 0.3908 s.
CPU Cast/Pack: 24.4776 s.
CPU Driver_Time: 0.1674 s.
CPU Idle_Time: 3.9796 s.
Meanwhile, when running the same job with the same resource configuration on the same compute node through an ssh login, without PBS, the performance is: 11.133 ns/day, 2.156 hours/ns, 128.850 timesteps/s, 86.2% CPU use with 24 MPI tasks x 1 OpenMP threads
The Device Time Information through ssh is:
Data Transfer: 69.7440 s.
Neighbor copy: 0.0488 s.
Neighbor build: 0.4698 s.
Force calc: 67.2331 s.
Device Overhead: 17.6636 s.
Average split: 1.0000.
Lanes / atom: 4.
Vector width: 32.
Max Mem / Proc: 31.92 MB.
CPU Neighbor: 0.6567 s.
CPU Cast/Pack: 42.5587 s.
CPU Driver_Time: 0.4871 s.
CPU Idle_Time: 43.5711 s.
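One way to check whether the two environments really expose the same resources is a small script like the one below, run once inside a PBS job and once in the ssh session, with the two outputs compared afterwards. This is only a sketch: the function name `describe_environment` is made up for illustration, and the choice of variables to print (CPU affinity, `OMP_NUM_THREADS`, `CUDA_VISIBLE_DEVICES`, `PBS_JOBID`, `PBS_NODEFILE`) is an assumption about what might differ, not a diagnosis.

```python
import os


def describe_environment(env=None):
    """Collect settings that may differ between a batch job and a login shell.

    `env` defaults to the real process environment; a plain dict can be
    passed instead for testing.
    """
    env = os.environ if env is None else env
    info = {}

    # CPUs this process is allowed to run on (available on Linux only).
    if hasattr(os, "sched_getaffinity"):
        info["allowed_cpus"] = sorted(os.sched_getaffinity(0))
    else:
        info["allowed_cpus"] = None

    # Environment variables chosen as an assumption about what could differ.
    for name in ("OMP_NUM_THREADS", "CUDA_VISIBLE_DEVICES",
                 "PBS_JOBID", "PBS_NODEFILE"):
        info[name] = env.get(name)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the two outputs differ, for example in the list of allowed CPUs, that difference would be a natural first thing to look at.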
Running directly without PBS is evidently about twice as fast as running through PBS (11.133 vs. 5.363 ns/day). I have no idea why this happens. Could you give me some advice?
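For reference, the size of the gap can be computed from the two LAMMPS performance lines quoted above. The snippet below is just that arithmetic; the variable and function names are illustrative.

```python
# Figures copied from the two LAMMPS performance lines in this post.
pbs = {"ns_per_day": 5.363, "timesteps_per_s": 62.068}
ssh = {"ns_per_day": 11.133, "timesteps_per_s": 128.850}


def speedup(fast, slow):
    """Return how many times larger `fast` is than `slow`."""
    return fast / slow


ratio_ns = speedup(ssh["ns_per_day"], pbs["ns_per_day"])
ratio_steps = speedup(ssh["timesteps_per_s"], pbs["timesteps_per_s"])

print(f"ns/day ratio (ssh / PBS):      {ratio_ns:.2f}")
print(f"timesteps/s ratio (ssh / PBS): {ratio_steps:.2f}")
```

Both ratios come out at roughly 2.08, so the two metrics agree on the size of the slowdown.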
Sincerely, Pan