Multiple Job IDs listed

Hi

Could someone explain why duplicate job IDs are listed in this output, and why they are counted as jobs under ‘njobs’? TIA

Command used: pbsnodes -aSvj


                                       mem       ncpus   nmics   ngpus
vnode           njobs   run   susp      f/t        f/t     f/t     f/t   jobs
--------------- ------ ----- ------ ------------ ------- ------- ------- -------
Node1                8     8      0    336gb/1tb   15/88     0/0     0/0 1017603.hpc-pbs,1020539.hpc-pbs,1020539.hpc-pbs,1020539.hpc-pbs,1020539.hpc-pbs,1020539.hpc-pbs,1020539.hpc-pbs,1020970[6].hpc-pbs
Node2                3     3      0    300gb/1tb   17/88     0/0     0/0 1011621.hpc-pbs,1020542.hpc-pbs,1021079.hpc-pbs
Node3                9     9      0    158gb/1tb   46/88     0/0     0/0 1021230.hpc-pbs,1019593.hpc-pbs,1019593.hpc-pbs,1019593.hpc-pbs,1019593.hpc-pbs,1019593.hpc-pbs,1019593.hpc-pbs,1021230.hpc-pbs,1021092.hpc-pbs

Is there anything interesting about how the duplicate jobs requested resources or how they ran?

Perhaps a tracejob 1020539 would shed some light?

The output of pbsnodes Node1 would help too.

pbsnodes lists a job once per chunk. For example, select=1:ncpus=32 produces 1 entry, select=32:ncpus=1 produces 32 entries, and select=2:ncpus=16 produces 2 entries.
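
To make the chunk rule concrete, here is a rough Python sketch that counts the chunks in a select statement. It assumes the "one entry per chunk" behaviour described above, and that a chunk without a leading count is treated as a single chunk. The function name expected_entries is made up for illustration; the result is the total across all nodes the job lands on, not the count on any one node.

```python
def expected_entries(select_spec: str) -> int:
    """Count the chunks in a PBS select spec such as 'select=2:ncpus=16'.

    Hypothetical helper: assumes pbsnodes shows one job entry per chunk.
    """
    spec = select_spec.strip()
    if spec.startswith("select="):
        spec = spec[len("select="):]
    total = 0
    # Chunk specifications are joined with '+', e.g. '2:ncpus=16+1:ncpus=4'.
    for part in spec.split("+"):
        head = part.split(":", 1)[0]
        # A leading integer is the chunk count; without one, assume 1 chunk.
        total += int(head) if head.isdigit() else 1
    return total


for spec in ("select=1:ncpus=32", "select=32:ncpus=1", "select=2:ncpus=16"):
    print(spec, "->", expected_entries(spec))
```

By this count, a job that shows up six times on one node, like 1020539 on Node1, would have to have at least six chunks placed there.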

When you add the -Sjv options, pbsnodes tries to consolidate the entries for the same job into one. For some reason that is not working here. I was hoping the exact pbsnodes Node1 output would give a clue as to why.
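
As a way to see what the consolidated view should look like, here is a small Python sketch that takes the jobs column for Node1 from the output above and collapses the repeated IDs. This is only an illustration of the comparison, not how pbsnodes does it internally; summarize_jobs is a hypothetical helper.

```python
from collections import Counter


def summarize_jobs(jobs_field: str):
    """Return (total entries, per-job entry counts) for a comma-separated jobs column."""
    entries = [job.strip() for job in jobs_field.split(",") if job.strip()]
    return len(entries), Counter(entries)


# Jobs column for Node1, rebuilt from the pbsnodes -aSvj output in the question.
node1_jobs = ",".join(
    ["1017603.hpc-pbs"] + ["1020539.hpc-pbs"] * 6 + ["1020970[6].hpc-pbs"]
)

total, per_job = summarize_jobs(node1_jobs)
print("entries listed:", total)          # matches the njobs column
print("distinct jobs :", len(per_job))   # what a consolidated view would show
for job_id, count in per_job.items():
    print(f"  {job_id}: {count} entries")
```

For Node1 this gives 8 entries but only 3 distinct jobs, so njobs appears to be counting entries rather than jobs.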

I noticed one issue. The entries have ‘1020539.hpc-pbs’ instead of just ‘1020539’. This suggests there is some confusion about the default server name.

Thanks.
There’s definitely a disconnect between the njobs value displayed by ‘pbsnodes -aSjv’ and the number of running jobs per node (counting the R’s) captured with ‘qstat -a -n1’. This is ultimately what I need to figure out.