Hello,
This might be a feature improvement worth exploring for PBS Pro. Here is my discovery: when a reservation is made, it does not show up in the output of pbsnodes -avSj the way regular jobs do, even though it is holding the nodes in question.
pbsnodes -avSj | grep node006
node0069 state-unknown 0 0 0 0kb/0kb 1/1 0/0 0/0 --
node0068 state-unknown 0 0 0 0kb/0kb 1/1 0/0 0/0 --
node0067 state-unknown 0 0 0 0kb/0kb 1/1 0/0 0/0 --
node0066 state-unknown 0 0 0 0kb/0kb 1/1 0/0 0/0 --
node0065 state-unknown 0 0 0 0kb/0kb 1/1 0/0 0/0 --
node0064 free 1 1 0 5gb/63gb 12/32 0/0 8/8 21983
node0063 free 0 0 0 1tb/1tb 12/32 0/0 0/0 --
node0062 offline 0 0 0 1tb/1tb 12/32 0/0 0/0 --
node0061 offline 0 0 0 1tb/1tb 12/32 0/0 0/0 --
node0060 free 0 0 0 63gb/63gb 0/20 0/0 0/0 --
node006[1-3] each have 12 of their 32 cores held by a reservation, but this is not explicitly listed anywhere in the output.
But if you go digging at the node itself, you will find this information:
[root@bright01-thx ~]# pbsnodes node0063
node0063
…
resv = R18667.bright01-thx
resources_available.arch = linux
resources_available.host = node0063
…
I’m not exactly sure whether this behavior was due to a crashed job on the node or a bug in PBS Pro, but I thought it would be good to share with the community.
These nodes were likely running a job under this reservation (R18667); the reservation then kept holding the nodes, and it took some investigation to discover why they were unavailable.
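For anyone who wants to surface this more visibly today, the per-node resv attribute can be scraped from the long-form pbsnodes -av output. Below is a minimal Python sketch of that idea; the SAMPLE text is a made-up excerpt modeled on the pbsnodes node0063 output above, not real cluster output, and real output would simply be piped in instead.

```python
# Hypothetical excerpt in the style of "pbsnodes -av" long-form output;
# in practice you would feed in the real command's stdout.
SAMPLE = """\
node0063
     state = free
     resv = R18667.bright01-thx
     resources_available.host = node0063

node0064
     state = free
     resources_available.host = node0064
"""

def nodes_with_reservations(text):
    """Return {node_name: reservation_id} for nodes carrying a resv attribute."""
    result = {}
    current = None
    for line in text.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new node stanza.
            current = line.strip()
        elif current and "=" in line:
            key, _, value = line.partition("=")
            if key.strip() == "resv":
                result[current] = value.strip()
    return result

print(nodes_with_reservations(SAMPLE))
# {'node0063': 'R18667.bright01-thx'}
```

This only reads attributes already exposed by pbsnodes, so it could run as an unprivileged helper alongside pbsnodes -avSj until the summary view itself lists reservations.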
Thanks,
Siji