Job stuck in queue, multiple servers

Greetings,

I have installed OpenPBS on a workstation (a single machine runs everything; it can be accessed remotely through ssh).
I was able to submit jobs just fine until a couple of months ago. I haven’t used it for a while, and now that I’m back the jobs I submit are stuck in the queue. There are no other jobs running. If I delete the jobs, no error files are created. I checked the logs and everything seems fine.

I did notice one detail, though: when I run pbsnodes -av, I get two nearly identical nodes, whose names differ only in letter case, with different states (Stale and free):

sysadmin@Precision-7920-Tower:~/testVASP/newtest$ pbsnodes -av
precision-7920-tower
     Mom = precision-7920-tower
     ntype = PBS
     state = Stale
     pcpus = 20
     resources_available.arch = linux
     resources_available.host = precision-7920-tower
     resources_available.mem = 97495476kb
     resources_available.ncpus = 20
     resources_available.vnode = precision-7920-tower
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.hbmem = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 0
     resources_assigned.vmem = 0kb
     resv_enable = True
     sharing = default_shared
     license = l
     last_state_change_time = Wed Jul 20 14:45:18 2022
     last_used_time = Tue Jan 18 22:15:56 2022

Precision-7920-Tower
     Mom = precision-7920-tower
     ntype = PBS
     state = free
     pcpus = 20
     resources_available.arch = linux
     resources_available.host = precision-7920-tower
     resources_available.mem = 97495476kb
     resources_available.ncpus = 20
     resources_available.vnode = Precision-7920-Tower
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.hbmem = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 0
     resources_assigned.vmem = 0kb
     resv_enable = True
     sharing = default_shared
     license = l
     last_state_change_time = Wed Jul 20 14:45:18 2022
     last_used_time = Wed Jul 20 13:16:42 2022

I wonder if this could be the issue.
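
One way to check whether the two entries really are the same host registered under names that differ only in letter case is to fold the vnode names to lowercase and look for repeats. The sketch below is a minimal, hypothetical helper (find_case_duplicates is a made-up name, not part of OpenPBS) and assumes the resources_available.vnode lines look like the ones in the output above:

```shell
# Hypothetical helper, not an OpenPBS tool.
# Reads `pbsnodes -av` output on stdin and prints, one group per line,
# the vnode names that collide once letter case is ignored.
find_case_duplicates() {
    awk -F' = ' '
        /resources_available\.vnode = / {
            name = $2
            sub(/[ \t\r]+$/, "", name)      # drop trailing whitespace
            key = tolower(name)             # case-insensitive key
            count[key]++
            names[key] = names[key] (names[key] != "" ? " " : "") name
        }
        END {
            for (key in count)
                if (count[key] > 1) print names[key]
        }
    '
}
```

It reads the listing on standard input, for example `pbsnodes -av | find_case_duplicates`. For the output above it would print both spellings, precision-7920-tower and Precision-7920-Tower, on one line; no output means no case-only collisions were found.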

Thanks!

Update: I removed the nodes and recreated them with qmgr. Now pbsnodes -av gives me a single result with a free state:

sysadmin@Precision-7920-Tower:~/testVASP/newtest$ pbsnodes -av
Precision-7920-Tower
     Mom = precision-7920-tower
     ntype = PBS
     state = free
     pcpus = 20
     resources_available.arch = linux
     resources_available.host = precision-7920-tower
     resources_available.mem = 97495476kb
     resources_available.ncpus = 20
     resources_available.vnode = Precision-7920-Tower
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.hbmem = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 0
     resources_assigned.vmem = 0kb
     resv_enable = True
     sharing = default_shared
     license = l
     last_state_change_time = Wed Jul 20 15:38:04 2022

This still does not solve the issue with the jobs stuck in the queue. I will wait for answers.
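
For anyone following along, the remove-and-recreate step can be sketched roughly as below. The exact commands that were run are not recorded in this post, so this is an assumption: it uses the standard qmgr "delete node" and "create node" directives, and print_recreate_commands is a hypothetical helper that only prints the commands for review rather than executing them:

```shell
# Hypothetical dry-run helper: prints qmgr commands, does not run them.
# Usage: print_recreate_commands <name-to-create> <name-to-delete>...
print_recreate_commands() {
    keep=$1
    shift
    # One delete per existing node entry passed on the command line.
    for node in "$@"; do
        printf 'qmgr -c "delete node %s"\n' "$node"
    done
    # Recreate a single entry under the chosen spelling.
    printf 'qmgr -c "create node %s"\n' "$keep"
}
```

Calling `print_recreate_commands Precision-7920-Tower precision-7920-tower Precision-7920-Tower` prints two delete commands followed by one create command. The printed commands would typically need to be run with PBS manager privileges on the server host.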

Thanks!

Please share the output of the following commands:

  • qstat -answ1
  • qstat -fx
  • qstat -Bf
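
To gather all three outputs in one file that can be pasted or attached to a reply, something like the following sketch may help. The function name collect_diagnostics and the default file name pbs_diagnostics.txt are made up for illustration; only the three qstat invocations come from the list above:

```shell
# Hypothetical helper: writes the output of the three requested qstat
# commands, each under a header line, into a single text file.
collect_diagnostics() {
    out=${1:-pbs_diagnostics.txt}   # assumed default file name
    : > "$out"                      # start with an empty file
    for flags in -answ1 -fx -Bf; do
        printf '===== qstat %s =====\n' "$flags" >> "$out"
        # Capture errors too, so a failing command is visible in the file.
        qstat "$flags" >> "$out" 2>&1
    done
}
```

Running `collect_diagnostics` with no argument writes to pbs_diagnostics.txt in the current directory; passing a path as the first argument writes there instead.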