I can't unset resources_available.ngpus on node e006; it always shows 1/1.
Even when there are no jobs in PBS, I can't delete the ngpus resource either. This is difficult. How can I fix my PBS?
The vnode state is not correct, and hence the job stays in the queue.
Also, you cannot delete the ngpus resource, because some jobs might have requested ngpus, and jobs stored in the history might have requested it as well.
So you need to delete all the jobs that have requested ngpus (queued and running),
and delete the jobs from the history that have requested ngpus, or reset the job history.
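As a rough sketch of that cleanup order (not a tested procedure for any particular site): the function below prints the qmgr commands instead of running them unless RUN=1 is set. The helper names run and reset_ngpus are made up for this example. It assumes that switching job_history_enable off and back on is an acceptable way to reset the history on your server, and that all queued and running jobs using ngpus are already gone.

```shell
# Dry-run by default: commands are printed, not executed. Set RUN=1 to execute.
# Note: the printed form loses the quoting around the qmgr -c argument.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# reset_ngpus NODE (hypothetical helper)
reset_ngpus() {
    node=$1
    # Assumption: disabling job history discards the stored history jobs,
    # and re-enabling it restores the previous behaviour for new jobs.
    run qmgr -c "set server job_history_enable = False"
    run qmgr -c "set server job_history_enable = True"
    # With no queued, running or history job referencing ngpus,
    # unset it on the node first...
    run qmgr -c "unset node $node resources_available.ngpus"
    # ...and then remove the custom resource definition itself.
    run qmgr -c "delete resource ngpus"
}

reset_ngpus e006
```

Review the printed commands first, then rerun with RUN=1 as a PBS manager once they look right.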
Thanks for your reply, Mr. adarsh. All jobs in PBS have been deleted; I used qstat -a to check, and it outputs nothing. The resource must still be held somewhere, but I can't find where. I can't delete the node or the resource.
qstat -x : all jobs, including history jobs (which qstat -a does not show)
qstat -H : only jobs that have completed, moved, deleted
qstat -xH : includes both of the above
qstat -fx | grep -e 'Job Id' -e ngpus
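If the grep output gets long, a small filter can reduce it to just the job IDs that mention ngpus. This is only a sketch: jobs_with_ngpus is a made-up name, and it assumes each record in the qstat -f output starts with a "Job Id: <id>" line.

```shell
# Print the ID of every job whose full record mentions ngpus.
# Reads qstat -fx style output on stdin, so it can be tried on saved output.
jobs_with_ngpus() {
    awk '
        /^Job Id:/       { id = $3; seen = 0; next }   # start of a new job record
        /ngpus/ && !seen { print id; seen = 1 }        # report each job only once
    '
}

# On a real server (not run here):
#   qstat -fx | jobs_with_ngpus
```

Any ID it prints is a job (possibly a history job) that still references ngpus and has to go before the resource can be deleted.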
To view nodes
pbsnodes -av
pbsnodes -aSjv
Yes, if you have a vnode configuration (configv2 or via cgroups),
you would need to delete the vnodes first (cnode[0], cnode[1]) and then the natural node cnode.
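A minimal sketch of that ordering, using the example names from this thread: the function only prints qmgr directives (child vnodes first, natural node last) so they can be checked before anything is deleted. vnode_delete_cmds is a hypothetical helper, and piping its output into qmgr is an assumption about how you would apply it.

```shell
# Emit qmgr directives that delete the child vnodes before the natural node.
# Usage: vnode_delete_cmds NATURAL CHILD...
vnode_delete_cmds() {
    natural=$1
    shift
    for v in "$@"; do
        printf 'delete node %s\n' "$v"      # child vnodes first
    done
    printf 'delete node %s\n' "$natural"    # natural node last
}

# Review the output first; to apply it, pipe it into qmgr (not run here):
#   vnode_delete_cmds cnode 'cnode[0]' 'cnode[1]' | qmgr
vnode_delete_cmds cnode 'cnode[0]' 'cnode[1]'
```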