Plenty of free nodes - yet Jobs are being held

I have a small system: ten nodes, each with 64 cores. The output of “pbsnodes -a” shows a number of free nodes, yet I have a lot of jobs in the “Q” and “H” states.

I can log into the nodes, and I have restarted PBS on what we call the head node, the login nodes, and the compute nodes. Even so, PBS is not sending jobs to these idle nodes.
I would like some troubleshooting tips to make sure my system is fully functional. Thank you.

Does pbsnodes -l show the nodes as offline?

What does tracejob <job number> on a queued job say?
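
For reference, the two checks above can be bundled into one small shell function, run on the PBS server host. This is only a rough sketch: pbs_triage is a made-up helper name, not a PBS command, and the qstat -f call is an extra step I am assuming is useful because the job's comment attribute often states why the scheduler has not started it.

```shell
#!/bin/sh
# Sketch only: pbs_triage is a hypothetical helper name, not part of PBS.
# Usage: pbs_triage <job id of a queued or held job>
pbs_triage() {
    jobid="$1"
    # Stop early if the PBS client commands are not on PATH.
    for cmd in pbsnodes qstat tracejob; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing PBS command: $cmd"
            return 1
        fi
    done
    # Nodes the server has marked down, offline, or otherwise unavailable.
    pbsnodes -l
    # Full job attributes; look at the comment and job_state lines.
    qstat -f "$jobid"
    # Log entries for this job; tracejob reads the PBS log files on this host.
    tracejob "$jobid"
}
```

Call it with the id of one of the stuck jobs, for example pbs_triage <job number>, and post the output here if the reason is not obvious.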

Please check for these possible causes:

  1. the user's home directory is not available on the compute nodes
  2. the user's password file entry is missing on the compute nodes
  3. user authentication fails on the compute nodes

Please also check whether the user can log in to the compute nodes via ssh and whether their home directory is visible there.
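
A rough way to test the account and home directory on a node is sketched below. It assumes a POSIX shell and that getent is available on the node; check_user_env is a hypothetical helper name, not something PBS provides. Note that it does not test ssh itself, so still try a plain ssh login as the user.

```shell
#!/bin/sh
# Sketch only: check_user_env is a hypothetical helper, not a PBS command.
# Run it on a compute node. Usage: check_user_env <username>
check_user_env() {
    user="$1"
    node=$(uname -n)
    # Does the account resolve on this node (local passwd file, LDAP, NIS, ...)?
    if ! id "$user" >/dev/null 2>&1; then
        echo "$user: no account on $node"
        return 1
    fi
    # Look up the home directory recorded for the account (assumes getent exists).
    home=$(getent passwd "$user" | cut -d: -f6)
    if [ -z "$home" ] || [ ! -d "$home" ]; then
        echo "$user: home directory '$home' not available on $node"
        return 1
    fi
    echo "$user: account and home directory OK on $node"
}
```

Paste or source the function in a shell on each compute node and call it with the name of a user whose jobs are stuck. Any node that reports a missing account or home directory is a likely culprit.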