As I understand it, spare capacity refers to free resources within the HPC system that can be allocated to a department, group, or project for a period of time.
If you have long wait times and many jobs in the queue most of the time, then you do not have spare capacity. You would need to add resources (compute nodes, GPU nodes, disk space, networking, etc.) to meet job demand, bring down the wait time of queued jobs, or finish projects on time.
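As a rough illustration of the rule of thumb above, here is a minimal Python sketch. The function name, the input lists, and both thresholds (1 hour median wait, queue non-empty more than half the time) are assumptions chosen for the example, not standard values; you would tune them to your own site.

```python
from statistics import median

def looks_saturated(wait_hours, queued_jobs_per_sample,
                    max_median_wait=1.0, busy_fraction=0.5):
    """Return True if jobs typically wait long AND the queue is
    non-empty most of the time (both thresholds are assumptions)."""
    if not wait_hours or not queued_jobs_per_sample:
        raise ValueError("need at least one sample of each kind")
    # Typical wait time of jobs, in hours.
    median_wait = median(wait_hours)
    # Fraction of sampled moments at which at least one job was queued.
    nonempty = sum(1 for q in queued_jobs_per_sample if q > 0)
    nonempty_fraction = nonempty / len(queued_jobs_per_sample)
    return median_wait > max_median_wait and nonempty_fraction > busy_fraction
```

If this returns True over a representative period (weeks, not a single day), that suggests there is no spare capacity to hand out.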
What are the metrics we need to record?
You need to find out:
- used vs requested cores / memory / walltime / disk space
- cores used and unused per node on a daily basis
- whether you need all of the HPC's resources running, or whether some can be turned off to save energy
- any other data that would indicate whether you have spare capacity or an under-resourced HPC
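The first two metrics in the list can be sketched in Python as follows. This is only an illustration: the JobRecord fields and the two dictionaries of per-node core counts are hypothetical, and in practice you would fill them from your scheduler's accounting data.

```python
from dataclasses import dataclass

@dataclass
class JobRecord:
    # Hypothetical fields; map them from your scheduler's accounting output.
    cores_requested: int
    cores_used: float          # average number of cores actually busy
    mem_requested_gb: float
    mem_used_gb: float         # peak memory used
    walltime_requested_h: float
    walltime_used_h: float

def efficiency(jobs):
    """Used / requested ratio per resource, summed over all jobs."""
    def ratio(used, requested):
        total_requested = sum(requested)
        return sum(used) / total_requested if total_requested else 0.0
    return {
        "cores": ratio([j.cores_used for j in jobs],
                       [j.cores_requested for j in jobs]),
        "memory": ratio([j.mem_used_gb for j in jobs],
                        [j.mem_requested_gb for j in jobs]),
        "walltime": ratio([j.walltime_used_h for j in jobs],
                          [j.walltime_requested_h for j in jobs]),
    }

def idle_cores_per_node(node_total_cores, node_busy_cores):
    """Unused cores per node; nodes missing from node_busy_cores count as fully idle."""
    return {node: total - node_busy_cores.get(node, 0)
            for node, total in node_total_cores.items()}
```

Ratios well below 1.0 mean users request more than they use, so some "missing" capacity may be recoverable through better job sizing rather than new hardware. Nodes that stay mostly idle day after day are candidates for reallocation or for powering down.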
Ref:
https://www.top500.org/news/how-to-measure-hpc/
https://nci.org.au/our-systems/status