How to quantify HPC usage (spare capacity?)

As I understand it, spare capacity means free resources within the HPC system that can be allocated to a department, a group of people, or a project for a period of time.

If you have long wait times and many jobs in the queue most of the time, then you do not have spare capacity. You would need to add resources (compute nodes, GPU nodes, disk space, networking, etc.) to meet the demand of the jobs, bring down the wait time of queued jobs, or finish projects on time.
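As a rough illustration of the wait-time check above, here is a minimal Python sketch. It assumes you can export a submit time and a start time for each job from your scheduler's accounting data. The records, the function names, and the one-hour threshold are made up for the example and are not taken from any particular scheduler.

```python
from datetime import datetime
from statistics import median

# Hypothetical job records: (submit time, start time) as ISO 8601 strings.
jobs = [
    ("2024-01-01T08:00:00", "2024-01-01T08:05:00"),
    ("2024-01-01T09:00:00", "2024-01-01T13:00:00"),
    ("2024-01-01T10:00:00", "2024-01-01T16:30:00"),
]

def wait_hours(submit, start):
    """Return the time a job spent in the queue, in hours."""
    delta = datetime.fromisoformat(start) - datetime.fromisoformat(submit)
    return delta.total_seconds() / 3600

def queue_wait_summary(records, threshold_hours=1.0):
    """Summarise queue waits; threshold_hours is an arbitrary example cut-off."""
    waits = [wait_hours(submit, start) for submit, start in records]
    long_waits = sum(1 for w in waits if w > threshold_hours)
    return {
        "median_wait_h": median(waits),
        "max_wait_h": max(waits),
        "share_over_threshold": long_waits / len(waits),
    }

summary = queue_wait_summary(jobs)
print(summary)
```

If the median wait and the share of long waits stay high week after week, that points towards being under-resourced rather than having spare capacity.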

What are the metrics we need to record?
You need to find out:

  1. used vs requested cores / memory / walltime / disk space
  2. cores used and unused per node on a daily basis
  3. whether you need all the resources of the HPC system or can turn some of them off to save energy
  4. other data that would indicate whether you have spare capacity or an under-resourced HPC system
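The first two metrics in the list can be sketched in Python as follows. This is only an illustration under assumptions: the `Job` fields, the function names, and the sample numbers are hypothetical, and you would fill them from whatever accounting data your scheduler provides.

```python
from dataclasses import dataclass

@dataclass
class Job:
    # Hypothetical fields; map them to your scheduler's accounting output.
    cores_requested: int
    cpu_hours_used: float        # total CPU time consumed across all cores
    walltime_requested_h: float
    walltime_used_h: float
    mem_requested_gb: float
    mem_peak_gb: float

def efficiency(job):
    """Used vs requested ratios for one job (1.0 means fully used)."""
    core_hours_reserved = job.cores_requested * job.walltime_used_h
    return {
        "cpu": job.cpu_hours_used / core_hours_reserved,
        "walltime": job.walltime_used_h / job.walltime_requested_h,
        "memory": job.mem_peak_gb / job.mem_requested_gb,
    }

def idle_cores(node_total_cores, node_allocated_cores):
    """Cores left unallocated on each node for one sampling period."""
    return {
        node: total - node_allocated_cores.get(node, 0)
        for node, total in node_total_cores.items()
    }

job = Job(cores_requested=16, cpu_hours_used=40.0,
          walltime_requested_h=10.0, walltime_used_h=5.0,
          mem_requested_gb=64.0, mem_peak_gb=16.0)
job_eff = efficiency(job)
unused = idle_cores({"n1": 48, "n2": 48}, {"n1": 48, "n2": 12})
print(job_eff)
print(unused)
```

Low ratios across many jobs suggest users are requesting more than they use, which can hide capacity that looks busy on paper. Sampling the idle cores per node daily shows whether some nodes could be reassigned or powered down.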

Ref:
https://www.top500.org/news/how-to-measure-hpc/
https://nci.org.au/our-systems/status
