How to quantify HPC usage (spare capacity?)

Cathy · May 3, 2023, 8:35am

We are in the fortunate position of a group wanting to give us money for HPC equipment and staff. However, we are being asked how much “spare capacity” we have. I can get my head around disk capacity, but how do you quantify “spare capacity” for an HPC? We have wait times for jobs at busy periods, but at other times zero queue lengths. What are the metrics we need to record?

You may ask why I don’t ask our HPC staff! For six months I have been the “HPC staff” due to “budget reasons” (because “I know Linux”). I desperately want those new staff, and need to look like I know what I am doing. Once those staff are on board, I am out of there and back to server support.

Please be kind to me!

adarsh · May 3, 2023, 6:14pm

As I understand it, spare capacity means free resources within the HPC that could be given to a department, a group of people, or a project for a period of time.

If you have long wait times and many jobs in the queue most of the time, then you do not have spare capacity. You would need to add resources (compute nodes, GPU nodes, disk space, networking, etc.) to meet the demand from jobs, bring down the wait time of queued jobs, or finish projects on time.
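One way to put a number on "long wait times" is to compare each job's submit time with its start time. The Python sketch below is only an illustration: the `jobs` list and its timestamps are made up, and in practice they would come from your scheduler's accounting records.

```python
from datetime import datetime
from statistics import median

# Hypothetical (submit time, start time) pairs, for illustration only.
jobs = [
    ("2023-05-01 09:00", "2023-05-01 09:00"),
    ("2023-05-01 10:00", "2023-05-01 12:30"),
    ("2023-05-01 10:15", "2023-05-01 13:15"),
    ("2023-05-02 02:00", "2023-05-02 02:00"),
]

FMT = "%Y-%m-%d %H:%M"


def wait_hours(submit, start):
    """Hours a job spent queued between submission and start."""
    delta = datetime.strptime(start, FMT) - datetime.strptime(submit, FMT)
    return delta.total_seconds() / 3600


waits = [wait_hours(submit, start) for submit, start in jobs]

median_wait = median(waits)                             # typical wait
longest_wait = max(waits)                               # worst case
share_waiting = sum(w > 0 for w in waits) / len(waits)  # fraction that queued at all

print(f"median wait: {median_wait:.2f} h")
print(f"longest wait: {longest_wait:.2f} h")
print(f"jobs that had to wait: {share_waiting:.0%}")
```

Running the same calculation separately for busy and quiet periods would show whether the waits are concentrated at peak times.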

“What are the metrics we need to record?”
You need to find out:

used vs. requested cores, memory, walltime, and disk space
cores used and unused per node on a daily basis
whether you need all of the HPC's resources, or whether some nodes could be powered off to save energy
any other data that indicates whether you have spare capacity or an under-resourced HPC
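As a rough sketch of how the metrics above could be combined, the Python below turns per-job records into core-hour figures. Everything in it is an assumption for illustration: the cluster size, the reporting period, the job records, and the field names (`cores`, `walltime_req_h`, `walltime_used_h`, `cpu_hours`) are hypothetical and would need to be mapped to whatever your scheduler's accounting data actually provides.

```python
# Hypothetical cluster size and reporting period, for illustration only.
TOTAL_CORES = 100
PERIOD_HOURS = 10

# Hypothetical per-job accounting records.
jobs = [
    {"cores": 40, "walltime_req_h": 10.0, "walltime_used_h": 5.0, "cpu_hours": 150.0},
    {"cores": 20, "walltime_req_h": 10.0, "walltime_used_h": 10.0, "cpu_hours": 200.0},
]


def summarize(jobs, total_cores, period_hours):
    """Summarize core-hour usage over one reporting period."""
    available = total_cores * period_hours
    # Core-hours held by jobs (cores reserved x time they actually ran).
    allocated = sum(j["cores"] * j["walltime_used_h"] for j in jobs)
    # Core-hours the users asked for when they submitted.
    requested = sum(j["cores"] * j["walltime_req_h"] for j in jobs)
    # CPU time the jobs actually consumed.
    consumed = sum(j["cpu_hours"] for j in jobs)
    return {
        "utilization": allocated / available,         # share of the machine in use
        "spare_core_hours": available - allocated,    # unallocated capacity
        "cpu_efficiency": consumed / allocated,       # used vs. allocated cores
        "walltime_accuracy": allocated / requested,   # used vs. requested walltime
    }


summary = summarize(jobs, TOTAL_CORES, PERIOD_HOURS)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

An average over a long period can hide busy spells, so it is probably worth computing this per day or per week and reading it alongside queue wait times.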

Ref:
https://www.top500.org/news/how-to-measure-hpc/
https://nci.org.au/our-systems/status
