Can you please let me know whether resources_used.mem, as reported by qstat -f, is the aggregated amount of memory used by the job or the maximum (peak) amount?
If the job had peaks and valleys and at its peak reached 100GB of memory, would resources_used.mem equal 100GB? Or does the metric simply grow according to some other logic that I'm missing?
Also, how can I extract the "temporal" memory usage of each job (for all jobs)?
I want to be able to build a graph of the memory used by each job over time.
The resources_used.mem value may not be exact or accurate, due to several factors.
Please check the relevant section in https://www.altair.com/pdfs/pbsworks/PBSAdminGuide2020.1.pdf
Thank you, adarsh.
It seems from the docs that resources_used.mem is a peak value, and that it is indeed not reliable because it depends on when polling occurred.
Enabling cgroups is something we would definitely look into, but the question remains: what would be the best way to build a graph/chart of a job's historical memory usage?
Do you know of any built-in integration with Elasticsearch? Would you recommend implementing it myself using hooks?
I would appreciate your help here.
Thank you Roy
There is no built-in integration with Elasticsearch.
Yes, I would recommend implementing it yourself: use a mom periodic hook to collect each job's memory usage and push it to your Elasticsearch stack.
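To make this concrete, here is a minimal sketch of what such an exechost_periodic (mom periodic) hook could look like. It is an illustration, not a tested implementation: the Elasticsearch URL and index name are assumptions, the pbs module only exists inside a running PBS hook environment, and error handling is kept to a bare minimum. Only the standard library is used for the HTTP call, since the hook's Python environment may not have third-party packages installed.

```python
# Sketch: mom periodic hook that samples each running job's
# resources_used.mem and indexes one document per sample in Elasticsearch.
# ES_URL and the index name "pbs-jobmem" are assumptions for illustration.
import json
import time
import urllib.request

ES_URL = "http://elasticsearch.example.com:9200/pbs-jobmem/_doc"  # assumed endpoint


def mem_to_kb(mem_str):
    """Convert a PBS size string such as '1048576kb' or '4gb' to kilobytes."""
    units = {"kb": 1, "mb": 1024, "gb": 1024 ** 2, "tb": 1024 ** 3}
    s = str(mem_str).strip().lower()
    for suffix, factor in units.items():
        if s.endswith(suffix):
            return int(float(s[: -len(suffix)]) * factor)
    return int(s)  # plain number: assume it is already in kb


def push_sample(jobid, mem_kb, timestamp):
    """Index a single memory sample as a JSON document."""
    doc = json.dumps(
        {"jobid": jobid, "mem_kb": mem_kb, "@timestamp": timestamp}
    ).encode()
    req = urllib.request.Request(
        ES_URL, data=doc, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=5)


try:
    import pbs  # only importable when executed by pbs_mom as a hook

    e = pbs.event()
    now = time.strftime("%Y-%m-%dT%H:%M:%S")
    # e.job_list maps job IDs to the jobs currently running on this host.
    for jobid, job in e.job_list.items():
        mem = job.resources_used["mem"]
        if mem is not None:
            push_sample(jobid, mem_to_kb(mem), now)
    e.accept()
except ImportError:
    pass  # not running under PBS; helpers above stay importable for testing
```

You would install the hook with qmgr, for example (names and frequency are placeholders): create a hook, set its event to exechost_periodic, set freq to the sampling interval in seconds, and import the script. With one document per job per interval in Elasticsearch, building a per-job memory-over-time chart in Kibana or Grafana becomes straightforward.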