Resources_used.cpupercent meaning

hollow_op · May 11, 2021, 11:59am

Hi,

I noticed that one of my jobs is reporting resources_used.cpupercent=8077. Does this mean I’m using 8077 percent more CPU than I requested for my job? I only requested 20 CPUs for this particular job and it seems like I’m using much more. How is this possible? I thought that PBS Pro would immediately kill a job if its CPU usage went above the requested resource.

adarsh · May 11, 2021, 3:01pm

Please refer to: AG-308 - PBS Professional 2020.1 Administrator’s Guide
MoM calculates an integer value called cpupercent each polling cycle. This is a moving weighted average of CPU usage for the cycle, given as the average percentage usage of one CPU. For example, a value of 50 means that during a certain period, the job used 50 percent of one CPU. A value of 300 means that during the period, the job used an average of three CPUs.

new_percent = change_in_cpu_time*100 / change_in_walltime
weight = delta_weight[up|down] * walltime/max_poll_period
new_cpupercent = (new_percent * weight) + (old_cpupercent * (1-weight))

delta_weight_up is used if new_percent is higher than the old cpupercent value. delta_weight_down is used if new_percent is lower than the old cpupercent value. delta_weight_[up|down] controls the speed with which cpupercent changes. If delta_weight_[up|down] is 0.0, the value for cpupercent does not change over time. If it is 1.0, cpupercent will take the value of new_percent for the poll period. In this case cpupercent changes quickly.
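As a rough illustration of the three formulas above, here is a minimal Python sketch of one polling update. It is not MoM's actual code: the function name and arguments are made up for this example, and it assumes that "walltime" in the weight formula means the wall-clock time elapsed since the previous sample. It also returns a float, whereas MoM reports an integer.

```python
def update_cpupercent(old_cpupercent, delta_cpu_time, delta_walltime,
                      delta_weight_up, delta_weight_down,
                      max_poll_period=120.0):
    """One polling-cycle update of the moving weighted average (sketch).

    delta_cpu_time and delta_walltime are in seconds since the previous
    sample (assumption); max_poll_period defaults to the 120 s quoted
    for $max_check_poll.
    """
    # Usage over this cycle, as a percentage of one CPU.
    new_percent = delta_cpu_time * 100.0 / delta_walltime

    # Rising usage uses the "up" weight, falling usage the "down" weight.
    if new_percent > old_cpupercent:
        delta_weight = delta_weight_up
    else:
        delta_weight = delta_weight_down

    weight = delta_weight * delta_walltime / max_poll_period
    return new_percent * weight + old_cpupercent * (1.0 - weight)


# Illustrative values only: a job that used 360 CPU-seconds in a 120 s cycle
# (300 percent), starting from an old value of 100, with an "up" weight of 0.5.
print(update_cpupercent(100.0, 360.0, 120.0, 0.5, 0.5))
```

With a weight of 1.0 the function returns new_percent directly, and with a weight of 0.0 it returns the old value unchanged, which matches the description above.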

However, cpupercent is controlled so that it stays at the greater of the average over the entire run or ncpus*100.

max_poll_period is the maximum time between samples, set in MoM’s config file by $max_check_poll, with a default of 120 seconds.

The job is killed if the following is true:
new_cpupercent > ((ncpus * 100 * delta_cpufactor) + delta_percent_over)

No, PBS would not kill that job.
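For reference, the kill condition quoted above could be written as the following Python sketch. The function name and the enforcement_enabled flag are hypothetical, and the delta_cpufactor and delta_percent_over values in the example call are purely illustrative, not values taken from any site's MoM configuration. The flag stands in for the assumption that the comparison only has an effect when enforcement is switched on.

```python
def exceeds_cpupercent_limit(new_cpupercent, ncpus, delta_cpufactor,
                             delta_percent_over, enforcement_enabled=False):
    """Return True if the quoted kill condition holds (sketch only).

    enforcement_enabled is a hypothetical flag: when it is False the
    comparison is skipped, so no job is ever flagged.
    """
    if not enforcement_enabled:
        return False
    limit = ncpus * 100 * delta_cpufactor + delta_percent_over
    return new_cpupercent > limit


# The job from the question: 20 CPUs requested, cpupercent of 8077.
# The 1.05 and 50 below are illustrative parameters (assumption).
print(exceeds_cpupercent_limit(8077, 20, 1.05, 50))
print(exceeds_cpupercent_limit(8077, 20, 1.05, 50, enforcement_enabled=True))
```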

alexis.cousein · May 18, 2021, 9:35am

PBS only kills jobs when cpupercent is too high when the MoM config files contain the corresponding $enforce directives.

8077 corresponds to 80.77 CPUs. Note that this is the maximum measured between two successive MoM poll intervals. That is a notoriously noisy metric, and if you don’t have the cgroups hook enabled you can run into some accounting issues, particularly at the end of the job (if a session is a child of another session but is attached to the job using pbs_attach, its usage can be counted for that session and then also reaped in the session of the parent).

resources_used.cput / resources_used.walltime is a much more robust metric, which is why there is another enforcement flag for both options (but again, you can have some “double counting” issues).
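A small Python sketch of that ratio, for anyone who wants to compute it by hand. It assumes both values are given as HH:MM:SS strings, and the helper names are made up for this example. The sample cput and walltime values are invented, not taken from the job in this thread.

```python
def hms_to_seconds(value):
    """Convert an 'HH:MM:SS' string to seconds (hours may exceed 24)."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def average_cpus(cput, walltime):
    """Average number of CPUs kept busy over the whole run: cput / walltime."""
    return hms_to_seconds(cput) / hms_to_seconds(walltime)


# Invented example: 40 hours of CPU time over 2 hours of walltime.
print(average_cpus("40:00:00", "02:00:00"))

# For comparison, cpupercent is a percentage of one CPU,
# so 8077 corresponds to 8077 / 100 CPUs.
print(8077 / 100)
```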

If you’re using the cgroups hook and its cpuset or cpu subsystems, then you can implement measures to prevent jobs from using more than they requested and this becomes largely irrelevant. The cgroup hook will also make cpupercent reflect cput/walltime.

matzmz · January 10, 2024, 4:40pm

I apologize for following up on an old post, but I need some clarification regarding how the ‘resources_used.cpupercent’ and ‘resources_used.cputime’ parameters in the accounting file are computed, especially when a job runs on multiple hosts.

From my understanding, it appears that these parameters reflect resource usage exclusively from the master node, potentially disregarding the contributions of other nodes involved in the job. Could someone confirm whether this interpretation is accurate? Are the resource usages of other nodes composing the job indeed omitted from these calculations?
