CPU bursting in a job

CentOS 7
PBSPro 14.1.0

I am trying to configure the PBS MOM daemon to allow for CPU bursting. Following the Admin guide I put the following in /var/spool/pbs/mom_priv/config and restarted MOM on the compute nodes.

$enforce mem
$enforce cpuaverage
$enforce cpuburst
$enforce delta_percent_over 50
$enforce delta_cpufactor 1.05
$enforce delta_weightup 0.4
$enforce delta_weightdown 0.1
$enforce average_percent_over 50
$enforce average_cpufactor 1.025
$enforce average_trialperiod 120

After the restart, jobs are still killed with the following errors:

cormhap_slancy_defaults.o437013.31:=>> PBS: job killed: ncpus 17.4 exceeded limit 16 (burst)
cormhap_slancy_defaults.o437013.32:=>> PBS: job killed: ncpus 17.7 exceeded limit 16 (burst)
cormhap_slancy_defaults.o437013.35:=>> PBS: job killed: ncpus 17.8 exceeded limit 16 (burst)
cormhap_slancy_defaults.o437013.36:=>> PBS: job killed: ncpus 17.8 exceeded limit 16 (burst)
cormhap_slancy_defaults.o437013.40:=>> PBS: job killed: ncpus 17.6 exceeded limit 16 (burst)
cormhap_slancy_defaults.o437016.36:=>> PBS: job killed: ncpus 17.8 exceeded limit 16 (burst)

Have I missed a step in getting this configured correctly?


I believe, but am not 100% sure, that with these parameters set (all of which appear to be the defaults, aside from turning enforcement on) a 16-CPU job will be killed if it uses more than 17.3 CPUs, which all of these did. From the manual:

The job is killed if the following is true:
new_cpupercent > ((ncpus * 100 * delta_cpufactor) + delta_percent_over)

((16 * 100 * 1.05) + 50 ) = 1730
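As a rough check, the formula quoted from the manual can be applied to the values in the kill messages. This is only a sketch: the function name is made up for illustration, and it assumes the ncpus figure printed in each message is new_cpupercent divided by 100.

```python
def burst_limit_percent(ncpus, delta_cpufactor=1.05, delta_percent_over=50):
    """Kill threshold in percent of one CPU, per the formula quoted from the manual."""
    return ncpus * 100 * delta_cpufactor + delta_percent_over


limit = burst_limit_percent(16)

# ncpus values reported in the kill messages earlier in the thread
reported = [17.4, 17.7, 17.8, 17.8, 17.6, 17.8]

for used in reported:
    # Assumption: reported ncpus * 100 corresponds to new_cpupercent
    over = used * 100 > limit
    status = "over limit" if over else "within limit"
    print(f"{used} CPUs = {used * 100:.0f}% vs limit {limit:.0f}%: {status}")
```

Under that assumption, every reported value is above the 1730% threshold, which matches the kills.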

So delta_percent_over is not a percentage of the number of CPUs, which in this instance would be 800? Is the "percent" just a mislabeling?

I would have assumed that, for the name to make sense in a percentage model, delta_percent_over would enter the formula above like this:

((16 * 100 * 1.05) + 800)


I am not sure it is actually mislabeled, but there are multiple ways in which it COULD make sense. The implementation expresses it as an absolute percentage of one CPU. That is, with a value of 50 you are saying “allow usage of an extra half of a CPU”, where the amount of extra CPU is given as a percentage of a single CPU. It does not let you say “allow usage of an extra 50% over whatever the job’s (ncpus * 100 * delta_cpufactor) happens to be”.
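To make the difference concrete, here is a small sketch comparing the two readings for the 16-CPU job in this thread. The variable names are illustrative only, and the "relative" reading is the hypothetical one from the question (50% of ncpus * 100), not what PBS is described as doing here.

```python
NCPUS = 16
DELTA_CPUFACTOR = 1.05
DELTA_PERCENT_OVER = 50

base = NCPUS * 100 * DELTA_CPUFACTOR

# Reading described above: a flat 50 percentage points of one CPU
absolute_limit = base + DELTA_PERCENT_OVER

# Hypothetical reading from the question: 50% of the job's ncpus * 100
relative_limit = base + NCPUS * 100 * DELTA_PERCENT_OVER / 100

print(f"absolute reading: {absolute_limit:.0f}% ({absolute_limit / 100:.1f} CPUs)")
print(f"relative reading: {relative_limit:.0f}% ({relative_limit / 100:.1f} CPUs)")
```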

I think I have figured out how to get a 50% bursting overhead on jobs. I just needed to change $enforce delta_cpufactor 1.05 to $enforce delta_cpufactor 1.50, and it seems to have done the trick.
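For reference, a quick sketch of what that change does to the threshold, assuming the kill formula quoted from the manual earlier in the thread and leaving delta_percent_over at 50 (the function name is just for illustration):

```python
def burst_limit_cpus(ncpus, delta_cpufactor, delta_percent_over=50):
    """Kill threshold expressed in CPUs, using the formula quoted from the manual."""
    return (ncpus * 100 * delta_cpufactor + delta_percent_over) / 100


for factor in (1.05, 1.50):
    limit = burst_limit_cpus(16, factor)
    print(f"delta_cpufactor {factor:.2f}: 16-CPU job killed above {limit:.1f} CPUs")
```

By that formula, a 16-CPU job would be allowed up to about 24.5 CPUs with the factor at 1.50, compared with 17.3 at 1.05.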

As a side note, there does seem to be an error in the docs: table 5-15 states that the default is 1.50, but it is actually 1.05.