How to set default memory allocation for jobs in a queue, without killing the job

Hi Folks,
Here is my use case:

We are running OpenPBS version 20.0.1 on machines with CentOS 7.9.2009 (Core) installed. In our HPC environment, we have multiple queues, say:
queue_1
queue_2
queue_3
queue_4

For each queue, jobs get submitted in one of two ways:
1> the user specifies the memory requested by the job, as:
qsub -I -l select=1:ncpus=4:mem=XG -q queue_1
("X" has a wide range: 1 GB, 5 GB, 10 GB, 50 GB, 100 GB, 250 GB, 500 GB, etc.)

2> the user submits the job without requesting memory at all

What do we want?
We want to allocate a default of 16 GB memory for jobs submitted without a memory request. How can we do that?

I have read the forum and the PBS administration guide, and I think it can be done by:
a> updating the default memory in the cgroups hook configuration from 256 MB to 16 GB. The current settings are below:

Qmgr: export hook pbs_cgroups application/x-config default
{
    "cgroup_prefix"         : "pbs_jobs",
    "exclude_hosts"         : [],
    "exclude_vntypes"       : ["no_cgroups"],
    "run_only_on_hosts"     : [],
    "periodic_resc_update"  : true,
    "vnode_per_numa_node"   : false,
    "online_offlined_nodes" : true,
    "use_hyperthreads"      : false,
    "ncpus_are_cores"       : false,
    "cgroup" : {
        "cpuacct" : {
            "enabled"         : true,
            "exclude_hosts"   : [],
            "exclude_vntypes" : []
        },
        "cpuset" : {
            "enabled"            : true,
            "exclude_cpus"       : [],
            "exclude_hosts"      : [],
            "exclude_vntypes"    : [],
            "mem_fences"         : true,
            "mem_hardwall"       : false,
            "memory_spread_page" : false
        },
        "devices" : {
            "enabled"         : false,
            "exclude_hosts"   : [],
            "exclude_vntypes" : [],
            "allow"           : [
                "b *:* rwm",
                "c *:* rwm"
            ]
        },
        "hugetlb" : {
            "enabled"         : false,
            "exclude_hosts"   : [],
            "exclude_vntypes" : [],
            "default"         : "0MB",
            "reserve_percent" : 0,
            "reserve_amount"  : "0MB"
        },
        "memory" : {
            "enabled"         : true,
            "exclude_hosts"   : [],
            "exclude_vntypes" : [],
            "soft_limit"      : false,
            "default"         : "256MB",
            "reserve_percent" : 0,
            "reserve_amount"  : "64MB"
        },
        "memsw" : {
            "enabled"         : true,
            "exclude_hosts"   : [],
            "exclude_vntypes" : [],
            "default"         : "256MB",
            "reserve_percent" : 0,
            "reserve_amount"  : "64MB"
        }
    }
}

However, I do not have the exact steps/commands for doing this.
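From my reading of the Admin Guide, the sequence would be something like the following (the file name here is arbitrary, and I have not run this myself yet):

qmgr -c "export hook pbs_cgroups application/x-config default" > pbs_cgroups.json
# edit pbs_cgroups.json: under "memory" (and "memsw"), change "default" : "256MB" to "default" : "16GB"
qmgr -c "import hook pbs_cgroups application/x-config default pbs_cgroups.json"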

b> I need to set "soft_limit" to "true", so the kernel does not kill a job if it exceeds the default 16 GB (per the PBS Professional Admin Guide: https://2021.help.altair.com/2021.1.2/PBS%20Professional/PBSAdminGuide2021.1.2.pdf, section 16.5.3.9.v).
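If I understand the guide correctly, the relevant part of the config would then look like this (my reading only, not yet tested; my understanding is that with soft_limit enabled the "memsw" value acts as the hard limit, so it may need raising as well):

"memory" : {
    "enabled"         : true,
    "exclude_hosts"   : [],
    "exclude_vntypes" : [],
    "soft_limit"      : true,
    "default"         : "16GB",
    "reserve_percent" : 0,
    "reserve_amount"  : "64MB"
},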

Has anyone made this exact config change? Please share the steps you followed.

I want to make sure that once I enable this, it does NOT affect other running jobs, which were submitted by users with a memory request (over the variable range above).

I would appreciate comments from the forum, please.

Thanks
-Subhajit

The scheduler decides where to place a job based on the job's resource request (ncpus, mem, custom resources, etc.), so the request cannot be dynamic in nature. A default memory value can be supplied in any of these ways:

  • via a queuejob hook (see the sketch after this list)
  • via qmgr -c "set queue QUEUENAME resources_default.mem=3g"
  • via a qsub wrapper script, which can contain all the dynamic checks, so that at submission time it knows what the actual mem value needs to be.
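For illustration, a minimal queuejob hook could look like the sketch below. This is only a sketch: the hook name (default_mem) and the 16gb value are placeholders, and it assumes memory is requested either job-wide (-l mem=X) or per chunk (-l select=...:mem=X).

# default_mem.py - queuejob hook sketch: apply a 16gb default
# when the job requests no memory at all
import pbs

e = pbs.event()
j = e.job

# Job-wide request, e.g. qsub -l mem=8gb
mem = j.Resource_List["mem"]
# Chunk-level request, e.g. qsub -l select=1:ncpus=4:mem=8gb
sel = j.Resource_List["select"]
has_chunk_mem = sel is not None and "mem=" in str(sel)

if mem is None and not has_chunk_mem:
    j.Resource_List["mem"] = pbs.size("16gb")

e.accept()

It could be installed with something along these lines:

qmgr -c "create hook default_mem"
qmgr -c "set hook default_mem event = queuejob"
qmgr -c "import hook default_mem application/x-python default default_mem.py"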

The above resource requests (ncpus, mem) exist so that the scheduler can place the job on computational resource(s) that can satisfy the request; the scheduler does not enforce a rule that the job must not use more than the resources it requested via qsub. If you want to enforce that rule, then cgroups come into the picture (say, if the job exceeds the requested resources, the job will be deleted).

Hope this helps

Thank you @adarsh ! Let me test this in my lower environment before applying it to the production cluster. I will go with:

qmgr -c "set queue QUEUENAME resources_default.mem=16g"
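Once it is set, I plan to verify it with something like the following (queue name is a placeholder):

qmgr -c "list queue QUEUENAME resources_default"

and then submit a test job without a mem request and check that qstat -f shows Resource_List.mem = 16gb.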
