Dear all PBS users,
I searched the forum but did not find a topic covering the issue I am facing.
We are running PBS version 20.0.0 on machines running CentOS 7.9.2009 (Core).
In qmgr I checked that the pbs_cgroups hook is enabled:
Qmgr: list hook
Hook pbs_cgroups
    type = site
    enabled = true
    event = execjob_begin,execjob_epilogue,execjob_end,execjob_launch,
        execjob_attach,execjob_resize,execjob_abort,exechost_periodic,
        exechost_startup
    user = pbsadmin
    alarm = 120
    freq = 10
    order = 100
    debug = false
    fail_action = offline_vnodes
and I also checked the associated configuration file (the default one):
Qmgr: export hook pbs_cgroups application/x-config default
{
    "cgroup_prefix" : "pbs_jobs",
    "exclude_hosts" : [],
    "exclude_vntypes" : ["no_cgroups"],
    "run_only_on_hosts" : [],
    "periodic_resc_update" : true,
    "vnode_per_numa_node" : false,
    "online_offlined_nodes" : true,
    "use_hyperthreads" : false,
    "ncpus_are_cores" : false,
    "discover_gpus" : true,
    "manage_rlimit_as" : true,
    "cgroup" : {
        "cpuacct" : {
            "enabled" : true,
            "exclude_hosts" : [],
            "exclude_vntypes" : []
        },
        "cpuset" : {
            "enabled" : true,
            "exclude_cpus" : [],
            "exclude_hosts" : [],
            "exclude_vntypes" : [],
            "mem_fences" : false,
            "mem_hardwall" : false,
            "memory_spread_page" : false
        },
        "devices" : {
            "enabled" : false,
            "exclude_hosts" : [],
            "exclude_vntypes" : [],
            "allow" : [
                "b *:* rwm",
                "c *:* rwm"
            ]
        },
        "memory" : {
            "enabled" : true,
            "exclude_hosts" : [],
            "exclude_vntypes" : [],
            "soft_limit" : false,
            "enforce_default" : true,
            "exclhost_ignore_default" : false,
            "default" : "256MB",
            "reserve_percent" : 0,
            "reserve_amount" : "1GB"
        },
        "memsw" : {
            "enabled" : false,
            "exclude_hosts" : [],
            "exclude_vntypes" : [],
            "enforce_default" : true,
            "exclhost_ignore_default" : false,
            "default" : "0B",
            "reserve_percent" : 0,
            "reserve_amount" : "64MB",
            "manage_cgswap" : false
        },
        "hugetlb" : {
            "enabled" : false,
            "exclude_hosts" : [],
            "exclude_vntypes" : [],
            "enforce_default" : true,
            "exclhost_ignore_default" : false,
            "default" : "0B",
            "reserve_percent" : 0,
            "reserve_amount" : "0B"
        }
    }
}
It shows that the default memory limit is set to "256MB", with soft_limit set to false and enforce_default set to true.
I then submitted a job with the following header:
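To avoid misreading the exported file by eye, the memory settings can also be pulled out programmatically. The snippet below is only a minimal sketch and not part of PBS: `memory_section` is a hypothetical helper name, and `SAMPLE_CONFIG` is a trimmed copy of the memory section shown above rather than the full export.

```python
import json

# Trimmed copy of the "memory" section from the exported hook configuration.
# (Assumption: the real export is valid JSON once saved to a file.)
SAMPLE_CONFIG = """
{
    "cgroup_prefix": "pbs_jobs",
    "cgroup": {
        "memory": {
            "enabled": true,
            "soft_limit": false,
            "enforce_default": true,
            "default": "256MB",
            "reserve_amount": "1GB"
        }
    }
}
"""


def memory_section(config_text):
    """Return the cgroup memory settings from the hook configuration text."""
    config = json.loads(config_text)
    return config["cgroup"]["memory"]


mem = memory_section(SAMPLE_CONFIG)
for key in ("enabled", "soft_limit", "enforce_default", "default"):
    print(f"{key} = {mem[key]!r}")
```

This only confirms which values are configured; it does not by itself explain why the default was applied to a job that requested memory.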
#!/bin/bash -l
#PBS -l walltime=96:00:00
#PBS -l nodes=node1:ppn=1
#PBS -l mem=25gb
#PBS -q Q1
#PBS -o job.out
#PBS -e job.err
Using qstat -f, I see that the job reports resources_used.mem = 262144kb and resources_used.vmem = 5218548kb, while the requested resources were correctly understood by the system: Resource_List.select = 1:ncpus=1:mem=26214400KB:host=node1.
When I look in the cgroup filesystem to see the actual memory limit set for my job, I get:
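The sizes above are easier to compare once converted to bytes. The following is a rough sketch, assuming PBS size suffixes are binary multiples (1 kb = 1024 bytes), which is consistent with the 25gb request being shown as 26214400KB. `pbs_size_to_bytes` is a hypothetical helper, not a PBS function, and it handles only the byte suffixes that appear in this post.

```python
import re

# Assumption: PBS byte suffixes are powers of 1024 and case-insensitive.
_UNITS = {
    "b": 1,
    "kb": 1024,
    "mb": 1024 ** 2,
    "gb": 1024 ** 3,
    "tb": 1024 ** 4,
}


def pbs_size_to_bytes(text):
    """Convert a size string such as '262144kb' or '25gb' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([kmgt]?b)\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNITS[unit]


used = pbs_size_to_bytes("262144kb")         # resources_used.mem
requested = pbs_size_to_bytes("26214400KB")  # mem in Resource_List.select
default = pbs_size_to_bytes("256MB")         # "default" in the hook config

print("used      :", used)
print("requested :", requested)
print("default   :", default)
print("used equals configured default:", used == default)
```

With these values, the reported memory usage is exactly the configured 256MB default, which suggests the job is running right at that limit rather than at the requested 25gb.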
cat /sys/fs/cgroup/memory/pbs_jobs.service/jobid/1964.NODE1/memory.limit_in_bytes
268435456
So I understand that PBS gave my job a memory limit of 268435456 bytes (256 MB, i.e. the configured default), although I requested 25gb! My machine has 376 GB of RAM (resources_available.mem = 376355mb), with 368 GB free at the time this job was running.
What should I do so that the cgroup memory limit for my job matches the memory I requested?
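For anyone who wants to repeat the check on their own node, here is a small sketch that reads a cgroup v1 `memory.limit_in_bytes` file and prints it in binary units next to the requested amount. The helper names are hypothetical, and the path must be passed in by the caller (for example the one from the `cat` command above); nothing here is PBS-specific.

```python
from pathlib import Path


def read_limit_bytes(path):
    """Read a cgroup v1 memory.limit_in_bytes file and return the limit as an int."""
    return int(Path(path).read_text().strip())


def format_binary(num_bytes):
    """Format a byte count using binary units (KiB, MiB, GiB, TiB)."""
    value = num_bytes
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if value < 1024 or unit == "TiB":
            return f"{value:g} {unit}"
        value /= 1024


def compare(limit_bytes, requested_bytes):
    """Return a one-line summary of the applied limit versus the request."""
    ratio = requested_bytes / limit_bytes
    return (f"limit {format_binary(limit_bytes)}, "
            f"requested {format_binary(requested_bytes)} "
            f"({ratio:g}x the limit)")


# Values taken from this post: the limit read from the cgroup file and
# the 25gb request (26214400KB in Resource_List.select).
print(compare(268435456, 26214400 * 1024))
```

On the node itself, `read_limit_bytes` would be called with the job's `memory.limit_in_bytes` path instead of the hard-coded number.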
Thank you in advance for your help!