Cgroups hook write permission error

I’m running OpenPBS 20.0.1 and am seeing a memsw write “Permission denied” error on all compute nodes (and the nodes swap heavily when running array jobs such as Molpro 2020):

06/23/2022 09:25:46;0100;pbs_python;Hook;pbs_python;main: Event type is execjob_begin, job ID is 146069.flux
06/23/2022 09:25:46;0100;pbs_python;Hook;pbs_python;create_job: Creating directory /sys/fs/cgroup/blkio,cpuacct,memory,freezer/pbs_jobs.service/jobid/146069.flux/
06/23/2022 09:25:46;0100;pbs_python;Hook;pbs_python;create_job: Creating directory /sys/fs/cgroup/cpuset/pbs_jobs.service/jobid/146069.flux/

** ISSUE HERE **
06/23/2022 09:25:47;0002;pbs_python;Hook;pbs_python;write_value: Permission denied: /sys/fs/cgroup/blkio,cpuacct,memory,freezer/pbs_jobs.service/jobid/146069.flux/memory.memsw.limit_in_bytes
06/23/2022 09:25:47;0008;pbs_python;Job;146069.flux;update_job_usage: CPU percent: 0
06/23/2022 09:25:47;0008;pbs_python;Job;146069.flux;update_job_usage: CPU usage: 0.000 secs
06/23/2022 09:25:47;0008;pbs_python;Job;146069.flux;update_job_usage: Memory usage: mem=0b
06/23/2022 09:25:47;0008;pbs_python;Job;146069.flux;update_job_usage: No max vmem data
06/23/2022 09:25:47;0008;pbs_python;Job;146069.flux;update_job_usage: No vmem fail count data
06/23/2022 09:25:47;0100;pbs_python;Hook;pbs_python;Hook ended: pbs_cgroups, job ID 146069.flux, event_type 64 (elapsed time: 0.3918)
06/23/2022 09:25:47;0008;pbs_mom;Job;146069.flux;no active tasks
06/23/2022 09:25:47;0080;pbs_mom;Job;146069.flux;running prologue
06/23/2022 09:25:47;0100;pbs_python;Hook;pbs_python;main: Event type is execjob_launch, job ID is 146069.flux
06/23/2022 09:25:47;0100;pbs_python;Hook;pbs_python;Hook ended: pbs_cgroups, job ID 146069.flux, event_type 2048 (elapsed time: 0.2916)
06/23/2022 09:25:47;0008;pbs_mom;Job;146069.flux;Started, pid = 29607
06/23/2022 09:26:48;0008;pbs_python;Job;146069.flux;update_job_usage: CPU percent: 0
06/23/2022 09:26:48;0008;pbs_python;Job;146069.flux;update_job_usage: CPU usage: 639.773 secs
06/23/2022 09:26:48;0008;pbs_python;Job;146069.flux;update_job_usage: Memory usage: mem=1600744kb
06/23/2022 09:26:48;0008;pbs_python;Job;146069.flux;update_job_usage: No max vmem data
06/23/2022 09:26:48;0008;pbs_python;Job;146069.flux;update_job_usage: No vmem fail count data
06/23/2022 09:28:50;0008;pbs_python;Job;146069.flux;update_job_usage: CPU percent: 426
06/23/2022 09:28:50;0008;pbs_python;Job;146069.flux;update_job_usage: CPU usage: 2051.048 secs

# file: sys/fs/cgroup/blkio,cpuacct,memory,freezer/pbs_jobs.service/jobid/146069.flux/
# owner: root
# group: root
user::rwx
group::r-x
other::r-x
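For what it’s worth, the failing write can be reproduced outside the hook with a short Python check (a minimal sketch; the path below is the one from the hook log above, and the error classification is based on the errno the kernel returns):

```python
import errno
import os

def check_limit_writable(path):
    """Try to open a cgroup limit file for writing and report the result."""
    try:
        fd = os.open(path, os.O_WRONLY)
    except OSError as e:
        if e.errno == errno.ENOENT:
            return "missing"            # file not created by the kernel at all
        if e.errno == errno.EACCES:
            return "permission denied"  # matches the hook's write_value error
        return os.strerror(e.errno)
    os.close(fd)
    return "writable"

# Path taken from the hook log above; substitute a live job ID when testing.
print(check_limit_writable(
    "/sys/fs/cgroup/blkio,cpuacct,memory,freezer/pbs_jobs.service"
    "/jobid/146069.flux/memory.memsw.limit_in_bytes"))
```

Running this as root on a compute node while a job is active should distinguish a genuine permission problem from the file simply not existing.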

The pbs_cgroups config is as follows:
{
    "cgroup_prefix"         : "pbs_jobs",
    "exclude_hosts"         : [],
    "exclude_vntypes"       : ["no_cgroups"],
    "run_only_on_hosts"     : [],
    "periodic_resc_update"  : true,
    "vnode_per_numa_node"   : false,
    "online_offlined_nodes" : true,
    "use_hyperthreads"      : false,
    "ncpus_are_cores"       : false,
    "cgroup" : {
        "cpuacct" : {
            "enabled"            : true,
            "exclude_hosts"      : [],
            "exclude_vntypes"    : []
        },
        "cpuset" : {
            "enabled"            : true,
            "exclude_cpus"       : [],
            "exclude_hosts"      : [],
            "exclude_vntypes"    : [],
            "mem_fences"         : true,
            "mem_hardwall"       : false,
            "memory_spread_page" : false
        },
        "devices" : {
            "enabled"            : false,
            "exclude_hosts"      : [],
            "exclude_vntypes"    : [],
            "allow"              : [
                "b *:* rwm",
                "c *:* rwm"
            ]
        },
        "hugetlb" : {
            "enabled"            : false,
            "exclude_hosts"      : [],
            "exclude_vntypes"    : [],
            "default"            : "0MB",
            "reserve_percent"    : 0,
            "reserve_amount"     : "0MB"
        },
        "memory" : {
            "enabled"            : true,
            "exclude_hosts"      : [],
            "exclude_vntypes"    : [],
            "soft_limit"         : false,
            "default"            : "256MB",
            "reserve_percent"    : 0,
            "reserve_amount"     : "64MB"
        },
        "memsw" : {
            "enabled"            : true,
            "exclude_hosts"      : [],
            "exclude_vntypes"    : [],
            "default"            : "256MB",
            "reserve_percent"    : 0,
            "reserve_amount"     : "64MB"
        }
    }
}

Any help or recommendations would be greatly appreciated.

Thanks

I have never seen a directory named blkio,cpuacct,memory,freezer under /sys/fs/cgroup

I suspect that directory does not exist. Could you cd to /sys/fs/cgroup and provide the output of ls -l?

Which OS are you running?

Hello, and thanks for your help. Sorry, I have been out of town. Here is the output you requested:

Alpine:~ # cd /sys/fs/cgroup

Alpine:/sys/fs/cgroup # ls -l
total 0
dr-xr-xr-x 5 root root 0 Jun 3 15:41 blkio
lrwxrwxrwx 1 root root 11 Jun 1 08:11 cpu -> cpu,cpuacct
dr-xr-xr-x 5 root root 0 Jun 3 15:41 cpu,cpuacct
lrwxrwxrwx 1 root root 11 Jun 1 08:11 cpuacct -> cpu,cpuacct
dr-xr-xr-x 2 root root 0 Jun 3 15:41 cpuset
dr-xr-xr-x 5 root root 0 Jun 3 15:41 devices
dr-xr-xr-x 2 root root 0 Jun 3 15:41 freezer
dr-xr-xr-x 2 root root 0 Jun 3 15:41 hugetlb
dr-xr-xr-x 5 root root 0 Jun 3 15:41 memory
lrwxrwxrwx 1 root root 16 Jun 1 08:11 net_cls -> net_cls,net_prio
dr-xr-xr-x 5 root root 0 Jun 3 15:41 net_cls,net_prio
lrwxrwxrwx 1 root root 16 Jun 1 08:11 net_prio -> net_cls,net_prio
dr-xr-xr-x 2 root root 0 Jun 3 15:41 perf_event
dr-xr-xr-x 5 root root 0 Jun 3 15:41 pids
dr-xr-xr-x 2 root root 0 Jun 3 15:41 rdma
dr-xr-xr-x 5 root root 0 Jun 3 15:41 systemd

Alpine:/sys/fs/cgroup # cat /etc/os-release
NAME="SLE-HPC"
VERSION="12-SP5"
VERSION_ID="12.5"
PRETTY_NAME="SUSE Linux Enterprise High Performance Computing 12 SP5"
ID="sle-hpc"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sle-hpc:12:sp5"

Alpine:/sys/fs/cgroup # uname -a
Linux Alpine 4.12.14-122.121-default #1 SMP Wed May 4 10:35:25 UTC 2022 (a686fdb) x86_64 x86_64 x86_64 GNU/Linux

The problem is definitely with the "blkio,cpuacct,memory,freezer" directory name. That string does not exist anywhere in the hook itself. It does exist in what looks like a comment in the preformatted text you provided. Where did that text come from?

When the hook starts, it figures out where all the cgroups are located by examining cgroup entries in the /proc/mounts file. It’s unlikely, but possible, that /proc/mounts contains an entry for "blkio,cpuacct,memory,freezer" on the compute nodes. Could you check?
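For illustration, that discovery step can be sketched roughly like this (a hedged sketch of the general technique, not the hook’s actual code — the controller list and parsing details here are illustrative):

```python
import os

# cgroup v1 controllers we care about; names appear among the mount options
CONTROLLERS = ("cpu", "cpuacct", "cpuset", "memory", "blkio",
               "freezer", "devices", "hugetlb", "pids")

def cgroup_mounts(mounts_text):
    """Map each cgroup v1 controller name to its mount point,
    given the text of /proc/mounts."""
    paths = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, ...
        if len(fields) < 4 or fields[2] != "cgroup":
            continue
        mount_point, options = fields[1], fields[3].split(",")
        for opt in options:
            if opt in CONTROLLERS:
                paths[opt] = mount_point
    return paths

if os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        for ctl, path in sorted(cgroup_mounts(f.read()).items()):
            print(ctl, path)
```

If a node’s /proc/mounts has the memory controller co-mounted with blkio, cpuacct, and freezer at a single mount point, a scan like this would report that combined path for all four controllers, which would explain the directory name in the log.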

Hello, I mistakenly posted info from the head node and not the compute node in my last post. On the compute nodes, every job that runs sees the cgroups hook creating these directories, as shown below (and that combined directory name appears to be a concatenation of the four controller names):

flux01:/sys/fs/cgroup # ls -l
total 0
lrwxrwxrwx 1 root root 28 Jun 1 05:55 blkio -> blkio,cpuacct,memory,freezer
dr-xr-xr-x 6 root root 0 Jul 1 12:54 blkio,cpuacct,memory,freezer
dr-xr-xr-x 5 root root 0 Jun 1 05:55 cpu
lrwxrwxrwx 1 root root 28 Jun 1 05:55 cpuacct -> blkio,cpuacct,memory,freezer
dr-xr-xr-x 3 root root 0 Jun 1 05:55 cpuset
dr-xr-xr-x 5 root root 0 Jun 1 05:55 devices
lrwxrwxrwx 1 root root 28 Jun 1 05:55 freezer -> blkio,cpuacct,memory,freezer
dr-xr-xr-x 2 root root 0 Jun 1 05:55 hugetlb
lrwxrwxrwx 1 root root 28 Jun 1 05:55 memory -> blkio,cpuacct,memory,freezer
dr-xr-xr-x 5 root root 0 Jun 1 05:55 net_cls
dr-xr-xr-x 2 root root 0 Jun 1 05:55 net_prio
dr-xr-xr-x 2 root root 0 Jun 1 05:55 perf_event
dr-xr-xr-x 5 root root 0 Jun 1 05:55 pids
dr-xr-xr-x 2 root root 0 Jun 1 05:55 rdma
dr-xr-xr-x 5 root root 0 Jun 1 05:55 systemd
flux01:/sys/fs/cgroup #

flux01:~ # ls -lthr /sys/fs/cgroup/blkio,cpuacct,memory,freezer/pbs_jobs.service/jobid/163423.Alpine/
total 0
-rw-r--r-- 1 root root 0 Jun 29 10:18 memory.limit_in_bytes
-rw-r--r-- 1 root root 0 Jun 29 10:18 cpuacct.usage
-rw-r--r-- 1 root root 0 Jun 29 10:18 memory.max_usage_in_bytes
-rw-r--r-- 1 root root 0 Jun 29 10:18 memory.failcnt
-rw-r--r-- 1 root root 0 Jun 29 10:18 memory.soft_limit_in_bytes
-rw-r--r-- 1 root root 0 Jun 29 10:18 tasks
-rw-r--r-- 1 root root 0 Jul 1 12:54 notify_on_release
-rw-r--r-- 1 root root 0 Jul 1 12:54 memory.use_hierarchy
-r--r--r-- 1 root root 0 Jul 1 12:54 memory.usage_in_bytes
-rw-r--r-- 1 root root 0 Jul 1 12:54 memory.swappiness
-r--r--r-- 1 root root 0 Jul 1 12:54 memory.stat
---------- 1 root root 0 Jul 1 12:54 memory.pressure_level
-rw-r--r-- 1 root root 0 Jul 1 12:54 memory.oom_control
-r--r--r-- 1 root root 0 Jul 1 12:54 memory.numa_stat
-rw-r--r-- 1 root root 0 Jul 1 12:54 memory.move_charge_at_immigrate
-r--r--r-- 1 root root 0 Jul 1 12:54 memory.kmem.usage_in_bytes
-r--r--r-- 1 root root 0 Jul 1 12:54 memory.kmem.tcp.usage_in_bytes
-rw-r--r-- 1 root root 0 Jul 1 12:54 memory.kmem.tcp.max_usage_in_bytes
-rw-r--r-- 1 root root 0 Jul 1 12:54 memory.kmem.tcp.limit_in_bytes
-rw-r--r-- 1 root root 0 Jul 1 12:54 memory.kmem.tcp.failcnt
-r--r--r-- 1 root root 0 Jul 1 12:54 memory.kmem.slabinfo
-rw-r--r-- 1 root root 0 Jul 1 12:54 memory.kmem.max_usage_in_bytes
-rw-r--r-- 1 root root 0 Jul 1 12:54 memory.kmem.limit_in_bytes
-rw-r--r-- 1 root root 0 Jul 1 12:54 memory.kmem.failcnt
--w------- 1 root root 0 Jul 1 12:54 memory.force_empty
-rw-r--r-- 1 root root 0 Jul 1 12:54 freezer.state
-r--r--r-- 1 root root 0 Jul 1 12:54 freezer.self_freezing
-r--r--r-- 1 root root 0 Jul 1 12:54 freezer.parent_freezing
-r--r--r-- 1 root root 0 Jul 1 12:54 cpuacct.usage_user
-r--r--r-- 1 root root 0 Jul 1 12:54 cpuacct.usage_sys
-r--r--r-- 1 root root 0 Jul 1 12:54 cpuacct.usage_percpu_user
-r--r--r-- 1 root root 0 Jul 1 12:54 cpuacct.usage_percpu_sys
-r--r--r-- 1 root root 0 Jul 1 12:54 cpuacct.usage_percpu
-r--r--r-- 1 root root 0 Jul 1 12:54 cpuacct.usage_all
-r--r--r-- 1 root root 0 Jul 1 12:54 cpuacct.stat
-rw-r--r-- 1 root root 0 Jul 1 12:54 cgroup.procs
--w--w--w- 1 root root 0 Jul 1 12:54 cgroup.event_control
-rw-r--r-- 1 root root 0 Jul 1 12:54 cgroup.clone_children
-rw-r--r-- 1 root root 0 Jul 1 12:54 blkio.weight_device
-rw-r--r-- 1 root root 0 Jul 1 12:54 blkio.weight
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.time_recursive
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.time
-rw-r--r-- 1 root root 0 Jul 1 12:54 blkio.throttle.write_iops_device
-rw-r--r-- 1 root root 0 Jul 1 12:54 blkio.throttle.write_bps_device
-rw-r--r-- 1 root root 0 Jul 1 12:54 blkio.throttle.read_iops_device
-rw-r--r-- 1 root root 0 Jul 1 12:54 blkio.throttle.read_bps_device
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.throttle.io_serviced_recursive
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.throttle.io_serviced
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.throttle.io_service_bytes_recursive
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.throttle.io_service_bytes
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.sectors_recursive
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.sectors
--w------- 1 root root 0 Jul 1 12:54 blkio.reset_stats
-rw-r--r-- 1 root root 0 Jul 1 12:54 blkio.leaf_weight_device
-rw-r--r-- 1 root root 0 Jul 1 12:54 blkio.leaf_weight
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.io_wait_time_recursive
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.io_wait_time
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.io_serviced_recursive
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.io_serviced
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.io_service_time_recursive
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.io_service_time
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.io_service_bytes_recursive
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.io_service_bytes
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.io_queued_recursive
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.io_queued
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.io_merged_recursive
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.io_merged
-rw-r--r-- 1 root root 0 Jul 1 12:54 blkio.bfq.weight
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.bfq.io_serviced_recursive
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.bfq.io_serviced
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.bfq.io_service_bytes_recursive
-r--r--r-- 1 root root 0 Jul 1 12:54 blkio.bfq.io_service_bytes

Thanks

What flavor of Linux are the compute nodes running? I haven’t seen the cgroup directories arranged that way on any of the systems the hook was tested on. Your head node has a more conventional configuration.

The head node does not have cgroups activated. The compute nodes run the same SLE HPC 12 SP5 flavor as the head node. What I don’t understand is why all of the other cgroup files in that combined directory are created with the correct permissions except memory.memsw.limit_in_bytes, and why the error appears on every compute node. All Molpro array-type jobs run and produce the expected output, but in qstat -a the "Elap Time" stays at 00:00 through job completion and never increments. This also seems to cause memory swapping when multiple jobs of this type are mixed with other job types on a node. On these same nodes, every other job type reports "Elap Time" as expected.
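One detail worth noting: the directory listing above contains no memory.memsw.* entries at all. On many kernels those files only appear when swap accounting is enabled (e.g. via the swapaccount=1 boot parameter), so it may be worth checking whether the kernel exposes them before chasing permissions. A small diagnostic sketch (the job path below is modeled on the listing above and is only an example; substitute a live job ID):

```python
import glob
import os

def memsw_files(job_dir):
    """Return any memory.memsw.* files present in a job's cgroup directory."""
    return sorted(glob.glob(os.path.join(job_dir, "memory.memsw.*")))

# Hypothetical job path based on the listing above.
job_dir = ("/sys/fs/cgroup/blkio,cpuacct,memory,freezer"
           "/pbs_jobs.service/jobid/163423.Alpine")

found = memsw_files(job_dir)
if not found:
    print("no memory.memsw.* files under", job_dir)
for path in found:
    print(path, "writable" if os.access(path, os.W_OK) else "not writable")
```

If the files are genuinely absent, the hook’s write to memory.memsw.limit_in_bytes cannot succeed regardless of ownership, and disabling the memsw section in the hook config (or enabling swap accounting on the nodes) would be the thing to test.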

Thanks