Hi, I need help running jobs on a hyperthreaded node.
My setup is a single node with hyperthreading enabled, which provides 36 cores / 72 threads.
By default pbsnodes reports 36 available cores, but I want to run the maximum number of sequential jobs using hyperthreading: 72.
To allow execution on all of the threads, I configured the node as follows:
(base) [root@node01 ~]# pbsnodes -a
node01
Mom = node01
Port = 15002
pbs_version = 20.0.0
ntype = PBS
state = free
pcpus = 72
resources_available.arch = linux
resources_available.host = node01
resources_available.hpmem = 0b
resources_available.mem = 385555mb
resources_available.ncpus = 72
resources_available.ngpus = 2
resources_available.vmem = 389587mb
resources_available.vnode = node01
resources_assigned.accelerator_memory = 0kb
resources_assigned.hbmem = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
last_state_change_time = Tue Apr 20 15:21:50 2021
last_used_time = Wed Apr 21 10:05:23 2021
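For reference, the ncpus override was applied with qmgr; a minimal sketch (node name node01 as in the output above) would be:

```shell
# Override the scheduler's view of the node so it can place 72 single-CPU jobs.
# pcpus is what the MoM detected; resources_available.ncpus is what the
# scheduler actually uses for placement.
qmgr -c "set node node01 resources_available.ncpus = 72"

# Verify the change took effect
pbsnodes node01 | grep ncpus
```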
However, when I submit more than 36 single-CPU jobs (echo "sleep 60" | qsub -l select=1:ncpus=1), only the first 36 jobs reach the running state; the rest are moved to the hold state.
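A minimal sketch of the submission loop used to reproduce this (the count of 72 is an assumption matching the thread total):

```shell
# Submit 72 single-CPU sleep jobs; with 72 schedulable threads,
# all of them should be able to run concurrently.
for i in $(seq 1 72); do
    echo "sleep 60" | qsub -l select=1:ncpus=1
done

# Count jobs per state (column 5 of default qstat output is the state)
qstat | awk 'NR > 2 {print $5}' | sort | uniq -c
```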
Checking the MoM logs, I see something like:
pbs_python;Hook;pbs_python;Processing error in pbs_cgroups handling execjob_begin event for job 947.node01: CgroupProcessingError ('Failed to assign resources',)
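In case it is relevant, the error comes from the pbs_cgroups hook. Its configuration can be dumped with qmgr for inspection (a sketch; the hook may be restricting jobs to physical cores via its cpuset settings, e.g. a "use_hyperthreads" option left at false):

```shell
# Export the pbs_cgroups hook configuration to inspect its cpuset handling
qmgr -c "export hook pbs_cgroups application/x-config default" > pbs_cgroups.json

# Look for settings that may exclude hyperthreads
grep -i hyperthread pbs_cgroups.json
```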
I don't know where the problem is.
Thanks in advance.