Hello Guys,
I’m new to cgroups and GPU nodes, and I have a problem when I try to run TensorFlow inside a Singularity container with the PBS cgroups hook enabled.
If I try to run a TensorFlow image with the command:
singularity -v shell --nv tensorflow-21.07-tf2-py3.sif
I get this message:
FATAL: container creation failed: mount /proc/self/fd/3->/var/singularity/mnt/session/rootfs error: while mounting image /proc/self/fd/3: failed to find loop device: could not attach image file to loop device: no loop devices available
If I disable the cgroups hook and restart pbs_mom on the GPU node, I’m able to run the command without any problem. The GPU node has one NVIDIA A100 with MIG enabled (split into 7 instances).
Can anyone shed some light on how to configure this environment correctly? Find below my cgroups hook configuration:
Thanks
##################################
{
    "cgroup_lock_file" : "/var/spool/pbs/mom_priv/cgroups.lock",
    "cgroup_prefix" : "pbspro",
    "exclude_hosts" : [],
    "exclude_vntypes" : ["no_cgroups"],
    "periodic_resc_update" : true,
    "vnode_per_numa_node" : false,
    "online_offlined_nodes" : true,
    "nvidia-smi" : "/usr/bin/nvidia-smi",
    "use_hyperthreads" : false,
    "ncpus_are_cores" : false,
    "run_only_on_hosts" : ["gpu1", "gpu2", "gpu3"],
    "cgroup" : {
        "cpuacct" : {
            "enabled" : true,
            "exclude_hosts" : [],
            "exclude_vntypes" : []
        },
        "cpuset" : {
            "enabled" : true,
            "exclude_cpus" : [0, 8],
            "exclude_hosts" : [],
            "exclude_vntypes" : [],
            "mem_fences" : true,
            "mem_hardwall" : false,
            "memory_spread_page" : false
        },
        "devices" : {
            "enabled" : true,
            "exclude_hosts" : [],
            "exclude_vntypes" : [],
            "allow" : [
                "b *:* m",
                "c *:* m",
                "c 195:* m",
                "c 136:* rwm",
                ["infiniband/rdma_cm","rwm"],
                ["fuse","rwm"],
                ["net/tun","rwm"],
                ["tty","rwm"],
                ["ptmx","rwm"],
                ["console","rwm"],
                ["null","rwm"],
                ["zero","rwm"],
                ["full","rwm"],
                ["random","rwm"],
                ["urandom","rwm"],
                ["cpu/0/cpuid","rwm","*"],
                ["nvidia-modeset", "rwm"],
                ["nvidia-uvm", "rwm"],
                ["nvidia-uvm-tools", "rwm"],
                ["nvidiactl", "rwm"]
            ]
        },
        "hugetlb" : {
            "enabled" : false,
            "exclude_hosts" : [],
            "exclude_vntypes" : [],
            "default" : "0MB",
            "reserve_percent" : "0",
            "reserve_amount" : "0MB"
        },
        "memory" : {
            "enabled" : true,
            "exclude_hosts" : [],
            "exclude_vntypes" : [],
            "soft_limit" : false,
            "default" : "256MB",
            "reserve_percent" : "0",
            "reserve_amount" : "1GB"
        },
        "memsw" : {
            "enabled" : true,
            "exclude_hosts" : [],
            "exclude_vntypes" : [],
            "default" : "256MB",
            "reserve_percent" : "0",
            "reserve_amount" : "1GB"
        }
    }
}
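
A possible way to reason about the "no loop devices available" error, given that it disappears when the hook is disabled: the devices cgroup may be blocking the loop devices Singularity needs to mount the image. Below is a minimal sketch that models how a cgroup v1 device whitelist rule is matched and applies it to the string-form entries of the "allow" list above. It rests on assumptions that are not in the post: loop devices use block major 7 and /dev/loop-control is char 10:237 (the usual Linux numbers), the list-form entries such as ["fuse","rwm"] are ignored because the hook resolves them itself, and the helper names parse_rule and is_allowed are hypothetical.

```python
# Simplified model of cgroup v1 device whitelist matching.
# Assumptions (not from the post): loop devices are block major 7,
# /dev/loop-control is char 10:237. Only string-form rules are handled.

def parse_rule(rule):
    """Split a rule such as 'b 7:* rwm' into (type, major, minor, access set)."""
    dev_type, numbers, access = rule.split()
    major, minor = numbers.split(":")
    return dev_type, major, minor, set(access)


def is_allowed(rules, dev_type, major, minor, needed="rw"):
    """Return True if a single rule grants every access flag in `needed`."""
    for rule in rules:
        r_type, r_major, r_minor, r_access = parse_rule(rule)
        if r_type not in (dev_type, "a"):        # 'a' matches any device type
            continue
        if r_major not in ("*", str(major)):     # '*' is a wildcard
            continue
        if r_minor not in ("*", str(minor)):
            continue
        if set(needed) <= r_access:
            return True
    return False


# String-form entries from the "allow" list in the posted configuration.
string_rules = ["b *:* m", "c *:* m", "c 195:* m", "c 136:* rwm"]

# /dev/loop0 (block 7:0) and /dev/loop-control (char 10:237), read/write access.
print(is_allowed(string_rules, "b", 7, 0))
print(is_allowed(string_rules, "c", 10, 237))

# Hypothetical extra entries, shown only to illustrate what would match.
candidate_rules = string_rules + ["b 7:* rwm", "c 10:237 rwm"]
print(is_allowed(candidate_rules, "b", 7, 0))
```

Under this model the posted rules grant only mknod ("m") on block devices, so read/write access to a loop device does not match any rule, while the hypothetical entries would match. Whether adding such entries is the right fix for the PBS hook is an assumption that would need to be checked against the hook documentation and on the node itself.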