Hi, I need help running jobs on a hyperthreaded node.
My setup is a single node with hyperthreading enabled, which provides 36 cores / 72 threads.
By default pbsnodes reports 36 available cores, but I want to run the maximum number of sequential jobs using hyperthreading: 72.
To allow execution on all of the threads, I configured the node as follows:
(base) [root@node01 ~]# pbsnodes -a
node01
Mom = node01
Port = 15002
pbs_version = 20.0.0
ntype = PBS
state = free
pcpus = 72
resources_available.arch = linux
resources_available.host = node01
resources_available.hpmem = 0b
resources_available.mem = 385555mb
resources_available.ncpus = 72
resources_available.ngpus = 2
resources_available.vmem = 389587mb
resources_available.vnode = node01
resources_assigned.accelerator_memory = 0kb
resources_assigned.hbmem = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
last_state_change_time = Tue Apr 20 15:21:50 2021
last_used_time = Wed Apr 21 10:05:23 2021
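For reference, the ncpus override was applied with qmgr; a minimal sketch (node name node01 as in the output above) would be:

```shell
# Override the scheduler's view of the node so it can place 72 single-CPU jobs.
# pcpus is what the MoM detected; resources_available.ncpus is what the
# scheduler actually uses for placement.
qmgr -c "set node node01 resources_available.ncpus = 72"

# Verify the change took effect
pbsnodes node01 | grep ncpus
```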
However, when I submit more than 36 single-CPU jobs (echo "sleep 60" | qsub -l select=1:ncpus=1), only the first 36 jobs reach the running state; the rest are moved to the hold state.
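A minimal sketch of the submission loop used to reproduce this (the count of 72 is an assumption matching the thread total):

```shell
# Submit 72 single-CPU sleep jobs; with 72 schedulable threads,
# all of them should be able to run concurrently.
for i in $(seq 1 72); do
    echo "sleep 60" | qsub -l select=1:ncpus=1
done

# Count jobs per state (column 5 of default qstat output is the state)
qstat | awk 'NR > 2 {print $5}' | sort | uniq -c
```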
Checking the MoM logs, I see something like:
pbs_python;Hook;pbs_python;Processing error in pbs_cgroups handling execjob_begin event for job 947.node01: CgroupProcessingError ('Failed to assign resources',)
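In case it is relevant, the error comes from the pbs_cgroups hook. Its configuration can be dumped with qmgr for inspection (a sketch; the hook may be restricting jobs to physical cores via its cpuset settings, e.g. a "use_hyperthreads" option left at false):

```shell
# Export the pbs_cgroups hook configuration to inspect its cpuset handling
qmgr -c "export hook pbs_cgroups application/x-config default" > pbs_cgroups.json

# Look for settings that may exclude hyperthreads
grep -i hyperthread pbs_cgroups.json
```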
I don't know where the problem is.
Thanks in advance.