Not Running: Insufficient amount of resource: ncpus

I have a strange issue when I try to schedule the following job.

I’m requesting the scratch resource that is calculated by a periodic hook.

#!/bin/bash
#PBS -l select=1:ncpus=20:scratch=10g
#PBS -q workq
#PBS -l walltime=00:10:00
#PBS -j oe
#PBS -N scratch

sleep 10m
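For context, a custom host-level size resource like scratch is normally declared with qmgr and added to the scheduler's resources line before a hook can publish its value; a minimal sketch of that setup (the exact flags on my system may differ):

# declare the custom size resource as host-level and consumable (flags are an assumption)
qmgr -c "create resource scratch type=size, flag=nh"
# add "scratch" to the "resources:" line in $PBS_HOME/sched_priv/sched_config
# and HUP the scheduler so it schedules against the value the hook publishes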

The job gets queued, but it will never start. The job comment is updated with the following value:

 comment = Not Running: Insufficient amount of resource: ncpu

These are the settings of the nodes (all nodes have the same configuration):

gnode03
     Mom = gnode03
     ntype = PBS
     state = free
     pcpus = 32
     resources_available.arch = linux
     resources_available.host = gnode03
     resources_available.hpmem = 0b
     resources_available.mem = 0b
     resources_available.ncpus = 0
     resources_available.ngpus = 0
     resources_available.scratch = 3559041024kb
     resources_available.vmem = 0b
     resources_available.vnode = gnode03
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.hbmem = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 0
     resources_assigned.vmem = 0kb
     resv_enable = True
     sharing = default_shared
     license = l
     last_state_change_time = Sun Nov 20 18:11:54 2022
     last_used_time = Wed Oct 26 11:23:50 2022

gnode03[0]
     Mom = gnode03
     ntype = PBS
     state = free
     pcpus = 16
     resources_available.arch = linux
     resources_available.host = gnode03
     resources_available.hpmem = 0b
     resources_available.mem = 128284mb
     resources_available.ncpus = 16
     resources_available.ngpus = 2
     resources_available.scratch = @gnode03
     resources_available.vmem = 144634mb
     resources_available.vnode = gnode03[0]
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.hbmem = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 0
     resources_assigned.ngpus = 0
     resources_assigned.vmem = 0kb
     resv_enable = True
     sharing = default_shared
     license = l
     last_state_change_time = Sun Nov 20 18:11:54 2022
     last_used_time = Thu Nov 17 08:50:24 2022

gnode03[1]
     Mom = gnode03
     ntype = PBS
     state = free
     pcpus = 16
     resources_available.arch = linux
     resources_available.host = gnode03
     resources_available.hpmem = 0b
     resources_available.mem = 128984mb
     resources_available.ncpus = 16
     resources_available.ngpus = 2
     resources_available.scratch = @gnode03
     resources_available.vmem = 145334mb
     resources_available.vnode = gnode03[1]
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.hbmem = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 0
     resources_assigned.ngpus = 0
     resources_assigned.vmem = 0kb
     resv_enable = True
     sharing = default_shared
     license = l
     last_state_change_time = Sun Nov 20 18:11:54 2022
     last_used_time = Thu Nov 17 08:50:24 2022

The strange part is that the job does get scheduled if I request all the ncpus of a node (16+16):

#PBS -l select=1:ncpus=32:scratch=10g

Likewise, a job requesting exactly the full ncpus set of a single vnode (16) works:

#PBS -l select=1:ncpus=16:scratch=10g

I’m running PBS version 22.05.11.

Please note that you are asking for one chunk with 20 CPUs here, but the maximum a chunk can have on your node configuration is 16:
pbsnodes -av | grep resources_available.ncpus

Hence your job will remain in the queue.

Please try the script below; it should work:

#!/bin/bash
#PBS -l select=2:ncpus=10:scratch=10g
#PBS -q workq
#PBS -l walltime=00:10:00
#PBS -j oe
#PBS -N scratch
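After submitting, the placement can be confirmed, for example like this (script name and job id are placeholders):

qsub scratch_test.sh
qstat -f <jobid> | grep -E "exec_vnode|comment"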

Also, I am not sure about the configuration of the vnodes below:
gnode03[0]

gnode03[1]

Thanks for the quick reply.

If I run the following job (requesting one chunk of 20 ncpus):

#!/bin/bash
#PBS -l select=1:ncpus=20
#PBS -q workq
#PBS -l walltime=00:10:00
#PBS -j oe

sleep 10m

It starts running correctly.

(base) bash-4.4$ qsub test_vnodes.sh
3282.xxxxxxxx

The requested ncpus are assigned: 16 from vnode 0 and 4 from vnode 1.

(base) bash-4.4$ qstat -f 3282
Job Id: 3282.xxxxxxxx
    Job_Name = scratch
    Job_Owner = xxxxx@xxxxxxx
    resources_used.cpupercent = 0
    resources_used.cput = 00:00:00
    resources_used.mem = 0kb
    resources_used.ncpus = 20
    resources_used.vmem = 0kb
    resources_used.walltime = 00:00:00
    job_state = R
    queue = workq
    server = xxxxx
    Checkpoint = u
    ctime = Sun Nov 20 19:57:10 2022
    Error_Path = xxxxxxxx:/home/xxxxx/scratch.e3282
    exec_host = gnode01/0*0
    exec_vnode = (gnode01[0]:ncpus=16+gnode01[1]:ncpus=4)
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = a
    mtime = Sun Nov 20 19:57:14 2022
    Output_Path = xxxxxxx:/home/xxxxxx/scratch.o3282
    Priority = 0
    qtime = Sun Nov 20 19:57:10 2022
    Rerunable = True
    Resource_List.ncpus = 20
    Resource_List.nodect = 1
    Resource_List.place = free
    Resource_List.select = 1:ncpus=20
    Resource_List.walltime = 00:10:00
    stime = Sun Nov 20 19:57:12 2022
    session_id = 2319019
    jobdir = /home/xxxxxx
    substate = 42
    Variable_List = PBS_O_HOME=/home/xxxxx,PBS_O_LANG=en_US.UTF-8, (... truncated text)
    comment = Job run at Sun Nov 20 at 19:57 on (gnode01[0]:ncpus=16+gnode01[1]
        :ncpus=4)
    etime = Sun Nov 20 19:57:10 2022
    run_count = 1
    Submit_arguments = test_vnodes.sh
    project = _pbs_project_default
    Submit_Host = xxxxx

Every node is composed of two vnodes (0 and 1), each with 16 ncpus.

The result of this command (filtered to one node) is:

pbsnodes -av | grep resources_available.ncpus
     resources_available.ncpus = 0
     resources_available.ncpus = 16
     resources_available.ncpus = 16
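
The corresponding scratch lines for the same node (taken from the pbsnodes output above) show the indirect resource on the child vnodes pointing at the natural vnode:

pbsnodes -av | grep resources_available.scratch
     resources_available.scratch = 3559041024kb
     resources_available.scratch = @gnode03
     resources_available.scratch = @gnode03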

I forgot to mention that the vnodes are created by the cgroups hook.

I see the same behavior on another cluster whose nodes have two vnodes (each with 10 ncpus).

If I request a chunk of 16 ncpus, it is scheduled successfully and the ncpus are assigned from vnode01/vnode02. In this cluster too, the vnodes are created by the cgroup hook script.
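
For reference, on both clusters the per-NUMA vnodes come from the cgroups hook configuration; it can be exported and inspected like this (the hook name pbs_cgroups and the vnode_per_numa_node setting are assumptions about my setup):

qmgr -c "export hook pbs_cgroups application/x-config default"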