I’m currently facing an issue where jobs submitted to specific queues (e.g., workq) are being scheduled on vnodes that shouldn’t be associated with those queues, despite my attempt to restrict them using the queue_list resource.
I have configured PBS with the following settings:
sched_config:
[foo]# cat ./sched_priv/sched_config | grep queue_list
resources: "ncpus, mem, arch, host, vnode, aoe, eoe, ngpus, queue_list, scratch, nodetype"
resource:
[foo]# qmgr -c 'p r queue_list'
create resource queue_list
set resource queue_list type = string_array
set resource queue_list flag = h
queue parameters:
[foo]# qmgr -c 'p q @def' | grep queue_list
set queue workq resources_default.queue_list = workq
set queue debug resources_default.queue_list = debug
set queue longq resources_default.queue_list = longq
set queue prioq resources_default.queue_list = prioq
set queue workq default_chunk.queue_list = workq
set queue debug default_chunk.queue_list = debug
set queue longq default_chunk.queue_list = longq
set queue prioq default_chunk.queue_list = prioq
vnode parameters:
[foo]# qmgr -c 'p n @def' | grep queue_list
set node cnode01 resources_available.queue_list = workq
set node cnode01 resources_available.queue_list += debug
set node cnode01 resources_available.queue_list += longq
set node cnode01[0] resources_available.queue_list = workq
set node cnode01[0] resources_available.queue_list += debug
set node cnode01[0] resources_available.queue_list += longq
set node cnode01[1] resources_available.queue_list = workq
set node cnode01[1] resources_available.queue_list += debug
set node cnode01[1] resources_available.queue_list += longq
set node cnode02 resources_available.queue_list = workq
set node cnode02 resources_available.queue_list += debug
set node cnode02 resources_available.queue_list += longq
set node cnode02[0] resources_available.queue_list = workq
set node cnode02[0] resources_available.queue_list += debug
set node cnode02[0] resources_available.queue_list += longq
set node cnode02[1] resources_available.queue_list = workq
set node cnode02[1] resources_available.queue_list += debug
set node cnode02[1] resources_available.queue_list += longq
set node anode01 resources_available.queue_list = prioq
set node anode01[0] resources_available.queue_list = prioq
set node anode01[1] resources_available.queue_list = prioq
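To make the mapping I intend explicit, here is a minimal Python sketch that only parses a few of the `set node` lines shown above and lists which vnodes advertise a given queue. The helper names `parse_queue_lists` and `vnodes_for_queue` are hypothetical, and this only illustrates what I expect from the configuration; it is not a description of how the PBS scheduler actually matches jobs to vnodes.

```python
import re
from collections import defaultdict

# A few lines copied from the `qmgr -c 'p n @def' | grep queue_list` output above.
QMGR_OUTPUT = """\
set node cnode01 resources_available.queue_list = workq
set node cnode01 resources_available.queue_list += debug
set node cnode01 resources_available.queue_list += longq
set node anode01 resources_available.queue_list = prioq
set node anode01[0] resources_available.queue_list = prioq
"""

# Matches both "=" (assign) and "+=" (append) forms of the setting.
LINE_RE = re.compile(
    r"^set node (\S+) resources_available\.queue_list (\+?=) (\S+)$"
)


def parse_queue_lists(text):
    """Return {vnode: [queues]} built from qmgr 'set node' lines (hypothetical helper)."""
    mapping = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore anything that is not a queue_list setting
        vnode, op, queue = match.groups()
        if op == "=":
            mapping[vnode] = [queue]        # "=" resets the list
        elif queue not in mapping[vnode]:
            mapping[vnode].append(queue)    # "+=" appends to the list
    return dict(mapping)


def vnodes_for_queue(mapping, queue):
    """Return the sorted vnodes whose queue_list contains `queue` (hypothetical helper)."""
    return sorted(v for v, queues in mapping.items() if queue in queues)


mapping = parse_queue_lists(QMGR_OUTPUT)
print(vnodes_for_queue(mapping, "workq"))  # vnodes I expect workq jobs to be limited to
print(vnodes_for_queue(mapping, "prioq"))  # vnodes I expect prioq jobs to be limited to
```

With the full output pasted in, I would expect workq to map only to the cnode01 and cnode02 vnodes, and anode01 to appear only under prioq.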
I followed section 4.9.2.2 of the official documentation (“Example of Associating Multiple Vnodes with Multiple Queues”) to implement this setup.
Despite the configuration above, jobs submitted to workq are still being scheduled on anode01 (which is only associated with prioq). It seems that the queue_list restrictions are being ignored by the scheduler.
I would really appreciate any insight into what might be going wrong.
Is there something I’m missing or misconfiguring?
Thank you in advance for your support and guidance!