Hello everyone,
I have a small environment (1 head node and 5 compute nodes) and I need to implement queue rules based on the CPU request. Depending on the CPU request, a job must be sent either to nodes 1, 2, and 3 or to nodes 4 and 5.
I created a hook that classifies each job by its CPU request. The hook works as expected and assigns the designated queue according to the CPU request rules. I also created two routing queues with the default workq as their destination, but when I specify a host in the route destination (workq@node4), the job stays in the queue and does not start.
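To make the classification concrete, here is a minimal sketch of the kind of logic the hook applies. It is not my exact hook: the 16-CPU threshold and the second queue name execq_l are placeholders for illustration, and a chunk without an explicit ncpus is assumed to count as 1 CPU. Only execq_s is a real queue name from my setup.

```python
# Sketch of the CPU-based classification, written as plain Python so it can
# be tried outside PBS. Threshold and "execq_l" are hypothetical placeholders.

SMALL_JOB_MAX_NCPUS = 16  # hypothetical cut-off between the two routing queues


def total_ncpus(select_spec: str) -> int:
    """Sum ncpus over all chunks of a select string such as '2:ncpus=8+1:ncpus=4'."""
    total = 0
    for chunk in select_spec.split("+"):
        parts = chunk.split(":")
        count = 1  # chunk count defaults to 1 when the leading integer is omitted
        if parts and parts[0].isdigit():
            count = int(parts[0])
            parts = parts[1:]
        ncpus = 1  # assumption: a chunk without ncpus counts as 1 CPU
        for part in parts:
            key, _, value = part.partition("=")
            if key == "ncpus":
                ncpus = int(value)
        total += count * ncpus
    return total


def pick_queue(select_spec: str, threshold: int = SMALL_JOB_MAX_NCPUS) -> str:
    """Return the routing queue name for a job based on its total CPU request."""
    if total_ncpus(select_spec) <= threshold:
        return "execq_s"
    return "execq_l"  # hypothetical name for the second routing queue


if __name__ == "__main__":
    # Select statement taken from the tracejob output of the test job
    print(pick_queue("1:arch=linux:ncpus=1:mem=10000mb:mpiprocs=1"))
```

In the actual queuejob hook, the returned name would be applied to the job through the PBS hook API, as far as I understand with something like pbs.event().job.queue = pbs.server().queue(name) followed by accepting the event.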
Below are the queue configurations:
create queue workq
set queue workq queue_type = Execution
set queue workq acl_host_enable = False
set queue workq from_route_only = False
set queue workq resources_max.walltime = 240:00:00
set queue workq resources_min.walltime = 00:00:00
set queue workq enabled = True
set queue workq started = True
Create and define queue execq_s
create queue execq_s
set queue execq_s queue_type = Route
set queue execq_s acl_host_enable = False
set queue execq_s route_destinations = workq@node4
set queue execq_s enabled = True
set queue execq_s started = True
qstat -Qf:
Queue: workq
queue_type = Execution
total_jobs = 4
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:4 Exiting:0 Begun:0
acl_host_enable = False
from_route_only = False
resources_max.walltime = 240:00:00
resources_min.walltime = 00:00:00
resources_assigned.mem = 0mb
resources_assigned.mpiprocs = 124
resources_assigned.ncpus = 124
resources_assigned.nodect = 4
hasnodes = True
enabled = True
started = True
Queue: execq_s
queue_type = Route
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun:0
acl_host_enable = False
route_destinations = workq@node4
enabled = True
started = True
And the tracejob output:
07/17/2024 14:24:26 S enqueuing into execq_s, state Q hop 1
07/17/2024 14:24:26 S Job Queued at request of guereta@lsmchnode01.cm.cluster, owner = guereta@lsmchnode01.cm.cluster, job name = md_npt, queue = execq_s
07/17/2024 14:24:26 S dequeuing from execq_s, state T
07/17/2024 14:24:26 A user=guereta group=starccm project=_pbs_project_default jobname=md_npt queue=execq_s ctime=1721237066 qtime=1721237066 etime=0 Resource_List.mem=10000mb Resource_List.mpiprocs=1 Resource_List.ncpus=1 Resource_List.nodect=1
Resource_List.place=free Resource_List.select=1:arch=linux:ncpus=1:mem=10000mb:mpiprocs=1 Resource_List.software=NAMD
07/17/2024 14:24:56 A Job rejected by all possible destinations
07/17/2024 14:24:56 S Job rejected by all possible destinations
With only workq as the route destination (without @node4), the routing works and the job starts, but on the first available host.
I am sure that node4 has enough resources.
I checked the PBS Admin Guide, and there is no further information about using this kind of routing destination. According to the guide, it should work.
Could you please help me?
Thanks in advance,