The question is related to: Exclude the node from qsub
Is there a way, other than creating custom resources, to submit a job to any one of a list of possible nodes/hosts?
If I use qsub -V -l nodes=host5, I can select one specific node and submit jobs to it that run in parallel. But if I use qsub -V -l nodes=host5+host6+host7, only one job runs while the others stay queued.
qsub -l host= # to run on one specific node
qsub -l nodes=nodename+nodename+nodename # to run on these specific nodes
If you want the job to run on whichever node from a list of specific nodes is free, then custom resources (the qlists method explained in the other link) are required.
Could you please explain this in detail by providing
- pbsnodes -av output
- qstat -wans1 output
- qstat -f <one of the queued jobs>
qstat -wans1:
10602.vcl-pbs2 user gpu_low task1.sh 18954 3 3 -- -- R 00:01:18 vcl-gpu8/0+vcl-gpu7/2+vcl-gpu6/1
Job run at Mon May 14 at 08:19 on (vcl-gpu8:ncpus=1:ngpus=1)+(vcl-gpu7:ncpus=1:ngpus=1)+(vcl-gpu6:ncpus=1:ngpus=1)
10603.vcl-pbs2 user gpu_low task2.sh -- 3 3 -- -- Q -- --
Not Running: Insufficient amount of resource: ngpus
10604.vcl-pbs2 user gpu_low task3.sh -- 3 3 -- -- Q -- --
Not Running: Insufficient amount of resource: ngpus
qstat -f:
comment = Not Running: Insufficient amount of resource: ngpus
Thank you.
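From the shared information, it seems there aren't enough ngpus available to run the queued jobs. The comment shows the first reason the scheduler encountered for not being able to run the job right now. Note that the running job holds one chunk (ncpus=1:ngpus=1) on each of the three nodes: a request joined with + asks for all of the listed nodes together, not for any one of them.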
The message seems clear, but why does it work when I submit the jobs with `nodes=host5`, i.e., specifying only one node?
Could you please share the current status of the system by sharing the below
pbsnodes -av
qstat -answ1
qstat -fx
Also, can you try to run the job below if node5 is occupied:
qsub -l select=1:ncpus=1:host=node6+1:ncpus=1:host=node7 -- /bin/sleep 100
You can increase the scheduler log verbosity to the maximum and trace the scheduler's activity by triggering a new scheduling cycle.
What should I do if I want to submit to any one of the three nodes host5, host6, or host7? If I specify this:
qsub -l nodes=host5,nodes=host6,nodes=host7
will it submit to host5, and fall back to host6 (or host7) when there are no more resources available on host5 (or host6)?
[Answer]: No. You need to create a qsub wrapper script that takes the user's input, applies the logic for selecting one of the nodes, and submits the job to the node it picks.
Please follow this link: PP-506,PP-507: Add support for requesting resources with logical 'or' and conditional operators - #42 by arungrover
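As a rough illustration of the wrapper idea above, here is a minimal sketch in POSIX shell. It is not an official PBS tool. The function name pick_host is hypothetical, and the sketch assumes that pbsnodes -av prints each node name on an unindented line followed by indented "resources_available.ngpus = N" and "resources_assigned.ngpus = N" attributes, and that a free GPU is the only criterion that matters. Adjust the attribute names to whatever your own pbsnodes output shows.

```shell
#!/bin/sh
# pick_host (hypothetical helper): read `pbsnodes -av`-style text on stdin
# and print the first candidate host, in argument order, that still has at
# least one unassigned GPU. Returns non-zero if no candidate qualifies.
pick_host() {
    awk -v want="$*" '
        BEGIN { n = split(want, order, " ") }
        # An unindented line is assumed to start a new node record.
        /^[A-Za-z0-9]/ { host = $1 }
        $1 == "resources_available.ngpus" { avail[host] = $3 }
        $1 == "resources_assigned.ngpus"  { used[host]  = $3 }
        END {
            for (i = 1; i <= n; i++) {
                h = order[i]
                if ((h in avail) && avail[h] - used[h] >= 1) { print h; exit }
            }
            exit 1
        }
    '
}

# Usage sketch (needs a live PBS installation, so it is left commented out):
#   host=$(pbsnodes -av | pick_host host5 host6 host7) &&
#       qsub -V -l select=1:ncpus=1:ngpus=1:host="$host" task1.sh
```

The chunk request in the usage comment (ncpus=1:ngpus=1) mirrors what the running job in the qstat output was given; change it to match your job. The check and the submission are not atomic, so another job can take the GPU in between, in which case the submitted job simply waits in the queue for that host.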