CPUs on GPU nodes

Hi,
I have a GPU node that also has CPU cores available.
Can the CPUs be used by some jobs and the GPUs by other jobs at the same time?
Or should I create a separate queue for GPU nodes?

Yes, no problem.

Not required: one queue is enough for both CPU and GPU jobs.
For example, you have one compute node with 36 cores, 1 GPU card and 256 GB of memory,
or
nodeA with 20 cores and nodeB with 10 cores and 1 GPU card:

qsub -q workq -l select=1:ncpus=20:mem=100gb -- /bin/sleep 2000 # CPU-only job
qsub -q workq -l select=1:ncpus=10:ngpus=1:mem=100gb -- /some/gpu_consuming_application  # GPU job, will go to the GPU node

Hello,
Thank you for your answer.
I meant cases like these:

qsub -q workq -l select=1:ncpus=20:mem=100gb -- /bin/sleep 2000 # CPU-only job
qsub -q workq -l select=1:ncpus=5:ngpus=100:mem=100gb -- /some/gpu_consuming_application  # GPU job, will go to the GPU node
qsub -q workq -l select=1:ncpus=5:mem=1gb -- /bin/sleep 2000 # does this go to the GPU node?

or

qsub -q workq -l select=1:ncpus=20:mem=100gb -- /bin/sleep 2000 # CPU-only job
qsub -q workq -l select=1:ncpus=0:ngpus=1:mem=100gb -- /some/gpu_consuming_application  # GPU job
qsub -q workq -l select=1:ncpus=10:mem=1gb -- /bin/sleep 2000 # does this go to the GPU node?

Will all jobs start or will some have to wait until the ones on the GPU finish?

If there is one node in the cluster with 30 cores (ncpus), 300 GB of memory and 100 GPU cards (ngpus):

  • then both the first and the second set of submitted jobs will go to this single node

pbsnodes | grep -e ncpus -e ngpus -e mem
resources_available.mem = 300gb
resources_available.ncpus = 30
resources_available.ngpus = 100
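To pull just those three values out of the pbsnodes output in a script, a small awk filter is enough. This is a hedged sketch, assuming the "resources_available.<name> = <value>" layout shown above; the function name avail_resources is hypothetical, and the here-document simply replays the output quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the resources_available ncpus/mem/ngpus values
# found in pbsnodes-style text on stdin, one "name=value" per line.
# Assumes the "key = value" layout shown above; other attributes are ignored.
avail_resources() {
  awk -F' = ' '
    # Match lines such as "resources_available.ncpus = 30" (leading spaces allowed).
    $1 ~ /resources_available\.(ncpus|mem|ngpus)$/ {
      key = $1
      sub(/^[[:space:]]*resources_available\./, "", key)
      printf "%s=%s\n", key, $2
    }'
}

# Example input: the output quoted above.
avail_resources <<'EOF'
resources_available.mem = 300gb
resources_available.ncpus = 30
resources_available.ngpus = 100
EOF
```

Lines such as resources_assigned.ncpus are skipped, so only the node's advertised capacity is reported.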

If you have 2 nodes in the cluster, with one of them having 100 ngpus:

  • then a job requesting ngpus=1 or ngpus=100 will go only to the GPU node
  • a job that does not request the ngpus resource can run on either node.
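The placement rule in these two bullets can be sketched as follows. The node names cpunode and gpunode and the function eligible_nodes are hypothetical, and the sketch only compares ngpus; the real PBS scheduler also checks ncpus, mem and any other requested resources.

```bash
#!/usr/bin/env bash
# Hypothetical two-node cluster, as "name:advertised_ngpus" pairs.
nodes=("cpunode:0" "gpunode:100")

# Print the nodes that could host a job requesting $1 GPUs (0 = no ngpus request).
# Simplified: only ngpus is compared, not ncpus or mem.
eligible_nodes() {
  local want=$1 entry name avail
  for entry in "${nodes[@]}"; do
    name=${entry%%:*}
    avail=${entry##*:}
    if (( want == 0 || avail >= want )); then
      echo "$name"
    fi
  done
}

eligible_nodes 1   # GPU job: only gpunode is listed
eligible_nodes 0   # CPU-only job: both nodes are listed
```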

If the compute nodes have enough resources to satisfy all these jobs, then they can all run in parallel.

If you have a compute node with 30 cores, 300 GB of memory and 100 GPUs, then either set of qsub requests mentioned above can run in parallel on that same node.
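The arithmetic behind this can be sketched by summing each set's requests and comparing the totals with the node's resources_available values (30 ncpus, 300 GB, 100 ngpus). This is a simplified model, assuming all jobs are submitted at once to an otherwise idle node; check_set and its "ncpus:mem_gb:ngpus" argument format are hypothetical.

```bash
#!/usr/bin/env bash
# Node size quoted above (memory in GB).
AVAIL_NCPUS=30 AVAIL_MEM_GB=300 AVAIL_NGPUS=100

# Hypothetical helper: each argument is one job as "ncpus:mem_gb:ngpus".
# Prints "fits" if the summed requests stay within the node, else "waits".
check_set() {
  local job c m g ncpus=0 mem=0 ngpus=0
  for job in "$@"; do
    IFS=: read -r c m g <<< "$job"
    ncpus=$(( ncpus + c ))
    mem=$(( mem + m ))
    ngpus=$(( ngpus + g ))
  done
  if (( ncpus <= AVAIL_NCPUS && mem <= AVAIL_MEM_GB && ngpus <= AVAIL_NGPUS )); then
    echo "fits (ncpus=$ncpus mem=${mem}gb ngpus=$ngpus)"
  else
    echo "waits (ncpus=$ncpus mem=${mem}gb ngpus=$ngpus)"
  fi
}

check_set 20:100:0 5:100:100 5:1:0   # first set of qsub requests
check_set 20:100:0 0:100:1 10:1:0    # second set of qsub requests
```

Both sets add up to exactly 30 ncpus and 201 GB, so they fit; one more job asking for even a single additional core on that node would have to wait until one of the others finishes.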
