Hi,
I want to schedule a few jobs on a system with GPUs. Some of these jobs require exclusive access to the GPU on the system and some don't. While a GPU-intensive job is running, I want to schedule the CPU jobs alongside it to achieve higher job throughput. How do I schedule these kinds of jobs concurrently while making sure that two GPU jobs don't run at the same time?
I have followed “Chapter 5: Allocating Resources & Placing Jobs” of the User Guide to create ngpus as a resource, and I can see these fields in pbsnodes -v for any node:
resources_available.arch = windows
resources_available.host = wmep09
resources_available.mem = 50322552kb
resources_available.naccelerators = 2
resources_available.ncpus = 24
resources_available.ngpus = 2
resources_available.vnode = wmep09
resources_assigned.accelerator_memory = 0kb
resources_assigned.hbmem = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 5
resources_assigned.ngpus = 0
resources_assigned.vmem = 0kb
resv_enable = True
But when I schedule any test job with -l select=1:host=wmep09:ncpus=1:ngpus=1, I always see resources_assigned.ngpus=0; it doesn't change to 1. What's happening here? I have followed the static way of creating the ngpus resource.
For exclusive access to the node: use -l place=excl
If you would like shared access: -l place=free or -l place=pack
As long as resources (CPU, GPU, memory, custom resources) are free on a compute node, PBS will schedule jobs onto that node, unless a job has requested exclusive access to it.
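For example, a minimal sketch (gpu_job.sh and cpu_job.sh are hypothetical job scripts):

# GPU job that must not share the node with any other job:
qsub -l select=1:ncpus=1:ngpus=1 -l place=excl gpu_job.sh

# CPU-only job that is free to share a node:
qsub -l select=1:ncpus=4 -l place=free cpu_job.sh

Note that place=excl keeps all other jobs off the node, including CPU-only ones; if you only want to stop two GPU jobs from overlapping, the gpu_count approach below is the better fit.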
To make sure two GPU jobs do not run on the same host at the same time (in case 2 GPUs exist on that node), you can use a custom resource, gpu_count.
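A minimal sketch of that approach (using wmep09 from your pbsnodes output as the example node; the type and flag match the qmgr -c "p s" output later in this thread):

qmgr -c "create resource gpu_count type=long,flag=hn"
qmgr -c "set node wmep09 resources_available.gpu_count=1"
# every GPU job requests the single unit of gpu_count, so at most
# one GPU job can run on the node at a time:
qsub -l select=1:ncpus=1:ngpus=1:gpu_count=1 -- pbs_sleep 60

gpu_count also has to be added to the resources: line of sched_config, as described in Case 1 below for ngpus.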
Case 1:
qmgr -c "create resource ngpus type=long,flag=hn"
Add the "ngpus" resource to the resources: "ncpus, aoe, … , ngpus" line of sched_config.
kill -HUP <PID of the PBS scheduler>
Now submit the job with -l select=1:host=wmep09:ncpus=1:ngpus=1; while this job is running, resources_assigned.ngpus=1 will be populated.

Case 2:
You do not want to run two GPU jobs concurrently, hence reduce the ngpus resource on that node to 1:
qmgr -c "set node NODENAME resources_available.ngpus=1"
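While the job is running you can verify the assignment (wmep09 is the example host from above):

pbsnodes -v wmep09
# resources_assigned.ngpus should now read 1 instead of 0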
Hi, I tried the steps mentioned in Case 1, but I still don't see it taking the GPU resource into account at all.
Also, what is the difference between ngpus and gpu_count?
drao$ qmgr -c "set node wmep13 resources_available.gpu_count=1"
drao$ echo "sleep 3600" | qsub -l select=1:host=wmep13:ncpus=2:ngpus=1:gpu_count=1
14831.sched-win
drao$ pbsnodes -vSjL wmep13
mem ncpus nmics ngpus
vnode state njobs run susp f/t f/t f/t f/t jobs
--------------- --------------- ------ ----- ------ ------------ ------- ------- ------- -------
wmep13 free 1 1 0 24gb/24gb 22/24 0/0 1/1 14831 <-ngpus still shows 1/1, but ncpus shows 22/24
drao$ qstat -f 14831
Job Id: 14831.sched-win
Job_Name = STDIN
Job_Owner = drao@sched-win.abc.net
resources_used.cput = 00:00:00
resources_used.mem = 0kb
resources_used.walltime = 00:00:07
job_state = E
queue = workq
server = sched-win
Checkpoint = u
ctime = Fri Jul 20 13:46:06 2018
Error_Path = sched-win.abc.net:C:/Users/drao/Documents/PBS Pro/STDIN.e14831
exec_host = wmep13/0*2
exec_vnode = (wmep13:ncpus=2) <- it doesn't show gpus here
...
This is the output of qmgr -c "p s":
# Create resources and set their properties.
#
#
# Create and define resource ngpus
#
create resource ngpus
set resource ngpus type = long
set resource ngpus flag = hn
#
# Create and define resource gpu_count
#
create resource gpu_count
set resource gpu_count type = long
set resource gpu_count flag = hn
#
# Create queues and set their attributes.
#
#
# Create and define queue workq
#
create queue workq
set queue workq queue_type = Execution
set queue workq enabled = True
set queue workq started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = sched-win
set server acl_roots = drao
set server default_queue = workq
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.ncpus = 1
set server default_chunk.ncpus = 1
set server scheduler_iteration = 600
set server flatuid = True
set server resv_enable = True
set server node_fail_requeue = 310
set server max_array_size = 10000
set server single_signon_password_enable = True
set server pbs_license_min = 0
set server pbs_license_max = 2147483647
set server pbs_license_linger_time = 31536000
set server eligible_time_enable = False
set server job_history_enable = True
set server max_concurrent_provision = 5
Please check and share the output of pbsnodes wmep13 while you run this:
echo "sleep 3600" | qsub -l select=1:host=wmep13:ncpus=2:ngpus=1:gpu_count=1
Also, please share the output of qstat -fx and tracejob for that job.
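That is, something like the following (14831 was the job id from your earlier run; substitute the new one):

pbsnodes wmep13
qstat -fx 14831
tracejob 14831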
resources_available.ngpus=2 (the total number of GPUs available on that node)
resources_available.gpu_count=1 (your requirement was to run only one GPU job per node, with no concurrent GPU jobs allowed; this resource is the concurrency check)
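As a worked example on a two-GPU node (reusing wmep13 and pbs_sleep from this thread):

qmgr -c "set node wmep13 resources_available.ngpus=2"
qmgr -c "set node wmep13 resources_available.gpu_count=1"
qsub -l select=1:ncpus=2:ngpus=1:gpu_count=1 -- pbs_sleep 1000
qsub -l select=1:ncpus=2:ngpus=1:gpu_count=1 -- pbs_sleep 1000

The second job stays queued until the first one finishes: a free GPU remains (ngpus 1/2), but the single unit of gpu_count is already assigned, which is exactly the concurrency check.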
Please share your sched_config file. Are you sure you restarted the PBS services, or executed kill -HUP <PID of the PBS scheduler>, after adding the custom resources to the "resources:" line of sched_config?
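On Linux, that step would look something like this (assuming the default sched_config location under PBS_HOME; pbs_sched is the scheduler daemon):

grep "^resources:" $PBS_HOME/sched_priv/sched_config
kill -HUP $(pgrep pbs_sched)

On Windows there is no HUP signal, so restarting the PBS services is the way to make the scheduler re-read sched_config.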
My test run shows the following:
[root@pbspro ~]# qstat -fx 22877 | grep exec
exec_host = gn001/2
exec_vnode = (gn001:ncpus=1:ngpus=1)
I restarted PBS services and also rebooted the machine running PBS services after adding the custom resources to the resources line.
This is the output of echo "sleep 3600" | qsub -l select=1:host=wmep13:ncpus=2:ngpus=1:gpu_count=1, qstat -fx, and tracejob:
07/23/2018 12:43:04 S Job Queued at request of drao@sched-win.abc.net, owner = drao@sched-win.abc.net, job name = STDIN, queue = workq
07/23/2018 12:43:04 S Job Run at request of Scheduler@sched-win.abc.net on exec_vnode (wmep13:ncpus=2)
07/23/2018 12:43:04 L Considering job to run
07/23/2018 12:43:04 L Job run
07/23/2018 12:43:05 S Obit received momhop:1 serverhop:1 state:4 substate:42
sched_config looks like this (I have removed all the comments to keep it short):
round_robin: False all
by_queue: True prime
by_queue: True non_prime
strict_ordering: false ALL
help_starving_jobs: true ALL
max_starve: 24:00:00
backfill_prime: false ALL
prime_exempt_anytime_queues: false
primetime_prefix: p_
nonprimetime_prefix: np_
node_sort_key: "sort_priority HIGH" ALL
provision_policy: "aggressive_provision"
sort_queues: true ALL
resources: "ncpus, mem, arch, host, vnode, aoe, eoe, ngpus, gpu_count"
load_balancing: false ALL
smp_cluster_dist: pack
fair_share: false ALL
fairshare_usage_res: cput
fairshare_entity: euser
fairshare_decay_time: 24:00:00
fairshare_decay_factor: 0.5
preemptive_sched: true ALL
preempt_queue_prio: 150
preempt_prio: "express_queue, normal_jobs"
preempt_order: “SCR”
preempt_sort: min_time_since_start
dedicated_prefix: ded
log_filter: 3328
My apologies, it is a Windows system (your job has exited with a non-zero status); please use the qsub request below.
qsub -l select=1:ncpus=2:ngpus=1:gpu_count=1 -- pbs_sleep 1000
Note: sleep is not available on Windows, hence we have to use the pbs_sleep command available in the bin directory of the PBS Pro deployment.
Thank you drao for your patience and persistence.
I am not seeing this behavior on the Linux system with the exact same configuration.
I would have to check with @hirenvadalia and @agrawalravi90 on this to get their comments.
Also, please try the qsub command without using host=wmep13 in the chunk statement:
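For example, the same chunk minus the host (pbs_sleep again, since this is a Windows system):

qsub -l select=1:ncpus=2:ngpus=1:gpu_count=1 -- pbs_sleep 1000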
@drao and @adarsh
I just tested on Windows and Linux using the pbspro master branch and it's working fine for me: I can see ngpus listed in exec_vnode in qstat -f, and resources_assigned.ngpus is also getting updated in pbsnodes -av.