I am running a job on a single execution host with 20 processors, an NVIDIA RTX 5000 GPU (CUDA), and 32 GB of RAM. When I use PBS -l select=1:ncpus=7:mpiprocs=7:mem=14GB,ngpus=1, I get the error "unknown resource ngpus". The job runs on the CPU processors only, but I want to use the CPU and GPU together to speed up processing. When I check qstat -Bf, the ngpus resource does not appear. How can I use the GPU for a numerical model simulation?
Please create an ngpus host-level resource and configure the node as follows:
- qmgr -c "create resource ngpus type=long,flag=nh"
- Add ngpus to the resources: line of the $PBS_HOME/sched_priv/sched_config file
e.g.: resources: "ncpus, aoe, …,ngpus"
- kill -HUP <pbs_sched PID> (send SIGHUP to the scheduler so that it re-reads sched_config)
- qmgr -c "set node NODENAME resources_available.ngpus=1"
Replace NODENAME with the hostname of your compute node.
Here I have set it to 1, meaning one GPU card; if the node has more, set the value accordingly.
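The sched_config step above can also be scripted. The following is only a minimal sketch, not an official PBS tool: add_sched_resource is a hypothetical helper name, and it assumes the resources: line holds a single double-quoted list, as in the example above. Back up the file first and run it as root.

```shell
# Append a resource name to the quoted list on the "resources:" line of a
# sched_config-style file, unless it is already listed.
# Hypothetical helper; assumes one double-quoted list on that line.
add_sched_resource() {
    file=$1
    res=$2
    # Already listed on the resources: line? Then leave the file untouched.
    if grep -Eq "^resources:.*[\" ,]${res}[\" ,]" "$file"; then
        return 0
    fi
    # Insert ", <res>" just before the closing double quote of the list.
    sed "/^resources:/ s/\"[[:space:]]*\$/, ${res}\"/" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}
```

It would be called as add_sched_resource "$PBS_HOME/sched_priv/sched_config" ngpus, followed by the kill -HUP step so that the scheduler picks up the change.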
Please note that defining resources and requesting them via qsub helps PBS Pro schedule jobs onto compute nodes by matching each request against the resources available at that time. PBS Pro does not force the underlying application to use 1 ncpu and 1 ngpu; the application itself must be capable of using both CPU(s) and GPU(s). The request made via qsub only tells PBS Pro what kind of resources the job needs, so that PBS Pro can find such resources in the cluster and schedule the job onto them.
qsub -l select=1:ncpus=1 -- /path/to/myapplication/runprogram -np 1 -ngpus 1 -input inputfile.fem
Here I have not requested the GPU from PBS Pro, but on the application's command line I have asked the application to use it. PBS Pro will still run this job, and the application can use the GPU(s).
In this case PBS Pro does not know that the job is occupying the GPU on that node.
qsub -l select=1:ncpus=1:ngpus=1 -- /path/to/myapplication/runprogram -np 1 -ngpus 1 -input inputfile.fem
Here PBS Pro knows that the job running on the node is using that GPU, so any further job requesting it will wait in the queue until this job finishes.
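The same request can be placed in a job script instead of on the qsub command line. The sketch below prints such a script; make_gpu_job and the job name gpu_job are made-up names for illustration, and setting mpiprocs equal to ncpus is an assumption taken from the original question. Note that ngpus is joined to the chunk with a colon, as in the qsub line above.

```shell
# Print a PBS job script to stdout.
# Arguments: ncpus, mem, ngpus, then the application command line.
# Hypothetical helper; mpiprocs is assumed equal to ncpus.
make_gpu_job() {
    ncpus=$1
    mem=$2
    ngpus=$3
    shift 3
    cat <<EOF
#!/bin/sh
#PBS -N gpu_job
#PBS -l select=1:ncpus=${ncpus}:mpiprocs=${ncpus}:mem=${mem}:ngpus=${ngpus}
cd "\$PBS_O_WORKDIR"
$*
EOF
}
```

For example, make_gpu_job 7 14gb 1 /path/to/myapplication/runprogram -np 7 -ngpus 1 -input inputfile.fem | qsub would submit the generated script, since qsub reads the job script from standard input when no script file is given. The application flags are the ones used in this thread and depend on your application.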
When I tried to create the ngpus host-level resource as you described, error 15007 was shown:
$ qmgr -c 'create resource ngpus type=long,flag=nh'
qmgr obj=ngpus svr=default: Unauthorized Request
qmgr: Error (15007) returned from server
Please run the above commands as the root user. Also, please type them in by hand: copy and paste can turn the straight double quote (") or single quote (') into typographic quotes that the shell does not recognize.
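If retyping is inconvenient, pasted text can be cleaned up first. This is a small sketch only; straighten_quotes is a hypothetical name, and it handles just the four typographic quote characters mentioned here.

```shell
# Filter: replace typographic single and double quotes on stdin
# with their plain ASCII equivalents. Hypothetical helper.
straighten_quotes() {
    sed -e "s/‘/'/g" -e "s/’/'/g" -e 's/“/"/g' -e 's/”/"/g'
}
```

Pipe the pasted command through it (for example from a file) and check the result before running it as root.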
As root: # ./opt/pbs/bin/qmgr -c 'create resource ngpus type=long,flag=h'
qmgr obj=ngpus svr=default: Duplicate entry in list
qmgr: Error (15055) returned from server
I typed the quotes manually, but an error still appears:
/# ./opt/pbs/bin/qmgr -c 'set node workstation resources_available.ngpus=1'
qmgr obj=workstation svr=default: Unknown node
qmgr: Error (15062) returned from server
The ngpus custom resource has already been added to the PBS server, which is why the create command reports a duplicate entry.
qmgr: print resource ngpus # should print the existing ngpus definition
The compute node must be listed in /etc/hosts on both the PBS server and the node itself.
It should have a static IP address and a hostname that resolves in both directions (forward and reverse).
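A quick way to verify the /etc/hosts part is sketched below. host_in_hosts_file is a hypothetical helper name; it only checks that a name appears as a hostname or alias on a non-comment line, and does not test reverse resolution or DNS.

```shell
# Succeed if HOST appears as a hostname or alias (any field after the
# address) on a non-comment line of HOSTS_FILE. Hypothetical helper.
# Usage: host_in_hosts_file HOSTS_FILE HOST
host_in_hosts_file() {
    awk -v h="$2" '
        { sub(/#.*/, "") }    # strip comments before splitting fields
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }
    ' "$1"
}
```

Running host_in_hosts_file /etc/hosts "$(hostname)" on both the server and the compute node should succeed on each. Separately, pbsnodes -a lists the node names the PBS server currently knows, which helps when qmgr reports "Unknown node".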