vinay
August 8, 2022, 5:06am
1
We have 4 compute nodes, each with one RTX2080 GPU installed.
I am trying to follow the procedure “Simple GPU Scheduling with Exclusive Node Access”.
When I run the command -
qmgr -c “create resource rtx2080 type=Boolean,flag=h”
It gives the error below -
qmgr obj=rtx2080 svr=default: Illegal attribute or resource value
qmgr: Error (15014) returned from server
Can anyone help?
adarsh
August 8, 2022, 6:27am
2
Please try this command (it is recommended to type it by hand, as copy-paste sometimes causes issues):
qmgr -c "create resource rtx2080 type=boolean,flag=h"
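One possible reason typing helps: text copied from a formatted document can carry typographic quotes (“ ”) instead of plain ASCII quotes, and the shell does not treat them as quoting characters. The thread does not confirm that this was the cause here, so the following is only a minimal sketch under that assumption. `normalize_quotes` is a hypothetical helper, not part of PBS.

```shell
# Hypothetical helper: replace typographic quotes with plain ASCII quotes.
# Each sed expression matches one literal character, so it does not depend
# on how the locale handles multibyte bracket expressions.
normalize_quotes() {
  printf '%s\n' "$1" | sed -e 's/“/"/g' -e 's/”/"/g' -e "s/‘/'/g" -e "s/’/'/g"
}

# Assumed example of a command as it might look after copy-paste.
pasted='qmgr -c “create resource rtx2080 type=boolean,flag=h”'

normalize_quotes "$pasted"
# prints: qmgr -c "create resource rtx2080 type=boolean,flag=h"
```

The helper only prints the cleaned text. Review it before running it as a command.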
vinay
August 8, 2022, 6:48am
3
Hi Adarsh,
Thanks, it worked. I then added the resource to a node named b14.
Below is the output of pbsnodes -a for b14, which shows resources_available.rtx2080 = True -
b14
Mom = b14.shukra.aero.iitb.ac.in
ntype = PBS
state = free
pcpus = 32
resources_available.arch = linux
resources_available.host = b14
resources_available.mem = 32948660kb
resources_available.ncpus = 32
resources_available.rtx2080 = True
resources_available.vnode = b14
resources_assigned.accelerator_memory = 0kb
resources_assigned.hbmem = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
last_state_change_time = Wed Jul 6 09:00:56 2022
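With four nodes, checking each block of output by eye gets tedious. A minimal sketch for listing the nodes that advertise a boolean resource, assuming the usual `pbsnodes -a` layout where the node name starts at column 0 and the attribute lines are indented. `nodes_with_resource` and the node b13 in the sample are hypothetical.

```shell
# Hypothetical helper: print the names of nodes whose
# resources_available.<name> is True. Reads pbsnodes-style text on stdin.
nodes_with_resource() {
  awk -v res="resources_available.$1" '
    /^[^ \t]/ { node = $1 }    # unindented, non-empty line: a node name
    $1 == res && $2 == "=" && $3 == "True" { print node }
  '
}

# Sample input in the assumed layout (b13 is made up for illustration).
sample='b13
     state = free
     resources_available.ncpus = 32

b14
     state = free
     resources_available.rtx2080 = True
'

printf '%s' "$sample" | nodes_with_resource rtx2080
# prints: b14
```

On a live cluster the same function would be fed from the real command, for example `pbsnodes -a | nodes_with_resource rtx2080`.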
Now, how do I submit a CUDA job to that node?
I tried it as written in the document, but it gives an error -
vinay
August 8, 2022, 7:34am
4
Hi @adarsh I followed your steps from here - PBS Single exection host run job using cpu include gpu - #2 by adarsh
And it is working fine. I can submit a CUDA job with -
#PBS -l select=1:ncpus=1:ngpus=1
But now a strange line appears in all output files -
/var/spool/pbs/mom_priv/jobs/187.b0.SC: line 14: =1: command not found
This started to appear only after the ngpus resource was added. I have restarted the PBS service.
Do I need to check something?
adarsh
August 8, 2022, 10:42am
5
Thank you.
Please share your job submission script, or line 14 of it.
vinay
August 9, 2022, 3:19am
6
Hi @adarsh
Below is the script -
#!/bin/bash
#PBS -N testing
#PBS -q big
#PBS -l select=1:ncpus=1:ngpus=1
#PBS -j oe
#PBS -V
#PBS -o log.out
cd $PBS_O_WORKDIR
cat $PBS_NODEFILE > ./pbsnodes
#$PROCS1=cat ./pbsnodes|wc -l
nvcc -o g.out hello-world.cu
$HOME/gpu/g.out
/bin/hostname
After commenting out the line $PROCS1=cat ./pbsnodes|wc -l
the error is gone. But earlier it worked without any error, so I am not sure why this line is a problem now.
adarsh
August 9, 2022, 6:12am
7
Could you please replace that line with this -
PROCS1=$(cat $PBS_NODEFILE | wc -l)
You can remove this line
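Why the old line fails: a shell assignment takes a bare name on the left, with no leading `$`. With the `$`, the shell expands the (unset) variable to nothing and then tries to run what remains as a command. The message `=1: command not found` fits this if the original line wrapped the `cat ... | wc -l` part in backticks that the forum display dropped; that is an assumption, not something the thread states. A minimal sketch that reproduces both forms, using a temporary file as a stand-in for `$PBS_NODEFILE`:

```shell
# Stand-in for $PBS_NODEFILE with one allocated host (assumed content).
nodefile=$(mktemp)
printf 'b14\n' > "$nodefile"

# Correct form: bare name on the left, no spaces around "=".
# tr strips the padding that some wc implementations add.
PROCS1=$(wc -l < "$nodefile" | tr -d ' ')
echo "PROCS1=$PROCS1"

# Broken form: PROCS2 is unset, so "$PROCS2=..." expands to "=1",
# which the shell then looks up as a command name and cannot find.
unset PROCS2
status=0
( $PROCS2=$(wc -l < "$nodefile" | tr -d ' ') ) 2>/dev/null || status=$?
echo "exit status of the broken line: $status"

rm -f "$nodefile"
```

The first `echo` reports a count of 1, and the broken line ends with the shell's "command not found" status (127) instead of setting a variable.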
vinay
August 9, 2022, 11:57am
8
Hi @adarsh, thanks a lot. It worked.