I am running a MATLAB script for deep learning (a convolutional neural network) on my university's HPC cluster as a PBS job, using a compiled standalone application. I am currently facing a problem whose cause I can't identify:
I have run the same script on the same dataset a couple of times before using the GPU, and it was fast. Now it seems to be running on the CPU instead, because it is extremely slow. For instance, one earlier run completed 90 epochs in 40 hours, whereas now it takes 48 hours to finish only 2 epochs (roughly a 50-fold slowdown). The IT support person told me that my job is running on a GPU node but the GPU utilization is very low, and most of the time it is zero.

One more thing: I haven't changed anything in the wrapper shell script, nor the preferred execution environment (CPU vs. GPU) in the original MATLAB script. I am quite new to PBS and have very limited access to the cluster (a user account with just enough privileges to run my scripts). The cluster runs CentOS 6.
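To see what the job itself can see, a small diagnostic could be called from the wrapper just before MATLAB starts, so the result lands in the job's output file. This is a minimal sketch assuming bash; the function name report_gpu_env is hypothetical, and I haven't confirmed that nvidia-smi is on the PATH of this cluster's GPU nodes.

```shell
#!/bin/bash
# Hypothetical helper: print what the job can see before MATLAB starts.
report_gpu_env() {
    echo "Host: $(uname -n)"
    echo "PBS_JOBID: ${PBS_JOBID:-<not set>}"
    # If this prints something other than GPU indices (e.g. "0" or "0,1"),
    # CUDA (and therefore MATLAB) may not see any usable GPU.
    echo "CUDA_VISIBLE_DEVICES: ${CUDA_VISIBLE_DEVICES:-<not set>}"
    # nvidia-smi shows the node's GPUs and their current utilization.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi
    else
        echo "nvidia-smi not found on PATH"
    fi
}
```

Calling report_gpu_env on the line before run_withflip2.sh would show whether CUDA_VISIBLE_DEVICES holds a sensible value at the moment the application is launched.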
Here is a sample of my wrapper script:
#!/bin/bash
#PBS -l nodes=1:ppn=1,walltime=190:00:00
#PBS -q gpu
#PBS -m abe
export CUDA_VISIBLE_DEVICES=$(cat /tmp/$PBS_JOBID/gpu)
/export/home/2164104a/run_withflip2.sh /export/home/2164104a/MATLAB2017b_Compiler_Runtime/v93/
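For reference, note that a line of the form export CUDA_VISIBLE_DEVICES=cat /tmp/$PBS_JOBID/gpu (without backticks or $(...)) sets the variable to the literal string "cat", which as far as I know leaves CUDA with no valid device to use. A more defensive way to set it could look like the sketch below. It assumes the scheduler writes GPU indices separated by commas, spaces or newlines to /tmp/$PBS_JOBID/gpu, as the wrapper implies; the function name set_visible_gpus is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: export CUDA_VISIBLE_DEVICES from a scheduler-written file.
# Assumes the file holds GPU indices separated by commas, spaces or newlines.
set_visible_gpus() {
    local gpu_file="$1"
    local ids
    if [ ! -r "$gpu_file" ]; then
        echo "GPU file '$gpu_file' is not readable" >&2
        return 1
    fi
    # Turn any run of whitespace/commas into a single comma, then trim the ends.
    ids=$(tr -s '[:space:]' ',' < "$gpu_file")
    ids=${ids#,}
    ids=${ids%,}
    # Accept only lists like "0" or "0,1"; anything else (e.g. "cat") is rejected.
    if [[ ! "$ids" =~ ^[0-9]+(,[0-9]+)*$ ]]; then
        echo "Unexpected GPU list '$ids' in $gpu_file" >&2
        return 1
    fi
    export CUDA_VISIBLE_DEVICES="$ids"
}
```

In the wrapper it would replace the export line, e.g. set_visible_gpus "/tmp/$PBS_JOBID/gpu" || exit 1 placed before the run_withflip2.sh call, so the job stops immediately instead of continuing without a visible GPU.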
I would be very grateful if anyone has come across this problem before or can offer any advice.
Cheers