Low GPU utilization with PBS job

I am running a MATLAB script for deep learning (a convolutional neural network) on the uni HPC cluster as a PBS job, using a compiled standalone application. Currently I am facing a real problem and I don't know the reason(s) behind it:
I ran the same script (on the same dataset) a couple of times before using the GPU and it was pretty fast, but now it seems to be running on the CPU because it is so slow and takes a very long time. For instance, one earlier run completed 90 epochs in 40 hours, whereas now it takes 48 hours to finish only 2 epochs. The IT guy told me that my job is running on a GPU node but that the GPU utilization is very low, zero most of the time. One more thing: I haven't changed anything in the wrapper shell script, nor any CPU-versus-GPU choice in the preferred execution environment of the original MATLAB script. I am quite new to PBS and have very limited access to the cluster (a user with just enough privileges to run my scripts). The cluster is running CentOS 6.
Here is a sample of my wrapper script:

#!/bin/bash
#PBS -l nodes=1:ppn=1,walltime=190:00:00
#PBS -q gpu
#PBS -m abe

export CUDA_VISIBLE_DEVICES=$(cat /tmp/$PBS_JOBID/gpu)

/export/home/2164104a/run_withflip2.sh /export/home/2164104a/MATLAB2017b_Compiler_Runtime/v93/

I would be very grateful if anyone who has come across such a problem before could give me some advice.

cheers

Welcome to the community. As for your question, can you provide more information on the jobs that ran fast? Was it the exact same script? Did the job run fast on a CPU node, and is it now running slow on a GPU node?

Thank you a lot for your response. The same script that used to run quite fast is now running far too slowly (as if it were running on a CPU, not a GPU). So in summary, it is the exact same script on the exact same GPU node.
Please let me know if you need any further information.

Here is what I would check, in rough order from what I think is most likely to least likely:

  1. PBS always sets OMP_NUM_THREADS, and some applications behave differently, possibly slower, when it is set. Try unsetting it in the job script before the application is launched to see if this makes a difference (see the sketch after this list).

  2. Check the kernel limits on the processes (STACK, MEMLOCK, etc.) both inside the job environment and in the environment where it performs better.

  3. “Other” environment variables can have an effect as well, but beyond OMP_NUM_THREADS and TMPDIR it is really application specific. Usually in that case something is missing from the job environment that the application depends on. You can try submitting with qsub -V from the environment where performance is good as a quick test to see if this matters and narrow it down from there if it does.

  4. I have seen application behavior problems because of what $TMPDIR is set to inside of the job. Again, compare $TMPDIR in both environments and see if it matters.

  5. I have never actually seen the mom polling tank an application's performance, but one could adjust the mom's $min_check_poll and $max_check_poll so that polling is less frequent.
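
If it helps, here is a rough sketch of what I mean for points 1 through 4. This is just an illustration (the output file names are made up); add it to the wrapper before the application is launched, run the same commands by hand in the environment where performance was good, and diff the results:

# Sketch: capture the job environment for later comparison (file names are examples)

# 2./3./4. Record the kernel limits and the full job environment (including TMPDIR)
# as seen from inside the job
ulimit -a > "$HOME/job_${PBS_JOBID}_ulimits.txt"
env | sort > "$HOME/job_${PBS_JOBID}_env.txt"
echo "TMPDIR inside the job is: $TMPDIR"

# 1. Then unset the thread count PBS injects, in case the application reacts badly to it
unset OMP_NUM_THREADS

For point 3, the quick test is simply qsub -V (with whatever your submission script is called) from a shell where the script runs fast, so the whole submission environment is exported into the job.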

Disclaimer: I don’t remember specifically ever looking into anything like this when a GPU was involved.
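
That said, since the reported GPU utilization is near zero, it may also be worth confirming that the job can actually see a GPU at all before MATLAB starts. Something along these lines in the wrapper (assuming nvidia-smi is on the PATH of the GPU node; the log file name is just an example) would show whether CUDA_VISIBLE_DEVICES is populated and let you watch utilization while the job runs:

# Confirm the job sees the GPU it was assigned
echo "CUDA_VISIBLE_DEVICES is set to: '$CUDA_VISIBLE_DEVICES'"

# One-off snapshot of the device state from inside the job
nvidia-smi

# Optionally, log utilization and memory use every 60 seconds in the background
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used --format=csv -l 60 \
    > "$HOME/job_${PBS_JOBID}_gpu.log" &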

HTH.

Thanks a lot for your reply @scc :ok_hand: