Trying to get CUDA_VISIBLE_DEVICES set with the cgroup hook

Hi Joe,

Thanks for your questions. The message you’re seeing about libmemacct is benign. If you were on an Altix system where libmemacct.so is present, you wouldn’t see it. It may safely be ignored.

The cgroup hook sets CUDA_VISIBLE_DEVICES in the job’s environment. Your configuration looks correct, but some versions of nvidia-smi report device IDs differently, so its output may not match what you expect. This was discussed in another thread: GPU Access Limited by CGroup
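
Since nvidia-smi can be misleading here, it may be simpler to check the variable directly from inside a job. Here is a minimal sketch in Python that you could run as the job script; the function name parse_visible_devices is just something I made up for illustration, not part of PBS or the hook.

```python
import os


def parse_visible_devices(value):
    """Split a CUDA_VISIBLE_DEVICES string into a list of device identifiers.

    Returns None when the variable is not set at all, and an empty list
    when it is set but empty (which hides every GPU from CUDA).
    """
    if value is None:
        return None
    return [tok.strip() for tok in value.split(",") if tok.strip()]


if __name__ == "__main__":
    # Read the variable from the job's environment, where the hook should put it.
    raw = os.environ.get("CUDA_VISIBLE_DEVICES")
    devices = parse_visible_devices(raw)
    if devices is None:
        print("CUDA_VISIBLE_DEVICES is not set")
    else:
        print("CUDA_VISIBLE_DEVICES =", repr(raw), "->", devices)
```

If this prints "not set" inside a job that requested GPUs, the hook is probably not running or not handling the devices for that job, and the mom logs are the next place to look.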

Take a look at the pbs_mom logs in /var/spool/pbs/mom_logs and see if there is anything helpful there. If not, you can increase the verbosity of the logs by adding a line to /var/spool/pbs/mom_priv/config that looks like this:
$logevent 0xffff

You will need to restart pbs_mom so that it rereads its configuration. Try running another job and see if the logs provide any clues. Feel free to post excerpts here if you need additional help.
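
With the verbosity turned up the logs get long, so a small filter can help. This is only a rough sketch: the search terms are my guess at what is worth looking for, the helper names are hypothetical, and you will likely need to run it as root to read the log directory.

```python
import glob
import os

MOM_LOG_DIR = "/var/spool/pbs/mom_logs"  # log directory mentioned above
# Assumed search terms; adjust to whatever looks relevant in your logs.
KEYWORDS = ("cgroup", "CUDA_VISIBLE_DEVICES")


def matching_lines(lines, keywords=KEYWORDS):
    """Return the lines that contain any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [ln.rstrip("\n") for ln in lines
            if any(k in ln.lower() for k in lowered)]


def newest_log(log_dir=MOM_LOG_DIR):
    """Return the most recently modified file in log_dir, or None if empty."""
    files = [p for p in glob.glob(os.path.join(log_dir, "*"))
             if os.path.isfile(p)]
    return max(files, key=os.path.getmtime) if files else None


if __name__ == "__main__":
    path = newest_log()
    if path is None:
        print("no log files found in", MOM_LOG_DIR)
    else:
        # errors="replace" avoids a crash on any non-UTF-8 bytes in the log.
        with open(path, errors="replace") as fh:
            for line in matching_lines(fh):
                print(line)
```

Run it right after a test job finishes so that the newest log file is the one containing that job's entries.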

Thanks,

Mike