We are experiencing an issue where, when we try to load modules within a PBS script, we get “command not found” for the module command. See below.
[randerson10@login0002 Python_examples]$ qsub basic_python.pbs
40.wwadm01
[randerson10@login0002 Python_examples]$ cat test.e40
/var/spool/pbs/mom_priv/jobs/40.wwadm01.SC: line 14: module: command not found
/var/spool/pbs/mom_priv/jobs/40.wwadm01.SC: line 18: python: command not found
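For reference, basic_python.pbs is an ordinary job script along these lines (a trimmed sketch, since the full script was not posted; the #PBS -N test is inferred from the test.e40 error file, and the resource request and script name are placeholders):

#!/bin/bash
#PBS -N test
#PBS -l select=1:ncpus=1

# this is the line that errors with "module: command not found"
module load python

# this then fails too, because python never lands on $PATH
python my_script.py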
We are able to run the module command from an interactive shell as regular users on the compute nodes, and all of our software shows up and loads flawlessly.
[randerson10@node0001 ~]$ module load go
[randerson10@node0001 ~]$ module list
Currently Loaded Modules:
1) python/3.8.6-intel-uly7   2) go/1.19.1
$PATH for both regular users and root contains the locations of our modules.
Is there any explanation for this behavior?
It appears the shell environment is not being set up as expected for your job. My guess is that the shell startup files are not being run. To help diagnose this, edit your basic_python.pbs script so that the first executable lines are:
/usr/bin/ps -fu randerson10
/usr/bin/env
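The ps output will show which shell the PBS MoM actually started your job under, and the env output will show whether MODULEPATH and the rest of the Lmod setup ever made it into the job's environment.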
Also, just to verify which module command you are expecting, run this in your terminal:
module --version
Adding /usr/bin/ps -fu randerson10 and /usr/bin/env had no effect. The module --version output is just the following; module avail, list, and load all behave as expected:
Modules based on Lua: Version 8.7.32 2023-08-28 12:42 -05:00
by Robert McLay mclay@tacc.utexas.edu
I am using OpenHPC with Lmod. The lmod.sh file in /etc/profile.d/ looks alright to my eye, and it has not been messed with, so it is the stock file that ships with OpenHPC.
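The part of it that deals with batch jobs is this guard:

if [ -n "$SLURM_NODELIST" ] || [ -n "$PBS_NODEFILE" ]; then
    return
fi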
So running an interactive job on the node is not working as you expected?
I am unable to load any modules because the module command is not available.
I tried putting source /opt/ohpc/admin/lmod/lmod/init/bash >/dev/null in the job script, but that does not fix the problem of initializing the module environment.
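Concretely, that means adding the line near the top of the script, roughly like this (the surrounding lines are placeholders):

#!/bin/bash
#PBS -N test

# initialize Lmod by hand, since the /etc/profile.d/ scripts are not being run
source /opt/ohpc/admin/lmod/lmod/init/bash >/dev/null

module load python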
The lmod.sh script I referenced above was only present on the login node. After copying it to the compute nodes, I can run an interactive job and the module command is available there. However, when I submit a non-interactive job, the job's error output still comes back with “module: command not found”.
My suggestion: Make a copy of /etc/profile.d/lmod.sh, but remove the three lines mentioned above:
if [ -n "$SLURM_NODELIST" ] || [ -n "$PBS_NODEFILE" ]; then
    return
fi
Then, at the start of your PBS script, source that copy:
source my_lmod.sh
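With that, the top of the job script would look something like this (a sketch; where you keep the copy is up to you, and the $HOME/my_lmod.sh path here is just an assumption):

#!/bin/bash
#PBS -N test

# the full Lmod initialization, minus the guard that returns early inside PBS jobs
source $HOME/my_lmod.sh

module load python
python my_script.py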
What is happening is that Lmod is expecting PBS to forward your entire environment when you qsub the job. That is not the default behavior, but you can force it by adding -V to the qsub arguments. However, because no other shell startup script skips itself inside jobs the way lmod.sh does, you'll end up with a mixture of the forwarded qsub environment and the node's own environment.
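That is, you can forward the submission environment with

qsub -V basic_python.pbs

or with the equivalent directive inside the script itself:

#PBS -V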