MoM start has Python issues

Does anyone know why I might get Python errors like these at job start time:

03/14/2023 15:01:19;0001;pbs_python;Svr;pbs_python;PBS server internal error (15011) in pbs_python_load_python_types, <class 'ImportError'>
03/14/2023 15:01:19;0001;pbs_python;Svr;pbs_python;PBS server internal error (15011) in pbs_python_load_python_types, /opt/python/lib/python3.7/lib-dynload/math.cpython-37m-x86_64-linux-gnu.so: undefined symbol: PyFloat_Type
03/14/2023 15:01:19;0001;pbs_python;Svr;pbs_python;pbs_python_ext_start_interpreter, could not load python types into the interpreter
03/14/2023 15:01:19;0001;pbs_mom;Svr;pbs_mom;run_hook, execv of /opt/pbs/bin/pbs_python resulted in nonzero exit status=1
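To pull just these hook-related lines out of a busy MoM log, a small filter helps. This is a minimal sketch assuming bash, the standard ';'-separated PBS log layout (time;event;source;object type;object;message), and the default PBS_HOME of /var/spool/pbs with daily log files named YYYYMMDD; the function name hook_errors is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads PBS log lines on stdin and prints the timestamp
# and message of every line written by pbs_python or by the MoM's run_hook.
# PBS log lines are ';'-separated: time;event;source;object-type;object;message
hook_errors() {
    awk -F';' '
        $3 == "pbs_python" || $6 ~ /^run_hook/ {
            msg = $6
            for (i = 7; i <= NF; i++) msg = msg ";" $i   # rejoin any ";" in the message
            print $1 " " msg
        }
    '
}

# Usage on a MoM (assumes the default PBS_HOME of /var/spool/pbs):
#   hook_errors < "/var/spool/pbs/mom_logs/$(date +%Y%m%d)"
```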

thanks
s

Oh, and the version:
/opt/pbs/bin/pbs_python --version
pbs_version = 20.0.1

This looks like pbs_python was built against a different version of Python than the one it finds at runtime. What do you get if you run the following on a MoM?

ldd /opt/pbs/bin/pbs_python

Do you get something different if you unset PYTHONPATH first?
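A minimal sketch of reading the ldd result, assuming bash on the MoM and the default /opt/pbs install path. The helper name check_libpython is hypothetical; it only looks for a libpython entry in the ldd output, which tells you whether pbs_python loads the Python runtime as a shared library or (if no such entry appears) most likely has it linked in statically.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ldd` output on stdin and reports whether a
# libpython shared library appears in it.
check_libpython() {
    local lib
    # First "libpython..." token on stdin (the soname, before the "=>" path)
    lib=$(grep -o 'libpython[^ ]*' | head -n 1)
    if [ -n "$lib" ]; then
        echo "dynamic: $lib"
    else
        echo "no libpython (probably linked statically)"
    fi
}

# Usage on a MoM:
#   ldd /opt/pbs/bin/pbs_python | check_libpython
```

Fed the ldd output posted later in this thread, it prints the "no libpython" line, since none of the listed libraries is libpython.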

Did you build PBS yourself? There is a Stack Overflow article that might apply:

This is my initial attempt at building OpenPBS from source on a Rocky 8 machine.
It comes with Python 3.6 installed, and I also dnf install python3.9. But as OpenPBS v20 only supports up to Python 3.7, I build the latest Python 3.7 from source and point the OpenPBS configure step at it.
How all that combines when the MoM actually starts is beyond me, and it is obviously not quite working.

I'm currently trying to switch to OpenPBS v22, as it supports newer Python versions. That requires the PBS server to also be at v22, so it's a bit more involved.

There are nice binary RPMs of OpenPBS for Rocky 8, which are generally OK for the MoM side. On the server I have to build from source because I need to tweak the OpenPBS code, so I can't use the RPMs. At that point I figured I should also build from source on the execution hosts, so things get complicated…

I don't currently have an OpenPBS v20 machine built to see what ldd /opt/pbs/bin/pbs_python returns, but I will dump that in the issue when I get a chance.

Thanks Dale (I thought you were supposed to be retired…)

ldd /opt/pbs/bin/pbs_python
linux-vdso.so.1 (0x00007ffdef28e000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6dbc4e2000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007f6dbc2de000)
libm.so.6 => /lib64/libm.so.6 (0x00007f6dbbf5c000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f6dbbd58000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f6dbbb2f000)
libc.so.6 => /lib64/libc.so.6 (0x00007f6dbb769000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6dbc702000)

Unsetting PYTHONPATH did not change anything.

Forgive my ignorance, but why didn’t you use the Python 3.6 that comes installed by default?

What does the following yield:

# nm /opt/pbs/bin/pbs_python | grep PyFloat_Type
0000000000699400 B PyFloat_Type

If that article I linked to explains what is going on, I would expect a different result on your system.
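One way to take that check a step further, offered as a hedged sketch rather than a confirmed diagnosis: extension modules such as math.cpython-37m-x86_64-linux-gnu.so are loaded with dlopen and resolve names like PyFloat_Type through the dynamic symbol table, while plain nm reads the static symbol table. So the symbol can show up in nm yet still be invisible to the loader. Comparing nm with nm -D shows whether pbs_python actually exports it. The helper name has_global_symbol is hypothetical, and the idea that the cause here is a missing export (for example, linking without -Wl,--export-dynamic) is an assumption this thread has not confirmed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `nm` or `nm -D` output on stdin and succeeds
# if the named symbol is listed as a defined global (uppercase type, not U).
has_global_symbol() {
    awk -v sym="$1" '
        NF == 3 && $2 ~ /^[A-Z]$/ && $2 != "U" && $3 == sym { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a MoM (static table first, then the dynamic table the loader uses):
#   nm    /opt/pbs/bin/pbs_python | has_global_symbol PyFloat_Type && echo "in static table"
#   nm -D /opt/pbs/bin/pbs_python | has_global_symbol PyFloat_Type || echo "NOT exported"
```

If the first line prints and the second reports "NOT exported", that would be consistent with the undefined-symbol error from math.cpython-37m in the original log.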

I’m just flailing around here. Perhaps someone with more knowledge could chime in?

Mostly I wanted a newer version of Python (in addition to the required 3.6) on the machine.
I have rolled back to not adding any newer Python until after PBS is built, which is helping.
Now, for some strange reason, a qsub hook fails with a module-not-found error for a Python package that isn't imported in the hook, and that is installed anyway.
Things are never simple here…
As all this is buried in automation scripts, testing something after a tweak takes an hour to build up the environment, so testing is slow…

thanks
s

Follow-up here in case others have issues:
So Rocky Linux 8 is stuck at Python 3.6. Some of the packages installed to support PBS will interfere with other packages, depending on the order in which things are installed. I suspect this is because those packages have version requirements on the packages they depend on.
When hooks stop working, make sure you reinstall every Python package that the hook (and PBS's internal Python usage) uses.
And if you are in AWS, reinstall the awscli.
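A minimal sketch of that reinstall step, assuming the hooks run under the system python3 (3.6 on Rocky 8) and that its pip manages the packages. The function name and the package names are hypothetical placeholders; substitute whatever your hooks import. Setting DRY_RUN=1 only prints the commands. Note that the pip-installable awscli is AWS CLI version 1; if you installed version 2 with AWS's bundled installer, reinstall it that way instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: force-reinstall each named package with the pip that
# belongs to the interpreter given in $PYTHON (default: python3).
# With DRY_RUN=1 the commands are printed instead of executed.
reinstall_hook_packages() {
    local py="${PYTHON:-python3}" pkg
    for pkg in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$py -m pip install --force-reinstall $pkg"
        else
            "$py" -m pip install --force-reinstall "$pkg" || return 1
        fi
    done
}

# Example (package names are placeholders for whatever your hooks import):
#   reinstall_hook_packages requests awscli
# Afterwards, `python3 -m pip check` lists installed packages whose
# declared requirements are missing or conflict.
```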