Thank you for the detailed instructions.
1) GDB stack trace
First, I checked the PID written in mom.lock, because there are several core.xxx files in /var/spool/pbs/mom_priv.
# cat /var/spool/pbs/mom_priv/mom.lock
991
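The lookup above can be scripted when several core files are lying around. This is only a rough sketch: `core_for_lock` is a hypothetical helper name, and it assumes core files are named `core.<pid>` and sit next to mom.lock, as they do on my machine.

```shell
#!/bin/sh
# Print the core file that matches the PID recorded in mom.lock.
# Usage: core_for_lock [mom_priv_dir]
# Assumption: core files are named core.<pid> and live in the same directory.
core_for_lock() {
    dir=${1:-/var/spool/pbs/mom_priv}
    # mom.lock contains only the PID; strip the trailing newline/whitespace.
    pid=$(tr -d '[:space:]' < "$dir/mom.lock") || return 1
    core="$dir/core.$pid"
    if [ -f "$core" ]; then
        printf '%s\n' "$core"
    else
        echo "no core file for PID $pid in $dir" >&2
        return 1
    fi
}
```

Called with no argument it looks in /var/spool/pbs/mom_priv; the printed path can then be passed to gdb as the core argument.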
Then I got the stack trace below (it is a bit long).
The segmentation fault is already reported before I type bt full.
# gdb /opt/pbs/sbin/pbs_mom /var/spool/pbs/mom_priv/core.991
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/pbs/sbin/pbs_mom...done.
[New LWP 991]
[New LWP 992]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/opt/pbs/sbin/pbs_mom'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007fda3256e346 in __strcmp_sse2 () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-106.el7_2.8.x86_64 hwloc-libs-1.7-5.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-12.el7_2.x86_64 libcom_err-1.42.9-7.el7.x86_64 libgcc-4.8.5-4.el7.x86_64 libpciaccess-0.13.4-2.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 libxml2-2.9.1-6.el7_2.3.x86_64 nss-softokn-freebl-3.16.2.3-14.2.el7_2.x86_64 numactl-libs-2.0.9-6.el7_2.x86_64 openssl-libs-1.0.1e-51.el7_2.7.x86_64 pcre-8.32-15.el7_2.1.x86_64 python-libs-2.7.5-39.el7_2.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64
(gdb) bt full
#0 0x00007fda3256e346 in __strcmp_sse2 () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007fda33e29ccb in hwloc_obj_cmp () from /lib64/libhwloc.so.5
No symbol table info available.
#2 0x00007fda33e29e87 in hwloc__insert_object_by_cpuset () from /lib64/libhwloc.so.5
No symbol table info available.
#3 0x00007fda33e4396e in summarize () from /lib64/libhwloc.so.5
No symbol table info available.
#4 0x00007fda33e44828 in hwloc_look_x86 () from /lib64/libhwloc.so.5
No symbol table info available.
#5 0x00007fda33e448a3 in hwloc_x86_discover () from /lib64/libhwloc.so.5
No symbol table info available.
#6 0x00007fda33e2c8bb in hwloc_topology_load () from /lib64/libhwloc.so.5
No symbol table info available.
#7 0x000000000043e46f in mom_topology () at mom_main.c:10437
ret = 0
topology = 0x1f00f20
xmlbuf = 0x0
xmllen = 32730
vtp = 0x0
__func__ = "mom_topology"
#8 0x0000000000425bef in dep_initialize () at linux/mom_mach.c:4820
__func__ = "dep_initialize"
#9 0x00000000004377f4 in initialize () at mom_main.c:1123
i = <optimized out>
avl = <optimized out>
ix = {root = 0x24f47300, keylength = 1476692478, dup_keys = 0}
hook_msg = '\377' <repeats 16 times>, '\000' <repeats 3156 times>
hook_buf = '\000' <repeats 512 times>
hook_input = {pjob = 0x24f47300, progname = 0x0, argv = 0x0, env = 0x0, vnl = 0x6f70732f7261762f, pid = 1882156143, jobs_list = 0x0}
hook_output = {reject_errcode = 0x0, last_phook = 0x0, fail_action = 0x0, progname = 0x0, argv = 0x0, env = 0x0, vnl = 0x0}
hook_errcode = 0
hook_rc = 0
last_phook = 0x0
hook_fail_action = 0
ret = <optimized out>
xxrp = {xrp = {recptr = 0x0, count = 0, key = '\000' <repeats 15 times>}, buf = '\000' <repeats 287 times>}
rp = 0x7ffd72a12ad0
none = "<unset>"
hostval = <optimized out>
char_in_cname = <optimized out>
__func__ = "initialize"
#10 0x000000000041a2ce in main (argc=1, argv=<optimized out>) at mom_main.c:9057
id = "mom_main"
tpp_conf = {node_type = 1, routers = 0x1ef0ce0, numthreads = 1, node_name = 0x1ef0cc0 "centos7:15003", auth_type = 1 '\001', get_ext_auth_data = 0x0,
validate_ext_auth_data = 0x0, compress = 0, tcp_keepalive = 1, tcp_keep_idle = 30, tcp_keep_intvl = 10, tcp_keep_probes = 3,
buf_limit_per_conn = 5000, force_fault_tolerance = 0}
errflg = <optimized out>
c = <optimized out>
rc = <optimized out>
stalone = <optimized out>
i = <optimized out>
dummyfile = <optimized out>
act = {__sigaction_handler = {sa_handler = 0x436ff0 <stop_me>, sa_sigaction = 0x436ff0 <stop_me>}, sa_mask = {__val = {88579, 0 <repeats 15 times>}},
sa_flags = 536870912, sa_restorer = 0x7fda3425f000}
ptr = 0x0
servername = <optimized out>
serverport = 874907448
recover = 0
time_state_update = 0
tryport = <optimized out>
rppfd = 7
privfd = -1
tval = {tv_sec = 1476921819, tv_usec = 750521}
myla = 6.9453353452667086e-310
nxpjob = <optimized out>
pjob = <optimized out>
configscriptaction = <optimized out>
inputfile = 0x0
scriptname = 0x0
prscput = <optimized out>
prswall = <optimized out>
fd = <optimized out>
ipaddr = <optimized out>
mygid = 0
optindinc = <optimized out>
do_mlockall = <optimized out>
hook_input = {pjob = 0x7ffd72a13ca0, progname = 0x7fda3405694e <_dl_map_object_from_fd+2526> "\213\025\f\264!", argv = 0x7ffd72a13cd0,
env = 0x7fda3405ae99 <_dl_add_to_namespace_list+25>, vnl = 0x0, pid = 2147479968, jobs_list = 0x7ffd72a13cd0}
path_hooks_rescdef = "/var/spool/pbs/mom_priv/hooks/resourcedef\000\005\064\332\177\000\000\000\000\000\000\000\000\000\000 =\241r\375\177", '\000' <repeats 14 times>, "\001\000\000\000\r\000\000\000\000\000\000\000\001\n&4\332\177\000\000\000\000\000\000\375\177\000\000\260\f&4\332\177", '\000' <repeats 26 times>, "P\n&4\332\177\000\000\000\000\000\000\375\177\000\000\260\f&4\332\177\000\000\240M\241r\375\177\000\000@\003\000\000\000\000\000\000\177ELF\002\001\001\000\000\000\000\000\000\000\000\000\003\000>\000\001\000\000\000"...
__func__ = "main"
__PRETTY_FUNCTION__ = "main"
(gdb) quit
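For anyone who wants to capture the same trace without the interactive pager, gdb's batch mode may help. The following is a minimal sketch, not a tested procedure for PBS: `dump_backtrace` and the default output file name are hypothetical, and `thread apply all bt full` is used instead of plain `bt full` on the assumption that the second thread (LWP 992) is also of interest.

```shell
#!/bin/sh
# Write a full backtrace of every thread in a core file to a text file.
# Usage: dump_backtrace <binary> <core> [output_file]
dump_backtrace() {
    binary=$1
    core=$2
    out=${3:-backtrace.txt}   # hypothetical default name
    # -batch runs the -ex commands and exits, without pausing for the pager.
    gdb -batch -ex 'thread apply all bt full' "$binary" "$core" > "$out" 2>&1
}
```

For example, `dump_backtrace /opt/pbs/sbin/pbs_mom /var/spool/pbs/mom_priv/core.991` would leave the trace in backtrace.txt, which is easier to attach to a post than a terminal copy.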
2) Installation type
I installed pbspro from source. My colleague also tried building an RPM package based on this guide, and he got the same result.
3) Master and slave have same hostname
Sorry, you’re right. It’s a typo. I will fix it.
Best regards.