I initially appended this to my other issue, “PBS server core dump on apparent job delete”, but now that I have a full core dump it is clear the issue is somewhat different, so I am creating a new topic.
OS: Oracle Linux 8.8 running RHEL kernel 4.18.0-477.27.1.el8_8.x86_64
PBS Server: 23.06.06 with patch b1249644 installed
The segfault occurred while a user was trying to delete a job, but it is not consistently reproducible. I have not been able to reproduce it so far, though it has happened twice, a few weeks apart, and curiously both times involved the same user.
Core dump info:
Core was generated by `/usr/local/pkgs/openpbs/sbin/pbs_server.bin'.
Program terminated with signal SIGSEGV, Segmentation fault.
warning: Section `.reg-xstate/1261205' in core file too small.
#0 0x00007f67a2e4af17 in __strlen_avx2 () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f67a5b88840 (LWP 1261205))]
Missing separate debuginfos, use: yum debuginfo-install cyrus-sasl-lib-2.1.27-6.el8_5.x86_64 expat-2.2.5-11.0.1.el8.x86_64 glibc-2.28-225.0.4.el8_8.6.x86_64 gssproxy-0.8.0-21.el8.x86_64 keyutils-libs-1.5.10-9.el8.x86_64 krb5-libs-1.18.2-25.0.1.el8_8.x86_64 libblkid-2.32.1-42.el8_8.x86_64 libcom_err-1.45.6-5.el8.x86_64 libgcc-8.5.0-18.0.6.el8.x86_64 libical-3.0.3-3.el8.x86_64 libicu-60.3-2.el8_1.x86_64 libmount-2.32.1-42.el8_8.x86_64 libnsl2-1.2.0-2.20180605git4a062cf.el8.x86_64 libpq-13.5-1.el8.x86_64 libselinux-2.9-8.el8.x86_64 libstdc++-8.5.0-18.0.6.el8.x86_64 libtirpc-1.1.4-8.el8.x86_64 libxcrypt-4.1.1-6.el8.x86_64 nss_nis-3.0-8.el8.x86_64 openldap-2.4.46-18.el8.x86_64 openssl-libs-1.1.1k-9.el8_7.x86_64 pcre2-10.32-3.el8_6.x86_64 python3-libs-3.6.8-51.0.1.el8_8.2.x86_64 systemd-libs-239-74.0.6.el8_8.5.x86_64 zlib-1.2.11-21.el8_7.x86_64
(gdb) up
#1 0x00007f67a2de790d in vfprintf () from /lib64/libc.so.6
(gdb) up
#2 0x00007f67a2e0e044 in vsnprintf () from /lib64/libc.so.6
(gdb) up
#3 0x00007f67a2dee083 in snprintf () from /lib64/libc.so.6
(gdb) up
#4 0x000000000045986f in req_deletejob (preq=0x16fe32a0) at req_delete.c:617
617 snprintf(jid, sizeof(jid), "%s", jobids[j]);
(gdb) up
#5 0x0000000000455844 in process_request (sfds=19) at process_request.c:720
720 dispatch_request(sfds, request);
(gdb) up
#6 0x00000000004c1eae in process_socket (sock=sock@entry=19) at net_server.c:510
510 svr_conn[idx]->cn_func(svr_conn[idx]->cn_sock);
(gdb) up
#7 0x00000000004c208a in wait_request (waittime=<optimized out>, priority_context=<optimized out>) at net_server.c:623
623 if (process_socket(em_fd) == -1) {
(gdb) up
#8 0x000000000042749e in main (argc=<optimized out>, argv=0x7ffdd1da8988) at pbsd_main.c:1398
1398 if (wait_request(waittime, priority_context) != 0) {
Reported in the server log at time of crash:
04/17/2024 19:10:54;0080;Server@pbssrv1;Job;3963228.pbssrv1;delete job request received
04/17/2024 19:10:54;0008;Server@pbssrv1;Job;3963228.pbssrv1;Job to be deleted at request of user4@login3
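For anyone reading the trace: frame #4 is `snprintf(jid, sizeof(jid), "%s", jobids[j])`, and the fault is in `strlen`, so my working guess is that `jobids[j]` was not a valid string pointer at that moment. As far as I know glibc prints "(null)" for a NULL `%s` argument rather than crashing, which would point to a stale pointer or an index past the end of the array rather than a plain NULL. Below is a minimal sketch in C of the kind of guard I have in mind. It is not the OpenPBS code: `copy_job_id`, `JID_MAX`, and the `count` parameter are hypothetical names I made up for illustration, and a check like this only catches NULL entries and out-of-range indexes, not a pointer to already-freed memory.

```c
#include <stdio.h>
#include <stddef.h>

#define JID_MAX 64 /* hypothetical buffer size, not taken from OpenPBS */

/*
 * Hypothetical helper: copy jobids[idx] into dst.
 * Returns 0 on success. Returns -1 (and leaves dst as an empty string
 * when dst is usable) if idx is out of range, the entry is NULL, or the
 * id does not fit in dst.
 */
static int copy_job_id(char *dst, size_t dst_size,
                       char *const *jobids, size_t count, size_t idx)
{
    if (dst == NULL || dst_size == 0)
        return -1;
    dst[0] = '\0';

    /* Refuse to read past the end of the array or through a NULL entry. */
    if (jobids == NULL || idx >= count || jobids[idx] == NULL)
        return -1;

    int n = snprintf(dst, dst_size, "%s", jobids[idx]);
    if (n < 0 || (size_t)n >= dst_size) {
        /* Encoding error or truncation: do not hand back a partial id. */
        dst[0] = '\0';
        return -1;
    }
    return 0;
}
```

A caller would do something like `if (copy_job_id(jid, sizeof(jid), jobids, njobs, j) != 0) continue;`, where `njobs` is again an assumed name for however many ids the request carries. Inspecting `j`, `jobids`, and `jobids[j]` in frame #4 of the core would show which of the cases above, if any, actually applies.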