Hi,
We are running OpenPBS 23.06.06, and the server dumped core while handling a request to delete a job, which it then rejected. It looks as though the job may already have been deleted.
This is what the server log showed just before the core dump:
11/15/2023 09:13:30;0080;Server@pbssrv1;Job;3512920;delete job request received
11/15/2023 09:13:30;0008;Server@pbssrv1;Job;3512920.pbssrv1;Job to be deleted at request of pbsuser@headnode
11/15/2023 09:13:30;0008;Server@pbssrv1;Job;3512922.pbssrv1;Job to be deleted at request of pbsuser@headnode
11/15/2023 09:13:30;0008;Server@k-admin;Job;3512917.pbssrv1;Job to be deleted at request of pbsuser@headnode
11/15/2023 09:13:30;0008;Server@pbssrv1;Job;3512924.pbssrv1;Job to be deleted at request of pbsuser@headnode
11/15/2023 09:13:30;0080;Server@pbssrv1;Job;B<9D>#;Unknown Job Id
11/15/2023 09:13:30;0080;Server@pbssrv1;Svr;update_deljob_rply;job B<9D># has already been deleted from delete job list
11/15/2023 09:13:30;0080;Server@pbssrv1;Req;req_reject;Reject reply code=15001, aux=0, type=100, from pbssrv1@headnode
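In case it helps anyone check their own logs for the same symptom, here is a rough sketch of a script that scans server log lines for Job records whose ID looks corrupted, like the `B<9D>#` entry above. It assumes the six semicolon-separated fields seen in the lines above. The field names, the function names and the job ID pattern are my own guesses for illustration, not anything taken from OpenPBS.

```python
import re

# ASSUMPTION: a valid job ID is digits, an optional array index in brackets,
# and an optional ".server" suffix. This is inferred from the log lines above.
JOB_ID_RE = re.compile(r"\d+(\[\d*\])?(\.[A-Za-z0-9_.-]+)?")

# ASSUMPTION: hypothetical names for the six fields seen in each log line.
FIELDS = ("timestamp", "event", "server", "obj_type", "obj_id", "message")


def parse_log_line(line):
    """Split one log line into six fields, or return None if it does not fit."""
    # maxsplit=5 keeps any semicolons inside the message text intact.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def suspicious_job_lines(lines):
    """Return the parsed Job records whose ID does not match JOB_ID_RE."""
    bad = []
    for line in lines:
        rec = parse_log_line(line)
        if rec is None or rec["obj_type"] != "Job":
            continue
        if JOB_ID_RE.fullmatch(rec["obj_id"]) is None:
            bad.append(rec)
    return bad
```

Passing an open log file to `suspicious_job_lines` returns one dict per questionable line, so the timestamp and message can be printed next to the bad ID.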
Here is a backtrace from the core dump (let me know if you would like the actual core file and how I can upload it to you guys):
warning: Section `.reg-xstate/7281' in core file too small.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/local/pkgs/openpbs/sbin/pbs_server.bin'.
Program terminated with signal SIGABRT, Aborted.
warning: Section `.reg-xstate/7281' in core file too small.
#0 0x00007f9ec428cb8f in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f9ec7047840 (LWP 7281))]
Missing separate debuginfos, use: yum debuginfo-install cyrus-sasl-lib-2.1.27-6.el8_5.x86_64 expat-2.2.5-11.0.1.el8.x86_64 glibc-2.28-225.0.4.el8_8.6.x86_64 gssproxy-0.8.0-21.el8.x86_64 keyutils-libs-1.5.10-9.el8.x86_64 krb5-libs-1.18.2-25.0.1.el8_8.x86_64 libblkid-2.32.1-42.el8_8.x86_64 libcom_err-1.45.6-5.el8.x86_64 libgcc-8.5.0-18.0.6.el8.x86_64 libical-3.0.3-3.el8.x86_64 libicu-60.3-2.el8_1.x86_64 libmount-2.32.1-42.el8_8.x86_64 libnsl2-1.2.0-2.20180605git4a062cf.el8.x86_64 libpq-13.5-1.el8.x86_64 libselinux-2.9-8.el8.x86_64 libstdc++-8.5.0-18.0.6.el8.x86_64 libtirpc-1.1.4-8.el8.x86_64 libxcrypt-4.1.1-6.el8.x86_64 nss_nis-3.0-8.el8.x86_64 openldap-2.4.46-18.el8.x86_64 openssl-libs-1.1.1k-9.el8_7.x86_64 pcre2-10.32-3.el8_6.x86_64 python3-libs-3.6.8-51.0.1.el8_8.2.x86_64 systemd-libs-239-74.0.6.el8_8.5.x86_64 zlib-1.2.11-21.el8_7.x86_64
(gdb) bt
#0 0x00007f9ec428cb8f in raise () from /lib64/libc.so.6
#1 0x00007f9ec425fea5 in abort () from /lib64/libc.so.6
#2 0x00007f9ec42cdda7 in __libc_message () from /lib64/libc.so.6
#3 0x00007f9ec42d509c in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f9ec42d6c24 in _int_free () from /lib64/libc.so.6
#5 0x00000000004b063d in free_string_array (arr=0x239f66e0) at misc_utils.c:1346
#6 0x00000000004b1bba in free_string_array (arr=<optimized out>) at misc_utils.c:1344
#7 0x0000000000455665 in free_br (preq=0x239f6d50) at process_request.c:1658
#8 0x0000000000457abb in reply_send (request=request@entry=0x239f6d50) at reply_send.c:369
#9 0x0000000000457ec3 in req_reject (code=15001, aux=aux@entry=0, preq=preq@entry=0x239f6d50) at reply_send.c:536
#10 0x0000000000459bba in req_deletejob (preq=0x239f6d50) at req_delete.c:625
#11 0x0000000000455844 in process_request (sfds=20) at process_request.c:720
#12 0x00000000004c1dee in process_socket (sock=sock@entry=20) at net_server.c:510
#13 0x00000000004c1fca in wait_request (waittime=<optimized out>, priority_context=<optimized out>) at net_server.c:623
#14 0x000000000042749e in main (argc=<optimized out>, argv=0x7ffcc65460e8) at pbsd_main.c:1398