Segfault in mgr_node_set when changing child vnode status

When changing a lot of node attributes in a batch (status from offline to online and clearing the 'comment' attribute), the server crashed with a segfault at pnode = psvrmom->msr_children[momidx]; (req_manager.c:2145) once it reached the one node we have that currently has child vnodes.

OS: Oracle Linux 8.8 running RHEL kernel 4.18.0-477.27.1.el8_8.x86_64
PBS Server: 23.06.06 with patch b1249644 installed
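
The faulting statement indexes the parent MoM's child-vnode array, so it can only fault if one of a few implicit preconditions fails. To make the failure mode explicit, here is a minimal sketch of those preconditions; the stand-in types and the count field msr_numvnds are my guesses at the real OpenPBS definitions and may not match exactly:

    /* Sketch only: stand-ins for the real OpenPBS types so the check
     * compiles on its own. msr_children comes from the faulting line;
     * mom_svrinfo_t and msr_numvnds are assumptions on my part. */
    struct pbsnode;                        /* opaque for this sketch   */
    typedef struct mom_svrinfo {
            struct pbsnode **msr_children; /* child vnodes of this MoM */
            long msr_numvnds;              /* entries in msr_children  */
    } mom_svrinfo_t;

    /* The dereference psvrmom->msr_children[momidx] is safe only if
     * all of the following hold; the SIGSEGV implies one did not. */
    static int
    children_index_ok(const mom_svrinfo_t *psvrmom, long momidx)
    {
            return psvrmom != NULL &&               /* mom info present       */
                   psvrmom->msr_children != NULL && /* child array allocated  */
                   momidx >= 0 &&
                   momidx < psvrmom->msr_numvnds;   /* index within the array */
    }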

Info from core dump:

Core was generated by `/usr/local/pkgs/openpbs/sbin/pbs_server.bin'.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Section `.reg-xstate/1211784' in core file too small.
#0 0x0000000000463e2d in mgr_node_set (preq=0x7276ab0) at req_manager.c:2145
2145 pnode = psvrmom->msr_children[momidx];
[Current thread is 1 (Thread 0x7f45e9323840 (LWP 1211784))]
Missing separate debuginfos, use: yum debuginfo-install cyrus-sasl-lib-2.1.27-6.el8_5.x86_64 expat-2.2.5-11.0.1.el8.x86_64 glibc-2.28-225.0.4.el8_8.6.x86_64 gssproxy-0.8.0-21.el8.x86_64 keyutils-libs-1.5.10-9.el8.x86_64 krb5-libs-1.18.2-25.0.1.el8_8.x86_64 libblkid-2.32.1-42.el8_8.x86_64 libcom_err-1.45.6-5.el8.x86_64 libgcc-8.5.0-18.0.6.el8.x86_64 libical-3.0.3-3.el8.x86_64 libicu-60.3-2.el8_1.x86_64 libmount-2.32.1-42.el8_8.x86_64 libnsl2-1.2.0-2.20180605git4a062cf.el8.x86_64 libpq-13.5-1.el8.x86_64 libselinux-2.9-8.el8.x86_64 libstdc++-8.5.0-18.0.6.el8.x86_64 libtirpc-1.1.4-8.el8.x86_64 libxcrypt-4.1.1-6.el8.x86_64 nss_nis-3.0-8.el8.x86_64 openldap-2.4.46-18.el8.x86_64 openssl-libs-1.1.1k-9.el8_7.x86_64 pcre2-10.32-3.el8_6.x86_64 python3-libs-3.6.8-51.0.1.el8_8.2.x86_64 systemd-libs-239-74.0.6.el8_8.5.x86_64 zlib-1.2.11-21.el8_7.x86_64
(gdb) up
#1 req_manager (preq=0x7276ab0) at req_manager.c:4483
4483 mgr_node_set(preq);
(gdb) up
#2 0x0000000000455844 in process_request (sfds=18) at process_request.c:720
720 dispatch_request(sfds, request);
(gdb) up
#3 0x00000000004c1eae in process_socket (sock=sock@entry=18) at net_server.c:510
510 svr_conn[idx]->cn_func(svr_conn[idx]->cn_sock);
(gdb) up
#4 0x00000000004c208a in wait_request (waittime=&lt;optimized out&gt;, priority_context=&lt;optimized out&gt;) at net_server.c:623
623 if (process_socket(em_fd) == -1) {
(gdb) up
#5 0x000000000042749e in main (argc=&lt;optimized out&gt;, argv=0x7fff20121cf8) at pbsd_main.c:1398
1398 if (wait_request(waittime, priority_context) != 0) {

Server log at the time of the crash. Curiously, it crashed on the one node that currently contains child vnodes:

04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6;attributes set: at request of root@pbssrv1
04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6;attributes set: state - offline
04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6;attributes set: state - down
04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6;attributes set: state - offline
04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6;attributes set: state - down
04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6[0];attributes set: state - offline
04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6[0];attributes set: state - down
04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6[0];attributes set: state - offline
04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6[0];attributes set: state - down
04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6[1];attributes set: state - offline
04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6[1];attributes set: state - down
04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6[1];attributes set: state - offline
04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6[1];attributes set: state - down
04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6;attributes set: at request of root@pbssrv1
04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6;attributes set: comment =
04/22/2024 13:14:09;0004;Server@pbssrv1;Node;k4r0n6;attributes set: comment =

Looking at the source where it crashed, this part of the code appears to deal specifically with child vnode state, so it is probably no coincidence that the crash happened on the node with child vnodes:

                    pnode = psvrmom->msr_children[momidx];
                    if ((strcmp(plist->al_name, ATTR_NODE_state) == 0) && (plist->al_op == INCR)) {
                            /* Marking nodes offline.  We should only mark the children vnodes
                             * as offline if no other mom that reports the vnodes are up.
                             */
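
If the index or the array got out of sync with the actual vnode list (say, the list shrank or was rebuilt while the batch request was being processed), a defensive check just before the dereference would at least turn the crash into a skipped entry. This is a sketch only, under the same naming assumptions as above; the continue presumes the statement sits inside a loop over momidx, which the excerpt does not show:

                    /* Sketch: guard before dereferencing; msr_numvnds is an
                     * assumed field name, not confirmed from the source. */
                    if (psvrmom->msr_children == NULL ||
                        momidx < 0 || momidx >= psvrmom->msr_numvnds)
                            continue; /* child list changed underneath us */
                    pnode = psvrmom->msr_children[momidx];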