PBS freezes after array job with 100k subjobs is finished

Hello,

We run OpenPBS 20.0.1 and submit 5-10 array jobs to the queue, each containing 100k subjobs (I increased max_array_size to 100000). Typically one array job runs on 500 cores, with each subjob using 1 core, while the rest wait in the queue. Performance is pretty good for this number of jobs, but I noticed that each time an array job finishes, PBS freezes for ~10 minutes: commands like qstat and pbsnodes are unresponsive, and I don’t see any messages in server_logs during the freeze. Overall system utilization is low, except for the postgres process, which uses more CPU than usual (4% vs 0.5%) and is in D (uninterruptible sleep) state due to I/O.

iotop indeed shows that postgres writes to disk more actively (4 Mb/s vs 800 Kb/s), and further analysis of the postgres logs with pgBadger shows a peak in UPDATE queries during the freeze (from 40 to 100 queries/s). A typical update query is:

UPDATE pbs.job SET ji_state = '9', ji_substate = '92',... WHERE ji_jobid = '13242[92102]';

My guess is that after an array job finishes, PBS updates the state of all 100k subjobs from X (6) to F (9), and that causes the freeze.

Does that sound reasonable, and if so, what can I do to avoid the freeze?
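A back-of-the-envelope check of that guess can be sketched as follows. It assumes that each subjob needs exactly one UPDATE and that the rate pgBadger reports (40 to 100 queries/s) is spent mostly on these subjob updates; neither assumption has been confirmed against the PBS source. The function name is made up for illustration.

```python
def estimated_freeze_minutes(subjobs: int, updates_per_second: float) -> float:
    """Minutes needed to issue one UPDATE per subjob at a constant rate."""
    if updates_per_second <= 0:
        raise ValueError("updates_per_second must be positive")
    return subjobs / updates_per_second / 60.0


if __name__ == "__main__":
    # 100k subjobs per array job; rates taken from the pgBadger numbers above.
    for rate in (40, 100):
        minutes = estimated_freeze_minutes(100_000, rate)
        print(f"{rate} updates/s -> {minutes:.1f} min")
```

At about 100 updates/s this comes to roughly 17 minutes, which is the same order of magnitude as the observed freeze, so the guess is at least plausible.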

PBS server log for reference:

04/17/2022 18:12:21;0008;Server@ip-10-77-176-170;Job;12132.ip-10-77-176-170;Job Modified at request of Scheduler@ip-10-77-176-170
04/17/2022 18:12:21;0008;Server@ip-10-77-176-170;Job;12145[6487].ip-10-77-176-170;Job Run at request of Scheduler@ip-10-77-176-170 on exec_vnode (ip-10-77-176-168:ncpus=1)
04/17/2022 18:12:21;0040;Server@ip-10-77-176-170;Svr;ip-10-77-176-170;Scheduler sent command 2
04/17/2022 18:12:21;0040;Server@ip-10-77-176-170;Svr;ip-10-77-176-170;Scheduler sent command 0
04/17/2022 18:12:21;0010;Server@ip-10-77-176-170;Job;12144[99205].ip-10-77-176-170;Exit_status=271 resources_used.cpupercent=96 resources_used.cput=01:00:52 resources_used.mem=382612kb resources_used.ncpus=1 resources_used.vmem=1165856kb resources_used.walltime=01:03:08

15-minute gap here with no messages

04/17/2022 18:27:58;0001;Server@ip-10-77-176-170;Svr;Server@ip-10-77-176-170;Success (0) in connection_idlecheck, timeout connection from 10.77.176.170
04/17/2022 18:27:58;0001;Server@ip-10-77-176-170;Svr;Server@ip-10-77-176-170;Success (0) in connection_idlecheck, timeout connection from 10.77.176.170
04/17/2022 18:27:58;0001;Server@ip-10-77-176-170;Svr;Server@ip-10-77-176-170;Success (0) in connection_idlecheck, timeout connection from 10.77.176.170
04/17/2022 18:27:58;0001;Server@ip-10-77-176-170;Svr;Server@ip-10-77-176-170;Success (0) in connection_idlecheck, timeout connection from 10.77.176.170
04/17/2022 18:28:05;0040;Server@ip-10-77-176-170;Svr;ip-10-77-176-170;Scheduler sent command 3
04/17/2022 18:28:05;0040;Server@ip-10-77-176-170;Svr;ip-10-77-176-170;Scheduler sent command 0
04/17/2022 18:28:58;0040;Server@ip-10-77-176-170;Svr;ip-10-77-176-170;Scheduler sent command 3
04/17/2022 18:28:58;0040;Server@ip-10-77-176-170;Svr;ip-10-77-176-170;Scheduler sent command 0
04/17/2022 18:28:59;0010;Server@ip-10-77-176-170;Job;12145[6247].ip-10-77-176-170;Exit_status=0 resources_used.cpupercent=50 resources_used.cput=00:01:59 resources_used.mem=134080kb resources_used.ncpus=1 resources_used.vmem=895444kb resources_used.walltime=00:02:02

Please share the specification of the PBS Server host with respect to:

  1. Processor
  2. RAM
  3. Disk speed on which PBS_HOME exists
  4. Network interconnect between the PBS Server and PBS Compute nodes
  5. Total number of jobs that existed on the system at that moment

Is the PBS Server running on a bare-metal/physical host or on a virtual machine?

  • It is an m5.2xlarge AWS instance with an Intel(R) Xeon(R) Platinum 8259CL CPU, 8 cores, 32 Gb RAM.
  • PBS_HOME is on an EFS partition in 200 Mb/s throughput mode; a write test with dd shows ~180 Mb/s.
  • Network interconnect is ~5 Gbit/s.
  • There were 7 array jobs in the queue, each with 100k subjobs, i.e. 700k jobs total.

@Nikita-T86 Thank you for sharing these details. The system specification looks good to me, except for the EFS partition. I ran the same write speed test (dd command) on two systems and got 400 to 450 MB/s. However, I may be proved wrong.

Please collect gstacks of the pbs_server process in a loop, once per second for 15 seconds, starting when you see the server logs stop updating, and share them with the forum. The dev contributors might be able to go through the gstack output and tell you the reason.

Did you try this on a bare-metal cluster, by any chance?
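The sampling loop requested above could look roughly like the sketch below. It assumes that gstack is installed on the server host (on many distributions it ships with gdb) and that the script runs with enough privileges to attach to pbs_server. The function name `collect_stacks` and the `---<timestamp>` header format are illustrative choices, not part of PBS.

```python
import subprocess
import sys
import time
from datetime import datetime, timezone


def collect_stacks(cmd, samples=15, interval=1.0):
    """Run `cmd` `samples` times, `interval` seconds apart.

    Yields one text snapshot per run, prefixed with a '---<UTC timestamp>' line.
    """
    for i in range(samples):
        stamp = datetime.now(timezone.utc).strftime("%a %b %d %H:%M:%S UTC %Y")
        result = subprocess.run(cmd, capture_output=True, text=True)
        snapshot = f"---{stamp}\n{result.stdout}"
        if result.returncode != 0:
            # Keep the failure visible instead of silently dropping the sample.
            snapshot += f"(exit code {result.returncode}: {result.stderr.strip()})\n"
        yield snapshot
        if i < samples - 1:
            time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    # Usage: python collect_stacks.py <pid of pbs_server>
    for snap in collect_stacks(["gstack", sys.argv[1]]):
        print(snap, end="", flush=True)
```

Redirecting the output to a file gives a log that can be pasted into the thread as-is. Each snapshot is printed as soon as it is taken, so a partial log survives if the script is interrupted.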

@adarsh, thank you for the response. As I mentioned, iotop shows a total disk write of 4 Mb/s during the freeze, and EFS monitoring shows the partition throughput well below the threshold.

The gstack output of the pbs_server process is below. The freeze started at 12:12:30 and the server log resumed at 12:37:14 (sorry for the large log).

---Fri Jun  3 12:12:23 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00000000004f8911 in encode_DIS_reply_inner ()
#1  0x0000000000457667 in dis_reply_write ()
#2  0x0000000000457aa5 in reply_send ()
#3  0x0000000000472acf in req_selectjobs ()
#4  0x000000000045661b in process_request ()
#5  0x00000000004bc316 in process_socket ()
#6  0x00000000004bc44e in wait_request ()
#7  0x000000000042b0df in main ()
---Fri Jun  3 12:12:25 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e8985 in pbs_db_end_trx ()
#6  0x000000000044390c in job_save_db ()
#7  0x000000000048a9ed in svr_setjobstate ()
#8  0x000000000044fc46 in is_request ()
#9  0x0000000000454ef9 in do_tpp ()
#10 0x0000000000454f2f in tpp_request ()
#11 0x00000000004bc316 in process_socket ()
#12 0x00000000004bc4d2 in wait_request ()
#13 0x000000000042b0df in main ()
---Fri Jun  3 12:12:27 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x00000000004bc3a3 in wait_request ()
#2  0x000000000042b0df in main ()
---Fri Jun  3 12:12:28 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x00000000004bc3a3 in wait_request ()
#2  0x000000000042b0df in main ()
---Fri Jun  3 12:12:30 UTC 2022 <- **Freeze time**
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e907f in pg_db_cmd ()
#6  0x00000000004eb0ca in pg_db_save_job ()
#7  0x0000000000443869 in job_save_db ()
#8  0x000000000048aeee in svr_histjob_update ()
#9  0x000000000048b15e in svr_setjob_histinfo ()
#10 0x000000000048b4fa in svr_saveorpurge_finjobhist ()
#11 0x000000000043081e in chk_array_doneness ()
#12 0x000000000048af53 in svr_histjob_update ()
#13 0x000000000048b15e in svr_setjob_histinfo ()
#14 0x000000000048b4fa in svr_saveorpurge_finjobhist ()
#15 0x000000000045b2d5 in on_job_exit ()
#16 0x00000000004b8829 in dispatch_task ()
#17 0x000000000043f100 in process_DreplyTPP ()
#18 0x000000000044f8b7 in is_request ()
#19 0x0000000000454ef9 in do_tpp ()
#20 0x0000000000454f2f in tpp_request ()
#21 0x00000000004bc316 in process_socket ()
#22 0x00000000004bc4d2 in wait_request ()
#23 0x000000000042b0df in main ()
---Fri Jun  3 12:12:32 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e8985 in pbs_db_end_trx ()
#6  0x000000000044390c in job_save_db ()
#7  0x000000000048aeee in svr_histjob_update ()
#8  0x000000000048b15e in svr_setjob_histinfo ()
#9  0x000000000048b4fa in svr_saveorpurge_finjobhist ()
#10 0x000000000043081e in chk_array_doneness ()
#11 0x000000000048af53 in svr_histjob_update ()
#12 0x000000000048b15e in svr_setjob_histinfo ()
#13 0x000000000048b4fa in svr_saveorpurge_finjobhist ()
#14 0x000000000045b2d5 in on_job_exit ()
#15 0x00000000004b8829 in dispatch_task ()
#16 0x000000000043f100 in process_DreplyTPP ()
#17 0x000000000044f8b7 in is_request ()
#18 0x0000000000454ef9 in do_tpp ()
#19 0x0000000000454f2f in tpp_request ()
#20 0x00000000004bc316 in process_socket ()
#21 0x00000000004bc4d2 in wait_request ()
#22 0x000000000042b0df in main ()
---Fri Jun  3 12:12:34 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e8985 in pbs_db_end_trx ()
#6  0x000000000044390c in job_save_db ()
#7  0x000000000048aeee in svr_histjob_update ()
#8  0x000000000048b15e in svr_setjob_histinfo ()
#9  0x000000000048b4fa in svr_saveorpurge_finjobhist ()
#10 0x000000000043081e in chk_array_doneness ()
#11 0x000000000048af53 in svr_histjob_update ()
#12 0x000000000048b15e in svr_setjob_histinfo ()
#13 0x000000000048b4fa in svr_saveorpurge_finjobhist ()
#14 0x000000000045b2d5 in on_job_exit ()
#15 0x00000000004b8829 in dispatch_task ()
#16 0x000000000043f100 in process_DreplyTPP ()
#17 0x000000000044f8b7 in is_request ()
#18 0x0000000000454ef9 in do_tpp ()
#19 0x0000000000454f2f in tpp_request ()
#20 0x00000000004bc316 in process_socket ()
#21 0x00000000004bc4d2 in wait_request ()
#22 0x000000000042b0df in main ()
---Fri Jun  3 12:12:36 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e8985 in pbs_db_end_trx ()
#6  0x000000000044390c in job_save_db ()
#7  0x000000000048aeee in svr_histjob_update ()
#8  0x000000000048b15e in svr_setjob_histinfo ()
#9  0x000000000048b4fa in svr_saveorpurge_finjobhist ()
#10 0x000000000043081e in chk_array_doneness ()
#11 0x000000000048af53 in svr_histjob_update ()
#12 0x000000000048b15e in svr_setjob_histinfo ()
#13 0x000000000048b4fa in svr_saveorpurge_finjobhist ()
#14 0x000000000045b2d5 in on_job_exit ()
#15 0x00000000004b8829 in dispatch_task ()
#16 0x000000000043f100 in process_DreplyTPP ()
#17 0x000000000044f8b7 in is_request ()
#18 0x0000000000454ef9 in do_tpp ()
#19 0x0000000000454f2f in tpp_request ()
#20 0x00000000004bc316 in process_socket ()
#21 0x00000000004bc4d2 in wait_request ()
#22 0x000000000042b0df in main ()

For the next 25 minutes the gstack output stays the same, until the unfreeze at 12:37:04. The first message in the pbs_server log appeared at 12:37:14.

---Fri Jun  3 12:37:02 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e8985 in pbs_db_end_trx ()
#6  0x000000000044390c in job_save_db ()
#7  0x000000000048aeee in svr_histjob_update ()
#8  0x000000000048b15e in svr_setjob_histinfo ()
#9  0x000000000048b4fa in svr_saveorpurge_finjobhist ()
#10 0x000000000043081e in chk_array_doneness ()
#11 0x000000000048af53 in svr_histjob_update ()
#12 0x000000000048b15e in svr_setjob_histinfo ()
#13 0x000000000048b4fa in svr_saveorpurge_finjobhist ()
#14 0x000000000045b2d5 in on_job_exit ()
#15 0x00000000004b8829 in dispatch_task ()
#16 0x000000000043f100 in process_DreplyTPP ()
#17 0x000000000044f8b7 in is_request ()
#18 0x0000000000454ef9 in do_tpp ()
#19 0x0000000000454f2f in tpp_request ()
#20 0x00000000004bc316 in process_socket ()
#21 0x00000000004bc4d2 in wait_request ()
#22 0x000000000042b0df in main ()
---Fri Jun  3 12:37:04 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e8985 in pbs_db_end_trx ()
#6  0x000000000044390c in job_save_db ()
#7  0x000000000048a9ed in svr_setjobstate ()
#8  0x000000000045ca82 in job_obit ()
#9  0x000000000044f792 in is_request ()
#10 0x0000000000454ef9 in do_tpp ()
#11 0x0000000000454f2f in tpp_request ()
#12 0x00000000004bc316 in process_socket ()
#13 0x00000000004bc4d2 in wait_request ()
#14 0x000000000042b0df in main ()
---Fri Jun  3 12:37:06 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e8985 in pbs_db_end_trx ()
#6  0x000000000044390c in job_save_db ()
#7  0x000000000044fcab in is_request ()
#8  0x0000000000454ef9 in do_tpp ()
#9  0x0000000000454f2f in tpp_request ()
#10 0x00000000004bc316 in process_socket ()
#11 0x00000000004bc4d2 in wait_request ()
#12 0x000000000042b0df in main ()
---Fri Jun  3 12:37:07 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e8985 in pbs_db_end_trx ()
#6  0x000000000044390c in job_save_db ()
#7  0x000000000044fcab in is_request ()
#8  0x0000000000454ef9 in do_tpp ()
#9  0x0000000000454f2f in tpp_request ()
#10 0x00000000004bc316 in process_socket ()
#11 0x00000000004bc4d2 in wait_request ()
#12 0x000000000042b0df in main ()
---Fri Jun  3 12:37:09 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e8985 in pbs_db_end_trx ()
#6  0x000000000044390c in job_save_db ()
#7  0x000000000044fcab in is_request ()
#8  0x0000000000454ef9 in do_tpp ()
#9  0x0000000000454f2f in tpp_request ()
#10 0x00000000004bc316 in process_socket ()
#11 0x00000000004bc4d2 in wait_request ()
#12 0x000000000042b0df in main ()
---Fri Jun  3 12:37:11 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e8985 in pbs_db_end_trx ()
#6  0x000000000044390c in job_save_db ()
#7  0x000000000044fcab in is_request ()
#8  0x0000000000454ef9 in do_tpp ()
#9  0x0000000000454f2f in tpp_request ()
#10 0x00000000004bc316 in process_socket ()
#11 0x00000000004bc4d2 in wait_request ()
#12 0x000000000042b0df in main ()
---Fri Jun  3 12:37:12 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e8985 in pbs_db_end_trx ()
#6  0x000000000044390c in job_save_db ()
#7  0x000000000044fcab in is_request ()
#8  0x0000000000454ef9 in do_tpp ()
#9  0x0000000000454f2f in tpp_request ()
#10 0x00000000004bc316 in process_socket ()
#11 0x00000000004bc4d2 in wait_request ()
#12 0x000000000042b0df in main ()
---Fri Jun  3 12:37:14 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e8985 in pbs_db_end_trx ()
#6  0x00000000004438fa in job_save_db ()
#7  0x000000000048a95a in svr_setjobstate ()
#8  0x000000000045b452 in on_job_exit ()
#9  0x00000000004b8829 in dispatch_task ()
#10 0x00000000004b8a75 in default_next_task ()
#11 0x000000000042aed0 in main ()
---Fri Jun  3 12:37:16 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b1292e7 in unlink () from /lib64/libc.so.6
#1  0x0000000000440409 in del_job_related_file ()
#2  0x00000000004405c2 in job_purge ()
#3  0x00000000004889fb in svr_clean_job_history ()
#4  0x00000000004b8829 in dispatch_task ()
#5  0x00000000004b8aad in default_next_task ()
#6  0x000000000042aed0 in main ()
---Fri Jun  3 12:37:18 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e8985 in pbs_db_end_trx ()
#6  0x00000000004eb359 in pg_db_delete_job ()
#7  0x0000000000440576 in job_purge ()
#8  0x00000000004889fb in svr_clean_job_history ()
#9  0x00000000004b8829 in dispatch_task ()
#10 0x00000000004b8aad in default_next_task ()
#11 0x000000000042aed0 in main ()
---Fri Jun  3 12:37:19 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e8985 in pbs_db_end_trx ()
#6  0x00000000004eb359 in pg_db_delete_job ()
#7  0x0000000000440576 in job_purge ()
#8  0x00000000004889fb in svr_clean_job_history ()
#9  0x00000000004b8829 in dispatch_task ()
#10 0x00000000004b8aad in default_next_task ()
#11 0x000000000042aed0 in main ()
---Fri Jun  3 12:37:21 UTC 2022
Thread 2 (Thread 0x2aafa95e0700 (LWP 19797)):
#0  0x00002aaf9b136d47 in epoll_pwait () from /lib64/libc.so.6
#1  0x000000000049ef87 in work ()
#2  0x00002aaf99e0eea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaf9b136b0d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaf98e23040 (LWP 19796)):
#0  0x00002aaf9b12bddd in poll () from /lib64/libc.so.6
#1  0x00002aaf9900820e in pqSocketCheck () from /lib64/libpq.so.5
#2  0x00002aaf99008290 in pqWaitTimed () from /lib64/libpq.so.5
#3  0x00002aaf99006551 in PQgetResult () from /lib64/libpq.so.5
#4  0x00002aaf9900684e in PQexecFinish () from /lib64/libpq.so.5
#5  0x00000000004e8985 in pbs_db_end_trx ()
#6  0x000000000044390c in job_save_db ()
#7  0x000000000044fcab in is_request ()
#8  0x0000000000454ef9 in do_tpp ()
#9  0x0000000000454f2f in tpp_request ()
#10 0x00000000004bc316 in process_socket ()
#11 0x00000000004bc4d2 in wait_request ()
#12 0x000000000042b0df in main ()