After some investigation, I found the following log entries:
07/13/2024 11:06:58;0001;Server@XXX01;Svr;Server@XXX01;Cannot allocate memory (12) in svr_mailowner_id, fork failed
07/13/2024 11:31:51;0001;Server@XXX01;Svr;Server@XXX01;Cannot allocate memory (12) in svr_mailowner_id, fork failed
07/13/2024 11:31:53;0001;Server@XXX01;Svr;Server@XXX01;Cannot allocate memory (12) in svr_mailowner_id, fork failed
The error comes from a failed fork() call in svr_mailowner_id, as shown here: openpbs/src/server/svr_mail.c at 81187aeceee8247a1fe82ee7f6c89a3987c1ff42 · openpbs/openpbs · GitHub.
Here is the current status of the pbs_server process:
cat /proc/815639/status
Name: pbs_server.bin
Umask: 0022
State: S (sleeping)
Tgid: 815639
Ngid: 0
Pid: 815639
PPid: 1
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 64
Groups: 0
NStgid: 815639
NSpid: 815639
NSpgid: 815639
NSsid: 815639
VmPeak: 2653792 kB
VmSize: 2653792 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 2433696 kB
VmRSS: 2433692 kB
RssAnon: 2423584 kB
RssFile: 10108 kB
RssShmem: 0 kB
VmData: 2439804 kB
VmStk: 132 kB
VmExe: 1396 kB
VmLib: 47676 kB
VmPTE: 4996 kB
VmSwap: 0 kB
HugetlbPages: 0 kB
CoreDumping: 0
THP_enabled: 1
Threads: 2
SigQ: 4/22430
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000001001a00
SigCgt: 0000000180014003
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 2306044
nonvoluntary_ctxt_switches: 908849
This is the memory usage on the server:
free -h
               total        used        free      shared  buff/cache   available
Mem:           5.5Gi       2.7Gi       460Mi       423Mi       2.3Gi       2.1Gi
Swap:             0B          0B          0B
It appears there isn’t enough memory available to ‘copy’ the parent’s address space for the new process. Why is such a large amount of memory required? As far as I know, the Linux kernel uses a Copy-On-Write (COW) strategy for forked processes, so the child should not need its own copy of the parent’s memory up front.
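To put numbers on this, here is a rough Python sketch that uses only figures copied from the output above. It compares the anonymous (private, writable) memory of pbs_server.bin with the "available" value reported by free. The helper names are my own. Whether the kernel really refuses the fork on these grounds depends on the vm.overcommit_memory setting, which I have not shown here, so treat the comparison as a hint rather than a diagnosis.

```python
# Figures copied from /proc/815639/status above (values in kB).
STATUS_TEXT = """\
VmSize: 2653792 kB
VmRSS: 2433692 kB
RssAnon: 2423584 kB
VmData: 2439804 kB
VmSwap: 0 kB
"""

# "available" and swap from `free -h` above; free rounds these values.
AVAILABLE_GIB = 2.1
SWAP_GIB = 0.0


def parse_status_kb(text):
    """Return {field: kB} for every 'Name: <number> kB' line."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def kb_to_gib(kb):
    """Convert kB (as printed by /proc, i.e. KiB) to GiB."""
    return kb / (1024 * 1024)


status = parse_status_kb(STATUS_TEXT)
anon_gib = kb_to_gib(status["RssAnon"])
data_gib = kb_to_gib(status["VmData"])

# If every page the child inherits had to be backed by real memory
# (the worst case once COW pages are written), this is the shortfall.
shortfall_gib = anon_gib - (AVAILABLE_GIB + SWAP_GIB)

print(f"RssAnon:   {anon_gib:.2f} GiB")
print(f"VmData:    {data_gib:.2f} GiB")
print(f"available: {AVAILABLE_GIB:.2f} GiB, swap: {SWAP_GIB:.2f} GiB")
print(f"worst-case shortfall: {shortfall_gib:.2f} GiB")
```

So even with COW, the worst-case footprint of a second copy of the server (about 2.3 GiB of anonymous memory) is larger than the roughly 2.1 GiB reported as available, and there is no swap to make up the difference.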