PBS stopped writing server and accounting logs

A month ago, our PBS 22.05.11 stopped writing the server log and the accounting log. The last message in the server log was log rotation:

01/29/2025 00:00:25;0002;Server@myserver;Svr;Log;Log closed

But the log for 20250129 was never created, nor any other server or accounting logs. The sched and comm logs are still being written.

The filesystem was never anywhere near full, and I don’t see anything in syslog to give me a hint as to what happened.

My question is: if I restart PBS now, will it know what jobs are still running? It looks to me like the PBS server does a log replay on startup, so is it going to think it is Jan 29th just after midnight, not knowing anything that happened since then?

Thanks in advance

OK, well, I was able to connect to the postgres database and I see the job info in there, so I guess OpenPBS is walking the database on startup, not replaying accounting or server logs.

Can someone confirm that missing accounting and server logs are irrelevant on pbs restart?

The PBS server does not clean up the server logs or the accounting logs.
As long as the PBS server is up and running, server logs are created whether or not the queuing system has any jobs. Accounting records are written for jobs, so if there are no jobs on the system, no accounting logs will be created.
Please refer to: https://help.altair.com/2024.1.0/PBS%20Professional/PBS2024.1.pdf
Section: 12.1.2 Managing the Accounting Log File
Section: 9.4.2 Event Logfiles

Yes, I know that, I have my own cron jobs to clean up very old logs.

There are 202 jobs running right now, and there have been 42144 jobs submitted since Jan 29. Not one of those jobs got recorded in the server log or accounting log, but I can find the jobs in the postgres database.

Some pointers:

  • Check whether the log cleanup script deleted the active log by mistake; from that point on, pbs_server would have stopped writing to the server and accounting logs.
  • Check that the log levels of the daemons are correct.
  • CAUTION: Have you had a chance to quickly stop and start the PBS server using
    qterm -t quick ; source /etc/pbs.conf; $PBS_EXEC/sbin/pbs_server
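
One way to test the first pointer before restarting is to look for files the server still holds open after they were unlinked. The following is only a sketch, assuming a Linux host where /proc/<pid>/fd is readable (root may be required); the function name deleted_open_files is made up for illustration and is not part of PBS.

```shell
# List files that a process still has open although they were deleted.
# Linux only: relies on /proc/<pid>/fd symlink targets ending in " (deleted)".
# deleted_open_files is a hypothetical helper name, not a PBS command.
deleted_open_files() {
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        # Skip descriptors that vanish or cannot be read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') echo "$target" ;;
        esac
    done
}

# Usage (replace the placeholder with the PID of the running PBS server):
# deleted_open_files <pid-of-pbs-server>
```

If a path under server_logs or server_priv/accounting shows up in the output, the server is still writing to a file that no longer has a directory entry, which would be consistent with the active log having been removed.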

I don’t see how the active log could have been deleted; I am using the find command with the -mtime option:

find /var/spool/pbs/server_logs -mtime +90 -type f -print -exec rm -f {} \;
find /var/spool/pbs/server_priv/accounting -mtime +366 -type f -print -exec rm -f {} \;
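
The behaviour of the -mtime test can be checked on a scratch directory by running the same expression with -print only. This is a sketch assuming GNU find and GNU touch (for the -d option); the scratch directory and the file names old_log and active_log are hypothetical stand-ins for the real log files.

```shell
# Scratch directory standing in for /var/spool/pbs/server_logs (hypothetical).
scratch=$(mktemp -d)

# One file last modified 100 days ago, one modified just now (like an active log).
touch -d '100 days ago' "$scratch/old_log"
touch "$scratch/active_log"

# Same test as the cleanup job, but only printing instead of removing.
matched=$(find "$scratch" -mtime +90 -type f -print)
echo "$matched"

# Clean up the scratch directory.
rm -rf "$scratch"
```

Only the file with the old modification time is listed. A log that the server is actively appending to keeps a recent mtime, so this test alone would not select it.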

In any case, a restart fixed the problem: new server and accounting logs have been created and are being written to.

Thanks.
