I’ve written a design for converting the scheduler’s current log_filter into log_events, which is more consistent with the rest of PBS.
There are two reasons for this change. The first is to have a unified way of determining which events are logged across all daemons. The second is to speed up the scheduler (and other daemons) by avoiding unnecessary string manipulation. A new liblog logging function will be introduced that combines log_event() and sprintf(). When it is called, it will first determine whether the event needs to be logged at all, and only then format and log the string.
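To illustrate the check-before-format idea, here is a minimal C sketch. The names sk_log_eventf(), sk_log_event(), sk_log_events and the SK_EVENT_* bit values are hypothetical stand-ins, not the actual liblog API or the real PBS event values. The only point is that the mask test happens before any vsnprintf() work is done.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical event bits, for illustration only (not PBS's real values). */
#define SK_EVENT_ERROR  0x1
#define SK_EVENT_DEBUG  0x2
#define SK_EVENT_DEBUG2 0x4

/* Hypothetical global mask: an event is logged only if its bit is set here. */
static int sk_log_events = SK_EVENT_ERROR | SK_EVENT_DEBUG;

/* Demonstration bookkeeping: how many messages were actually formatted. */
static int sk_format_count = 0;
static char sk_last_msg[256];

/* Hypothetical stand-in for the underlying writer (log_event() in PBS). */
static void sk_log_event(int eventtype, const char *text)
{
    (void)eventtype;
    strncpy(sk_last_msg, text, sizeof(sk_last_msg) - 1);
    sk_last_msg[sizeof(sk_last_msg) - 1] = '\0';
}

/* Hypothetical combined log_event() + sprintf() style function. */
static void sk_log_eventf(int eventtype, const char *fmt, ...)
{
    char buf[256];
    va_list ap;

    /* Cheap check first: skip all formatting work for unwanted events. */
    if ((eventtype & sk_log_events) == 0)
        return;

    va_start(ap, fmt);
    vsnprintf(buf, sizeof(buf), fmt, ap);
    va_end(ap);
    sk_format_count++;
    sk_log_event(eventtype, buf);
}
```

With this mask, a call such as sk_log_eventf(SK_EVENT_DEBUG2, "...", ...) returns immediately without touching the format string or its arguments.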
@bhroam – I’m not sure how best to word it, but it would be good if these logs/log events could be easily ingested by ELK / Knowledge works / AI-style tools. That would make some inroads toward plotting what the services/daemons are doing in real time.
Hi Bhroam, I think it is great, thanks. I would only add the supporting fact that the pbs_comm daemon also follows the same method as the server and MoM, via PBS_COMM_LOG_EVENTS.
@bhroam The document looks good to me. I suggest adding the decimal (and, if possible, hex) bitmask value of each logging level. We will need this to document it properly.
@adrash This RFE does not change any of the log messages themselves. It changes how the admin sets their logging policy. Instead of saying, ‘I don’t want to see DEBUG2 messages’, they say, ‘I want to see DEBUG messages’. @arungrover The values of the logging levels are well documented. I don’t think they need to be specified in this document. @scc I’ll mention PBS_COMM_LOG_EVENTS. Thanks.
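The difference between the two policies can be sketched as a bitmask complement: a log_filter lists the events to suppress, while log_events lists the events to keep. The SK_EVENT_* values below are hypothetical illustrative bits, not the documented PBS values, and sk_filter_to_events() and sk_should_log() are made-up helper names; this is a sketch of the idea, not the conversion code from the design.

```c
/* Hypothetical event bits for illustration only; the real values are documented elsewhere. */
#define SK_EVENT_ERROR  0x01
#define SK_EVENT_SYSTEM 0x02
#define SK_EVENT_DEBUG  0x04
#define SK_EVENT_DEBUG2 0x08
#define SK_EVENT_ALL    0x0F /* every defined bit */

/* Old model: a bit set in log_filter means "do NOT log this event".
 * New model: a bit set in log_events means "log this event".
 * Converting one to the other is the complement, restricted to defined bits. */
static int sk_filter_to_events(int log_filter)
{
    return ~log_filter & SK_EVENT_ALL;
}

/* Returns nonzero if an event of the given type should be logged. */
static int sk_should_log(int log_events, int eventtype)
{
    return (log_events & eventtype) != 0;
}
```

With these illustrative values, “don’t show DEBUG2” (a filter of 8, 0x8) becomes “show ERROR, SYSTEM and DEBUG” (an events mask of 7, 0x7).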
Hey,
I made a slight change to how log_eventf() works. Previously, I first determined how long the buffer needed to be and then chose between using a local buffer and allocating one. This required two calls to vsnprintf() (although the first one didn’t actually write anything into a buffer). The vast majority of log messages we print in PBS are shorter than the 4096-byte local buffer, so I changed it to vsnprintf() directly into the local buffer. Only if the output is truncated does it allocate a buffer and make a second call to vsnprintf(). This means the majority of calls will be faster. Really long log messages will be slower, since they are formatted twice.
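A minimal C sketch of the “format into the local buffer first, allocate only on truncation” strategy, assuming a 4096-byte local buffer. sketch_logf(), sketch_vlogf(), sketch_write_log() and the last_len/used_heap globals are hypothetical names used only for illustration; they are not the actual log_eventf() code.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LOG_BUF_SIZE 4096

/* Hypothetical demonstration state: what the last call wrote and how. */
static size_t last_len;
static int used_heap;

/* Hypothetical sink standing in for the real log writer. */
static void sketch_write_log(const char *msg)
{
    last_len = strlen(msg);
}

static int sketch_vlogf(const char *fmt, va_list ap)
{
    char local[LOG_BUF_SIZE];
    char *buf = local;
    va_list ap2;
    int len;

    /* Keep a copy of the arguments in case a second pass is needed. */
    va_copy(ap2, ap);
    len = vsnprintf(local, sizeof(local), fmt, ap);
    if (len < 0) {
        va_end(ap2);
        return -1;
    }
    if ((size_t)len >= sizeof(local)) {
        /* Truncated: allocate exactly enough and format a second time. */
        buf = malloc((size_t)len + 1);
        if (buf == NULL) {
            va_end(ap2);
            return -1;
        }
        vsnprintf(buf, (size_t)len + 1, fmt, ap2);
        used_heap = 1;
    } else {
        /* Common case: one vsnprintf() call, no allocation. */
        used_heap = 0;
    }
    va_end(ap2);

    sketch_write_log(buf);
    if (buf != local)
        free(buf);
    return len;
}

static int sketch_logf(const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = sketch_vlogf(fmt, ap);
    va_end(ap);
    return len;
}
```

The va_copy() matters: once a va_list has been passed to vsnprintf(), its state is indeterminate, so the second pass needs its own copy of the arguments.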