I’m writing an execjob_prologue hook that launches the Performance Co-Pilot logger (pmlogger) at the beginning of every job. I’m using the subprocess module to start it, but it isn’t behaving as I expect.
Here’s my hook:
```python
import pbs
import sys
import datetime
import socket
import subprocess

e = pbs.event()
j = e.job
try:
    # Timestamp used in the archive and log file names
    jobdatelong = datetime.datetime.today().strftime('%Y%m%d.%H.%M.%S')
    jobid = j.id
    jobid = jobid.replace('-', '_')
    fullhost = socket.gethostname()
    today = datetime.date.today()
    # Per-host, per-day archive directory: <host>/<YYYY>/<MM>/<D>
    logdir = '/var/log/pcp/pmlogger/'+fullhost+'/'+str(today.year)+'/'+'{:02d}'.format(today.month)+'/'+str(today.day)
    # Log 4 samples (-s 4) with the SUPReMM config; pmlogger's own diagnostic log (-l) goes to /tmp
    command = ['env', 'PMLOGGER_PROLOG=yes', 'pmlogger', '-U', 'pcp', '-c', '/etc/pcp/pmlogger/pmlogger-supremm.config', '-s', '4', '-l', '/tmp/job-'+jobid+'-begin-'+jobdatelong+'.log', logdir+'/job-'+jobid+'-begin-'+jobdatelong]
    # Start pmlogger in the background; the hook does not wait for it
    subprocess.Popen(command)
except SystemExit:
    pass
except Exception as x:
    e.reject(str(x))
```
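For testing outside PBS, the path and command construction in the hook can be pulled into a plain function that does not need the `pbs` module. This is only a sketch: `build_pmlogger_command` is a hypothetical helper name, it keeps the hook’s own directory layout (zero-padded month, unpadded day), and it takes a single timestamp for both the directory and the file names instead of calling `today()` twice.

```python
import datetime
import os

PMLOGGER_CONFIG = "/etc/pcp/pmlogger/pmlogger-supremm.config"


def build_pmlogger_command(jobid, host, now):
    """Rebuild the hook's pmlogger command for a given job id, host and time.

    Mirrors the hook: '-' in the job id becomes '_', the month is
    zero-padded and the day is not.
    """
    safe_id = jobid.replace("-", "_")
    stamp = now.strftime("%Y%m%d.%H.%M.%S")
    logdir = os.path.join("/var/log/pcp/pmlogger", host, str(now.year),
                          "{:02d}".format(now.month), str(now.day))
    name = "job-{}-begin-{}".format(safe_id, stamp)
    return ["env", "PMLOGGER_PROLOG=yes", "pmlogger", "-U", "pcp",
            "-c", PMLOGGER_CONFIG, "-s", "4",
            "-l", "/tmp/" + name + ".log",   # pmlogger's diagnostic log
            os.path.join(logdir, name)]      # archive base name


if __name__ == "__main__":
    cmd = build_pmlogger_command("1234.pbs-server", "node01",
                                 datetime.datetime.now())
    print(" ".join(cmd))
```

Printing the joined list gives a command line that can be pasted into the same bash terminal, which makes it easy to compare exactly what the hook runs with what works interactively.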
From my testing, the pmlogger command should create its log files as soon as it starts, but when I test this hook it seems as though it never starts at all: the files I expect are never created. When I print what subprocess.Popen returns, I get a PID and a return code of “None”, which would oddly suggest that it didn’t terminate right away. However, if I write the PID to a file and quickly look for it in top, I can’t find any matching process. The command should run for about 40 seconds, but it appears to end long before that.
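One detail that matters when reading that output: `Popen.returncode` is only filled in by `poll()` or `wait()`, so it is `None` immediately after `Popen()` returns even if the child exits at once. A minimal check, using a throwaway Python child in place of pmlogger:

```python
import subprocess
import sys

# Child that exits immediately with status 3 (stands in for a command that fails fast)
proc = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])

# returncode is not updated until poll() or wait() is called
print("right after Popen:", proc.returncode)  # None

proc.wait()
print("after wait():", proc.returncode)  # 3
```

So a `None` return code straight after `Popen()` does not by itself show that the child was still alive.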
The same command works correctly when I run it from a regular bash terminal. Is there something different about the environment of a PBS hook, or about Python’s subprocess module, that could cause it to fail? Any suggestions on how to debug this would be greatly appreciated.
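As a debugging aid, the bare `subprocess.Popen(command)` call could be swapped for something that keeps the child’s output. The sketch below appends the `PATH` the hook process sees, the command, and everything the child writes to stdout/stderr to a log file, then waits briefly and polls so that a fast failure shows up as a real exit status. The function name `launch_with_log`, the log path, and the two-second wait are illustrative assumptions; the sleep also delays the hook, so this is meant for testing only.

```python
import os
import subprocess
import sys
import time


def launch_with_log(command, log_path, grace_seconds=2.0):
    """Start `command` with its stdout/stderr appended to `log_path`.

    Sleeps for `grace_seconds`, then polls the child so that
    proc.returncode is its exit status if it has already exited,
    or None if it is still running. Returns the Popen object.
    """
    with open(log_path, "ab") as log, open(os.devnull, "rb") as devnull:
        # Record the PATH the hook process sees; it may differ from a login shell
        log.write(("PATH=%s\n" % os.environ.get("PATH", "")).encode())
        log.write(("COMMAND=%r\n" % (command,)).encode())
        log.flush()
        proc = subprocess.Popen(command, stdin=devnull,
                                stdout=log, stderr=subprocess.STDOUT)
    # The child keeps its own copies of the file descriptors after we close ours
    time.sleep(grace_seconds)
    proc.poll()  # updates proc.returncode if the child has already exited
    return proc


if __name__ == "__main__":
    # Stand-in for pmlogger: prints a message and exits with status 2
    p = launch_with_log([sys.executable, "-c",
                         "print('boom'); import sys; sys.exit(2)"],
                        "/tmp/launch_with_log-demo.log")
    print(p.pid, p.returncode)
```

In the hook, the PID and `proc.returncode` can then be written somewhere easy to inspect, and the log file shows any error message the child printed before it went away.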