Functionality desired: I need a hook script to run on the scheduler node only. I need it to gather the list of compute hosts for a monitoring tool, which will subsequently be launched from the hook script.
The issue: With the event type ‘runjob’, the node list comes up as None. I did try some other event types, but they ended up launching the hook script on the compute nodes as opposed to the scheduler. The script only needs to run at the beginning of the job to gather the job ID and host list, and then launch the monitoring tool.
Is this functionality possible w/ PBS hooks?
My hook:
Hook jobMonitor
type = site
enabled = true
event = runjob
user = pbsadmin
alarm = 30
order = 1
debug = true
fail_action = none
Hey Mike,
Thanks for the reply.
I think my terminology was a bit off. What I meant by node list was the list of hosts allocated for the job being launched. I need to create a host file associated with the job so that my monitoring tool can associate the PBS job IDs with telemetry coming from the subset of nodes allocated for the job.
So in summary the functionality would be:
Right before or at the time the job is launched, the script runs on the head node to collect the allocated job nodes and create a host file. The script then makes an API call to my monitoring tool and passes a job ID and the host file.
I haven’t tested this, but according to Table 5-6 in the PBSHooks document, the runjob hook should have access to the exec_vnode attribute. You can massage that into a host list.
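Here is a rough sketch of that massaging step, not tested inside a real hook. It assumes exec_vnode has the usual "(vnode:resource=value)+(vnode:resource=value)" form and that your vnode names are the same as your host names (on multi-vnode hosts they may not be). The function names hosts_from_exec_vnode and write_hostfile are made up for illustration. Inside the hook you would presumably pass it something like str(pbs.event().job.exec_vnode).

```python
def hosts_from_exec_vnode(exec_vnode):
    """Return the unique vnode names in an exec_vnode string, in order.

    Assumption: vnode names are usable as host names at your site.
    """
    hosts = []
    # Drop the chunk parentheses, then split chunks and vnodes on '+'.
    flat = str(exec_vnode).replace("(", "").replace(")", "")
    for piece in flat.split("+"):
        # Everything before the first ':' is the vnode name.
        name = piece.split(":", 1)[0].strip()
        if name and name not in hosts:
            hosts.append(name)
    return hosts


def write_hostfile(path, hosts):
    """Write one host name per line to a host file (hypothetical helper)."""
    with open(path, "w") as fh:
        fh.write("\n".join(hosts) + "\n")
```

For example, hosts_from_exec_vnode("(node01:ncpus=2)+(node02:ncpus=2)") gives a two-entry list that write_hostfile can dump to a per-job file.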
However, in your original post, you said the hook would also launch a monitoring tool. This is not recommended and will probably not work. Later, you said the hook will contact the tool via an API. This has a better chance of working.
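To make the API route a bit more concrete, here is a minimal sketch, assuming your monitoring tool accepts a JSON POST over HTTP and that the hook interpreter is Python 3. The URL, the payload field names, and the function names are all hypothetical; substitute whatever your tool really expects. The short timeout is there so a slow monitoring server cannot run the hook into its 30 second alarm.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with your monitoring tool's real URL.
MONITOR_URL = "http://monitor.example.com/api/jobs"


def build_job_request(job_id, hosts, url=MONITOR_URL):
    """Build a POST request carrying the job ID and its host list as JSON."""
    payload = json.dumps({"job_id": job_id, "hosts": hosts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify_monitor(job_id, hosts, timeout=5):
    """Send the request; return True on a 2xx reply, False on any failure."""
    try:
        req = build_job_request(job_id, hosts)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError and socket timeouts are OSError subclasses. A failed
        # notification should not stop the job from running.
        return False
```

Swallowing the error and returning False is a deliberate choice here: the job should still start even if the monitoring tool is down.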
For help in getting an API working, you might find the link below useful. It’s my own small app for monitoring an HPC at my Uni. I use the PBS include file to create a Python API that is more understandable and simpler to use. It is then called as a Python library.