Monitoring: retrieve PID from JOBID

I would like to monitor resource usage on the compute nodes. To do that, I need to retrieve the PID of the job's main process.

I’ve found this method:

Do you know if there is a clean way to obtain the PID?


Please check and try this; the SessID is the PID of the job's main process.

[pbsdata@openpbs ~]$ qsub -- /bin/sleep 100

[pbsdata@openpbs ~]$ qstat -answ1
                                                         Req'd   Req'd Elap
Job ID       Username  Queue  Jobname  SessID  NDS  TSK  Memory  Time  S Time

260.openpbs  pbsdata   workq  STDIN    10330   1    1    500mb   --    R 00:00:00  openpbs/0
   Job run at Sat Mar 13 at 12:43 on (openpbs:ncpus=1:mem=512000kb)

[pbsdata@openpbs ~]$ ps -ef | grep 10330
pbsdata 10330 89429 0 12:43 ? 00:00:00 /bin/sleep 100
pbsdata 10562 10220 0 12:43 pts/0 00:00:00 grep --color=auto 10330
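The same lookup can be done without `grep`. A minimal sketch, assuming that `qstat -f <jobid>` on a running job prints a line of the form `session_id = 10330` (the long-form counterpart of the SessID column); the helper name `job_session_id` is hypothetical. Because the job's top process is a session leader, the session ID can also be passed to procps `ps -s` to list every process of the job on that node, not only the first one (processes that start their own session with `setsid` would not be included). `qstat` can run anywhere, but `ps` must run on the node executing the job.

```shell
# Hypothetical helper: print the session ID of a running PBS job.
# Reads `qstat -f <jobid>` output on stdin and assumes the attribute
# appears as "    session_id = <number>". Prints nothing if absent
# (for example, a job that is still queued).
job_session_id() {
    awk -F' = ' '$1 ~ /^[[:space:]]*session_id$/ { print $2; exit }'
}

# Example use (job ID from the session above):
#   sid=$(qstat -f 260.openpbs | job_session_id)
#   # on the execution node: every process in the job's session
#   ps -s "$sid" -o pid,ppid,pcpu,rss,etime,args
```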

Hope this helps

Thank you for the reply.

This works for jobs that do not span multiple nodes.

How can I retrieve the “main” PID for each execution chunk?

You can use pbs_dtj to find the PIDs of the job's processes on the mother superior and on the sister nodes of a multi-chunk, multi-node job.
Note: if you are using MPI, it should be built against the PBS TM library for tight integration, so that PBS itself launches and tracks the processes on the sister nodes.
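For the per-node lookup, one approach that does not depend on `pbs_dtj` is to look for processes whose environment carries the job ID. This is a minimal sketch, assuming the processes of interest inherited `PBS_JOBID` from PBS (this should hold for anything started from the job script and, with TM integration, for tasks PBS spawns on sister nodes; processes started outside PBS, for example through plain `ssh`, would be missed). It must run on each host listed in the job's `exec_host` attribute, as root or as the job owner, because `/proc/<pid>/environ` is readable only by the process owner. The helper name `pids_for_job` and the host names in the example are hypothetical.

```shell
# Hypothetical helper: list the PIDs on the local node whose environment
# (as captured at exec time) contains exactly PBS_JOBID=<jobid>.
# Run it on every host in the job's exec_host list, as root or the job owner.
# If run from inside the job, the calling shell also matches and is listed.
pids_for_job() {
    local jobid="$1" envfile pid
    for envfile in /proc/[0-9]*/environ; do
        pid=${envfile#/proc/}
        pid=${pid%/environ}
        # environ is NUL-separated: turn it into lines and match the whole entry.
        # Unreadable entries (other users, exited processes) are skipped silently.
        if tr '\000' '\n' 2>/dev/null < "$envfile" | grep -qxF "PBS_JOBID=$jobid"; then
            printf '%s\n' "$pid"
        fi
    done
}

# Example from the mother superior (assumes bash on the remote hosts;
# node01/node02 stand in for the hosts in exec_host):
#   for host in node01 node02; do
#       ssh "$host" "$(declare -f pids_for_job); pids_for_job 260.openpbs"
#   done
```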

If your MPI flavour can report its PID, you can use something like this.
Open MPI as an example:

-report-pid, --report-pid

Print out mpirun's PID during startup. The channel must be either a '-' to indicate that the pid is to be output to stdout, a '+' to indicate that the pid is to be output to stderr, or a filename to which the pid is to be written.
Reference Link: mpirun(1) man page (version 3.0.6)
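When `--report-pid` writes to a file, a monitoring step in the job script still has to wait for the file to appear. A minimal sketch, assuming mpirun writes the bare PID (possibly followed by whitespace) to that file; `wait_for_pid_file`, the pid-file path and `./my_app` are hypothetical. This yields the PID of `mpirun` on the mother superior only; ranks on other nodes are not its local children, so this complements rather than replaces a per-node lookup.

```shell
# Hypothetical helper: wait up to $2 seconds (default 10) for a PID file
# written by `mpirun --report-pid <file>`, then print the PID.
# Returns 1 if no live process appears in the file within the timeout.
wait_for_pid_file() {
    local file="$1" tries pid
    tries=$(( ${2:-10} * 10 ))          # poll every 0.1 s
    while [ "$tries" -gt 0 ]; do
        if [ -s "$file" ]; then
            pid=$(head -n 1 -- "$file" | tr -d '[:space:]')
            case $pid in
                ''|*[!0-9]*) ;;         # not (yet) a plain number: keep waiting
                *)  if kill -0 "$pid" 2>/dev/null; then
                        printf '%s\n' "$pid"
                        return 0
                    fi ;;
            esac
        fi
        sleep 0.1                       # fractional sleep: GNU coreutils
        tries=$((tries - 1))
    done
    return 1
}

# Example inside a job script (path and program are placeholders):
#   pidfile="$PBS_O_WORKDIR/mpirun.pid"
#   mpirun --report-pid "$pidfile" ./my_app &
#   pid=$(wait_for_pid_file "$pidfile") && ps -p "$pid" -o pid,pcpu,rss,etime,args
#   wait
```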

Otherwise, if you want to go with other open-source or commercial tools, you could look at:

  • Mistral / Breeze
  • Sysdig