I would like to monitor resource usage on nodes. To do that, I need to retrieve the PID of the job's main process.
I've found this method:
Do you know if there is a clean way to obtain the PID?
Thanks
Please try this. The SessID column in the qstat output is the PID of the job's main process.
[pbsdata@openpbs ~]$ qsub -- /bin/sleep 100
260.openpbs
[pbsdata@openpbs ~]$ qstat -answ1

openpbs:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
260.openpbs     pbsdata  workq    STDIN       10330   1   1  500mb   --  R 00:00:00 openpbs/0
   Job run at Sat Mar 13 at 12:43 on (openpbs:ncpus=1:mem=512000kb)
[pbsdata@openpbs ~]$ ps -ef | grep 10330
pbsdata   10330  89429  0 12:43 ?        00:00:00 /bin/sleep 100
pbsdata   10562  10220  0 12:43 pts/0    00:00:00 grep --color=auto 10330
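If you want to script the lookup rather than read the table by eye, something like the following might work. It is only a sketch: it assumes `qstat -f <jobid>` prints a line of the form `session_id = <number>` for a running job, and that the monitoring runs on a Linux node where `/proc/<pid>/status` is readable. The function names (`parse_session_id`, `job_session_id`, `rss_kib`) are made up for this example and are not part of PBS.

```python
import re
import subprocess
from typing import Optional


def parse_session_id(qstat_full_output: str) -> Optional[int]:
    """Return the session_id value found in `qstat -f` text, or None if absent."""
    match = re.search(
        r"^\s*session_id\s*=\s*(\d+)\s*$", qstat_full_output, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def job_session_id(job_id: str) -> Optional[int]:
    """Run `qstat -f <job_id>` and extract the session id (needs qstat on PATH)."""
    result = subprocess.run(
        ["qstat", "-f", job_id], capture_output=True, text=True, check=True
    )
    return parse_session_id(result.stdout)


def rss_kib(pid: int) -> Optional[int]:
    """Read resident memory (the VmRSS field, in kB) for a PID from /proc."""
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        return None  # process has exited, or /proc is not available here
    return None
```

For the job above, `job_session_id("260.openpbs")` should give the same number as the SessID column, and `rss_kib()` has to be called on the node where that process actually runs.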
Hope this helps
Thank you for the reply.
This works for jobs that are not spread across multiple nodes.
How can I retrieve the "main" PID for each execution chunk?
You can use pbs_dtj to find the PIDs of the processes on the mother superior node and on the sister nodes of a multi-chunk, multi-node job.
Note: If you are using MPI, it should be compiled against the PBS TM library for tight integration.
If your MPI flavour can report the PID itself, you can use something like this.
Open MPI, for example:
-report-pid, --report-pid <channel>
Print out mpirun's PID during startup. The channel must be either a '-' to indicate that the pid is to be output to stdout, a '+' to indicate that the pid is to be output to stderr, or a filename to which the pid is to be written.
Reference Link: mpirun(1) man page (version 3.0.6)
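As a rough illustration of the filename variant: if the job script starts the application with something like `mpirun --report-pid mpirun.pid ./my_app` (the file name `mpirun.pid` and the program `./my_app` are placeholders), a separate monitoring script could wait for that file and read the PID from it. The sketch below assumes the file contains just the PID as decimal text; `read_reported_pid` is a hypothetical helper, not an Open MPI function.

```python
import time
from pathlib import Path
from typing import Optional


def read_reported_pid(
    pid_file: str, timeout_s: float = 10.0, poll_s: float = 0.2
) -> Optional[int]:
    """Poll pid_file until it holds a decimal PID; return None on timeout."""
    path = Path(pid_file)
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            text = path.read_text().strip()
            if text.isdigit():
                return int(text)
        except FileNotFoundError:
            pass  # mpirun has not created the file yet
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_s)
```

Keep in mind that this gives the PID of mpirun itself on the node where the job script runs, not the PIDs of the ranks started on the sister nodes.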
Otherwise, if you want to go with other open-source or commercial tools