How to clean up orphaned MPI processes



I have a user application that sometimes deadlocks and does not respond to SIGTERM. Only SIGKILL (kill -9) can stop it. However, after “qdel”, PBS simply assumes the job has exited and assigns new jobs to the node, effectively oversubscribing it.

Is there a reliable way to kill the job? Or do I need to attach an execjob_end hook? I see there is an attribute, but I don’t know what it contains.


Please add the line below to $PBS_HOME/mom_priv/config and restart the pbs_mom service:
$restrict_user True

This should remove the orphaned MPI processes.

The MoM now actively removes a user’s processes if none of that user’s jobs are assigned to the node.
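To check what would be affected before relying on this, the rule described above can be sketched in a few lines of Python. This is only an illustration of the idea and not how pbs_mom implements it. The function names, the exempt-user list, and the set of users with jobs on the node are all hypothetical; you would have to supply that set yourself.

```python
import subprocess


def find_stray_processes(ps_output, users_with_jobs, exempt_users=("root",)):
    """Return (pid, user, command) tuples for processes whose owner
    has no job on this node and is not in the exempt list.

    ps_output is text with one "PID USER COMMAND" entry per line.
    """
    stray = []
    for line in ps_output.strip().splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        pid, user, command = parts
        if not pid.isdigit():
            continue  # skip a header line if one is present
        if user in exempt_users or user in users_with_jobs:
            continue  # this owner is allowed to run on the node
        stray.append((int(pid), user, command))
    return stray


def list_processes():
    """Capture PID, owner, and command name for every process on the node.

    A separate -o option per column suppresses the headers portably.
    Note: ps may truncate or replace long user names.
    """
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "user=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling find_stray_processes(list_processes(), {"alice"}, exempt_users=("root", "daemon")) lists everything not owned by alice or an exempt account. Because the check is per user and not per job, leftovers from one job are not caught while the same user still has another job on the node.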

Seems promising. It will probably solve most of the problem.

