Jobs killed by pbs_mom due to resource violation

Jobs that were killed for a resource violation can be found in two ways:

  1. by searching for the suspected job IDs in that day's MoM log, $PBS_HOME/mom_logs/YYYYMMDD
  2. by running $PBS_EXEC/unsupported/pbs_dtj -n 10
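As a rough illustration of option 1, here is a minimal Python sketch that pulls the log lines for a set of suspected job IDs out of today's MoM log. It assumes the semicolon-separated PBS log layout (date time;event code;daemon;object type;object name;message) and that a violation message contains the words "exceeded limit". Both assumptions, the /var/spool/pbs fallback for PBS_HOME, and the helper names (mom_log_path, find_job_lines) are illustrative, so check them against your own installation and logs.

```python
import os
import re
import sys
from datetime import date

# Assumption: mom_logs lines follow the semicolon-separated PBS log layout
#   date time;event code;daemon;object type;object name;message
# and a resource-violation message contains the words "exceeded limit".
# Verify both against your own logs before relying on this sketch.
VIOLATION_RE = re.compile(r"exceeded limit")


def mom_log_path(pbs_home, day):
    """Build $PBS_HOME/mom_logs/YYYYMMDD for the given date (hypothetical helper)."""
    return os.path.join(pbs_home, "mom_logs", day.strftime("%Y%m%d"))


def find_job_lines(lines, job_ids, violations_only=False):
    """Return (job_id, message) pairs for log lines about the given jobs."""
    wanted = set(job_ids)
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split(";", 5)
        # Skip lines that are not complete records or are about other jobs.
        if len(fields) < 6 or fields[4] not in wanted:
            continue
        message = fields[5]
        if violations_only and not VIOLATION_RE.search(message):
            continue
        hits.append((fields[4], message))
    return hits


def main(argv):
    # PBS_HOME is usually set in /etc/pbs.conf; /var/spool/pbs is only a fallback guess.
    pbs_home = os.environ.get("PBS_HOME", "/var/spool/pbs")
    path = mom_log_path(pbs_home, date.today())
    with open(path, errors="replace") as log:
        for job_id, message in find_job_lines(log, argv[1:], violations_only=True):
            print(f"{job_id}: {message}")


if __name__ == "__main__":
    main(sys.argv)
```

You would run it on the execution host with the suspected job IDs as arguments, for example `python find_violations.py 1234.server 1235.server` (the script name is hypothetical). Matching the exact object-name field, rather than searching the whole line, keeps 123.server from also matching 1123.server.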

If you want to make sure jobs stay within their resource limits, you can use cgroups (see the PP-325 design page):
https://pbspro.atlassian.net/wiki/spaces/PD/pages/11599882/PP-325+Support+Cgroups

If this is not what you meant by handling these jobs with hooks, please let us know what you would like to do.

Thank you
