Managing the resource oversubscription problem with PBS Pro

Hi,

we use a MAUI/Torque queueing system on our rather small cluster. Since an open source version of PBS Pro became freely available, we are now seriously considering a migration from Torque/MAUI to PBS Pro. However, we have a few questions regarding runtime fine-tuning, and we would highly appreciate it if the community could address the following questions about PBS Pro:

  1. Within the MAUI scheduler there is a nice feature for strictly controlling resource over-utilization by running jobs: using the RESOURCELIMITPOLICY parameter in maui.cfg, one can control the policy/action applied for a given type of resource utilization. In particular, our current settings are given below:

RESOURCELIMITPOLICY MEM:ALWAYS:CANCEL
RESOURCELIMITPOLICY SWAP:ALWAYS:CANCEL

The rules above are very simple: a running job is canceled by the queue system if it utilizes more memory than it originally requested. We are now wondering whether the same rule can be implemented within PBS Pro.

  2. We have found the LBNL Node Health Check (NHC) to be a very useful tool for keeping user jobs off malfunctioning compute nodes. Are there any compatibility problems with integrating NHC into PBS Pro?

Thank you in advance!

With best regards,
Victor

Hi Victor,

Let me try to answer what you are looking for:

1 - Resource Limit Policy - PBS has resource enforcement for jobs. One can enforce limits so that jobs are not allowed to exceed requested resources such as ncpus, mem, vmem, etc. For more details please refer to our Admin Guide - “http://www.pbsworks.com/pdfs/PBSProAdminGuide13.1.pdf” - sections 5.15.3.3, 5.15.3.4 and 5.15.3.5.
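
For example, memory enforcement on the execution host is typically switched on with a line like the following in PBS_HOME/mom_priv/config (please check the sections above for the exact syntax and related settings in your version):

$enforce mem

With this in place, MoM periodically polls each job's memory usage and kills jobs that exceed their mem request, which is essentially the behaviour your MAUI policy gives you today.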

2 - Node Health Check - We have not tried integrating LBNL NHC with PBS, but we have our own Node Health Check implementation: a periodic hook that runs on execution hosts and checks various parameters such as processes, file permissions, disk usage, etc. If you visit our source at “https://github.com/PBSPro/pbspro” you will find this hook as “src/unsupported/NodeHealthCheck.py”.
If you do try integrating PBS Pro with LBNL NHC, we would love to hear how it went :slight_smile:
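
Just to give you an idea of the shape such a hook takes, here is a very rough sketch (not the shipped NodeHealthCheck.py; check_health() is a made-up placeholder for whatever tests you want to run):

import os
import pbs

def check_health():
    # Example test only: make sure /tmp on this execution host is writable.
    return os.access("/tmp", os.W_OK)

e = pbs.event()
me = pbs.get_local_nodename()

if not check_health():
    # Offline the vnode so the scheduler stops placing new jobs on it.
    vnode = e.vnode_list[me]
    vnode.state = pbs.ND_OFFLINE
    vnode.comment = "offlined by health-check hook"
    pbs.logmsg(pbs.LOG_WARNING, "health check failed on %s" % me)

e.accept()

You would import something like this with qmgr as an exechost_periodic hook and give it a suitable freq; the Hooks guide covers the details.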

Hope it helps!

Regards,
Arun Grover

Hello Victor (@skoltech),

Arun (@arungrover) is correct. I would like to add that within the next few months we will be providing a hook that enables PBS Pro to utilize cgroups on Linux systems. When a job is launched, a cgroup is created on each node and the job's processes are assigned to it. The kernel then restricts the job to the resources it was allocated. This is a slightly different approach from cancelling the job, but one that may interest you.
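
To illustrate the underlying mechanism, this is roughly what happens at the kernel level with the cgroup v1 memory controller (illustration only, not the hook's actual code; the path and job id are made up):

import os

job_cgroup = "/sys/fs/cgroup/memory/pbs_jobs/1234.server"   # example job id
if not os.path.isdir(job_cgroup):
    os.makedirs(job_cgroup)

# Cap the group at the job's mem request, e.g. 4gb ...
with open(os.path.join(job_cgroup, "memory.limit_in_bytes"), "w") as f:
    f.write(str(4 * 1024 ** 3))

# ... and place the job's processes into the group so the kernel enforces it
# (here we just add the current process as an example).
with open(os.path.join(job_cgroup, "tasks"), "w") as f:
    f.write(str(os.getpid()))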

Thanks,

Mike

Dear Arun,

thanks for your reply! Indeed, the documentation you suggested covers my questions. I think we will make an attempt to migrate from MAUI/Torque to PBS Pro.

Regarding the node health checking mechanism, we will first try to adapt and use the native NodeHealthCheck.py script. Apparently, an InfiniBand section is missing there, but that should not be a problem because the Python skeleton is clearly written.
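
For instance, the kind of InfiniBand check I have in mind is something along these lines (just a sketch; we will adapt the paths and the logic to our fabric):

import glob

def ib_ports_active():
    # Every IB port visible in sysfs should report an ACTIVE state.
    states = glob.glob("/sys/class/infiniband/*/ports/*/state")
    if not states:
        return False    # no HCA visible at all -> treat the node as unhealthy
    for path in states:
        with open(path) as f:
            if "ACTIVE" not in f.read():
                return False
    return True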

Thank you for your effort and assistance!

With best regards,
Victor

Dear Mike,

the integration with cgroups sounds very promising. However, I am still wondering about NUMA support in PBS Pro. Does PBS Pro natively support NUMA topology out of the box? Or should one instead use numactl and/or special parameters to the mpirun/mpiexec launch script to take advantage of NUMA?

Thank you in advance!

With best regards,
Victor

Hi Victor,

PBS Pro has supported cpusets on SGI systems for many years. This code can be enabled by passing the --enable-cpuset flag to configure (see m4/enable_cpuset.m4). The goal is to phase this code out once the cgroups hook is released, because it should provide the same functionality. That hook may be released sooner than I anticipated, possibly within a few weeks; it has been in use for several months at a limited number of locations.
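
In other words, a build along these lines (plus whatever other options you normally pass to configure):

./configure --enable-cpuset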

Thanks,

Mike