Nvidia MIG Support

vstumpf · December 1, 2020, 1:04am

From the nvidia documentation, “the new Multi-Instance GPU (MIG) feature allows the NVIDIA A100 GPU to be securely partitioned into up to seven separate GPU Instances for CUDA applications”

We will support it through the cgroups hook.

Here’s the design: https://openpbs.atlassian.net/wiki/spaces/PD/pages/2313453569/Nvidia+MIG+Support

vstumpf · December 2, 2020, 1:38am

I added a bit more to the “how it works” section. @bayucan Since you have more experience with the cgroups hook, can you take a look?

bayucan · December 2, 2020, 6:29pm

@vstumpf It looks like it’s doable under the cgroups hook. I see that you also looked into CUDA_VISIBLE_DEVICES; we will likely need to add more to the “allow” devices list.
By the way, “MIG GPU” might be redundant in the doc, since MIG is already defined as “MIG = Multi Instance GPU.”
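This is one possible shape for the extra “allow” entries mentioned above. It is a minimal sketch that assumes three things. First, a cgroup v1 devices controller, whose entries take the form “c MAJOR:MINOR ACCESS”. Second, MIG access is granted through the nvidia-caps character devices, and their major number is read from /proc/devices. Third, the minor numbers come from /proc/driver/nvidia-caps/mig-minors, which has lines such as “gpu0/gi1/access 12”.

The function names, the sample minor numbers, and the default “rwm” access string are hypothetical illustrations. They are not taken from the design page or the hook. The sketch only covers the MIG capability nodes. The parent GPU's own /dev/nvidiaN node would presumably still need its usual entry.

```python
def char_major(proc_devices_text, driver_name):
    """Return the character-device major number registered for driver_name
    in /proc/devices text, or None if it is not listed."""
    in_char_section = False
    for line in proc_devices_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Section headers: "Character devices:" / "Block devices:"
            in_char_section = line == "Character devices:"
            continue
        if in_char_section:
            number, _, name = line.partition(" ")
            if name.strip() == driver_name:
                return int(number)
    return None


def parse_mig_minors(text):
    """Map capability names (e.g. "gpu0/gi1/ci0/access") to minor numbers.
    ASSUMPTION: one "<name> <minor>" pair per line, as in
    /proc/driver/nvidia-caps/mig-minors."""
    minors = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            minors[parts[0]] = int(parts[1])
    return minors


def mig_allow_entries(major, minors, gpu, gi, ci, access="rwm"):
    """Hypothetical helper: cgroup v1 devices.allow lines for one MIG
    compute instance (its GPU-instance cap plus its compute-instance cap)."""
    names = [
        "gpu%d/gi%d/access" % (gpu, gi),
        "gpu%d/gi%d/ci%d/access" % (gpu, gi, ci),
    ]
    return ["c %d:%d %s" % (major, minors[name], access) for name in names]
```

On a real node, the two text inputs would be read from /proc/devices and /proc/driver/nvidia-caps/mig-minors.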

vstumpf · December 3, 2020, 6:13pm

Thanks Al! You’re right, MIG GPU is as redundant as PIN number or ATM machine.

I’ll fix that.

Abhishek262 · July 15, 2021, 6:25am

Hey, I’ve updated how the hook specifies MIG device UUIDs in the CUDA_VISIBLE_DEVICES environment variable, since the old ‘tuple’ format is being deprecated. The hook now gets the MIG UUIDs from the nvidia-smi -L command.
Here is the link to the PR
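The sketch below shows roughly how MIG UUIDs could be pulled from nvidia-smi -L and joined into a CUDA_VISIBLE_DEVICES value. It is not the code in the PR.

It assumes the listing format used by recent drivers: a “GPU N: … (UUID: GPU-…)” line, followed by indented “MIG <profile> Device M: (UUID: MIG-…)” lines. The regexes, the function names, and the sample UUIDs in the test are hypothetical.

```python
import re
import subprocess

# ASSUMPTION: line shapes of `nvidia-smi -L` output on MIG-enabled drivers.
GPU_LINE = re.compile(r"^GPU\s+(\d+):.*\(UUID:\s*(GPU-[^)\s]+)\)")
MIG_LINE = re.compile(r"^\s*MIG\s+(\S+)\s+Device\s+(\d+):\s*\(UUID:\s*(MIG-[^)\s]+)\)")


def parse_nvidia_smi_list(text):
    """Map each physical GPU index to the MIG device UUIDs listed under it."""
    migs = {}
    current_gpu = None
    for line in text.splitlines():
        gpu = GPU_LINE.match(line)
        if gpu:
            current_gpu = int(gpu.group(1))
            migs[current_gpu] = []
            continue
        mig = MIG_LINE.match(line)
        if mig and current_gpu is not None:
            migs[current_gpu].append(mig.group(3))
    return migs


def cuda_visible_devices(mig_uuids):
    """CUDA_VISIBLE_DEVICES takes a comma-separated list of device UUIDs."""
    return ",".join(mig_uuids)


def list_mig_devices():
    """Run `nvidia-smi -L` on the node and parse its output."""
    out = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                         text=True, check=True)
    return parse_nvidia_smi_list(out.stdout)
```

The hook would then export the string built from the MIG devices assigned to a job as CUDA_VISIBLE_DEVICES. Keying the result by parent GPU makes it easy to check that the assigned MIG devices belong to GPUs the job's cgroup already allows.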
