Nvidia MIG Support

From the NVIDIA documentation: “the new Multi-Instance GPU (MIG) feature allows the NVIDIA A100 GPU to be securely partitioned into up to seven separate GPU Instances for CUDA applications.”

We will support it through the cgroups hook.

Here’s the design: https://openpbs.atlassian.net/wiki/spaces/PD/pages/2313453569/Nvidia+MIG+Support

I added a bit more to the “how it works” section. @bayucan Since you have more experience with the cgroups hook, can you take a look?

@vstumpf It looks like it’s doable under the cgroups hook. I see that you also looked into CUDA_VISIBLE_DEVICES, and you will likely need to add more to the “allow” devices list.
By the way, “MIG GPU” might be redundant in the doc as MIG is already defined as “MIG = Multi Instance GPU.”
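
To make the two pieces mentioned above concrete, here is a rough Python sketch. It is not the actual cgroups hook code. It assumes cgroup v1 device control, where each line written to `devices.allow` has the form `c major:minor access`, and it assumes the job's MIG devices are exposed to CUDA as a comma-separated list in CUDA_VISIBLE_DEVICES. The function names, device numbers, and identifiers are hypothetical placeholders.

```python
import os
import stat


def devices_allow_entry(major, minor, access="rwm"):
    """Format one cgroup v1 devices.allow line for a character device."""
    return f"c {major}:{minor} {access}"


def allow_entry_for_path(path):
    """Build a devices.allow line from an existing character device node."""
    st = os.stat(path)
    if not stat.S_ISCHR(st.st_mode):
        raise ValueError(f"{path} is not a character device")
    return devices_allow_entry(os.major(st.st_rdev), os.minor(st.st_rdev))


def cuda_visible_devices(mig_ids):
    """Join the MIG device identifiers assigned to a job into one value."""
    return ",".join(mig_ids)


if __name__ == "__main__":
    # Illustrative numbers only; real major/minor values come from the node.
    print(devices_allow_entry(195, 0))
    # Placeholder identifiers; real ones would be reported by the driver.
    print(cuda_visible_devices(["MIG-example-0", "MIG-example-1"]))
```

The point of the sketch is that each MIG instance assigned to a job may need its own extra device entries in the “allow” list, in addition to being named in CUDA_VISIBLE_DEVICES.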

Thanks Al! You’re right, MIG GPU is as redundant as PIN number or ATM machine. :slight_smile:

I’ll fix that.