FreeIPA secondary groups for resource limits

Hi all,

First of all, as this is my first post here and I have already found so many solutions here, I want to thank all of you for your efforts in writing and maintaining this great tool. In case I just didn’t find the right search term, I’m sorry, and I hope you can point me in the right direction.

Now to the topic:
I have an AlmaLinux 8 based cluster with FreeIPA as identity management. My users all have their own private group (as primary group) and some secondary groups, depending on the department and the software they have access to (like VASP or Gaussian…).
I want to limit resource usage per group, i.e. per department (e.g. groupA has access to a total of 500 CPUs), and per user (e.g. no user, no matter which group they belong to, should be able to use more than 128 CPUs). As far as I understand, a limit like “s q queue max_run_res.ncpus += [u:@groupA=16]” (i.e. per user of groupA) is not possible; limits can only be set per group and via PBS_GENERIC.
My problem now is that PBS seems to ignore user groups unless they are primary groups.
Here’s my configuration on the workq:

    queue_type = Execution
    total_jobs = 74
    state_count = Transit:0 Queued:2 Held:0 Waiting:0 Running:72 Exiting:0 Begun:0
    max_queued = [u:zmeijuan=20]
    acl_host_enable = True
    acl_hosts = submission_node.domain.tld
    resources_max.mem = 128247mb
    resources_max.ncpus = 128
    resources_default.mem = 512mb
    resources_default.walltime = 24:00:00
    acl_group_enable = True
    acl_groups = groupB,groupC,groupD,groupA
    resources_assigned.mem = 945054592kb
    resources_assigned.mpiprocs = 100
    resources_assigned.ncpus = 948
    resources_assigned.nodect = 72
    max_run = [g:groupB=40]
    max_run = [g:groupC=40]
    max_run = [g:groupD=10]
    max_run = [g:groupA=80]
    max_run_res.mem = [g:groupB=512gb]
    max_run_res.ncpus = [g:groupB=512]
    max_run_res.mem = [g:groupC=512gb]
    max_run_res.ncpus = [g:groupC=512]
    max_run_res.mem = [g:groupD=64gb]
    max_run_res.ncpus = [g:groupD=64]
    max_run_res.mem = [g:groupA=2048gb]
    max_run_res.ncpus = [g:groupA=2048]
    max_run_res.mem = [u:PBS_GENERIC=160gb]
    max_run_res.ncpus = [u:PBS_GENERIC=160]
    enabled = True
    started = True

groupA, for example, is a secondary group of some users. The primary group is always a private group (uid == gid). All groups in acl_groups are secondary groups, and they are respected when it comes to access and authorization. But for the limits, especially max_run_res, only the PBS_GENERIC limit is respected, unless I set the respective group as the user’s primary group. I do not want to remove the PBS_GENERIC limit, as I need a per-user limit. Of course it would be possible to script user creation so that every new user is added with their own limits, but I would think there’s a better solution, especially as everything works as expected when I change the users’ primary group. But then every user in the same group could access data in the other members’ home directories.

I hope I was more or less clear, as English is not my first language.

Kind regards and many thanks in advance

Nico

P.S.: We are running OpenPBS 20.0.1, compiled from source, on AlmaLinux nodes and server. After upgrading to CentOS 8.4 (from 8.1), the binaries could not be installed due to an unresolvable dependency problem with libhwloc. But the group problem was there before the upgrade, so I guess it has nothing to do with it.

This is my understanding as a PBS user/admin, and not as Altair staff, so possibly there is a better answer:

PBS doesn’t exactly ignore secondary groups, but the job is only executed under one group (see group_list in the qsub man page). If you want the job to be run under a secondary group, that needs to be specified. That can be done at job submission (e.g. -Wgroup_list=groupB), or you could probably write a hook that sets the group on the job.
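A minimal sketch of the hook idea, assuming a queuejob hook that runs on the server and a server host that resolves the FreeIPA groups through NSS/SSSD. The group names are taken from the queue configuration above; `LIMITED_GROUPS`, `group_names_for` and `pick_limit_group` are hypothetical names, and this has not been tested against a live OpenPBS server.

```python
import grp
import os
import pwd

try:
    import pbs  # only available inside the PBS hook interpreter
except ImportError:
    pbs = None

# Assumption: the groups that carry max_run / max_run_res limits on the queue.
LIMITED_GROUPS = ("groupA", "groupB", "groupC", "groupD")


def group_names_for(user):
    """Return the names of all groups (primary and secondary) of a user."""
    gid = pwd.getpwnam(user).pw_gid
    names = []
    for g in os.getgrouplist(user, gid):
        try:
            names.append(grp.getgrgid(g).gr_name)
        except KeyError:
            continue  # gid without a resolvable name, skip it
    return names


def pick_limit_group(user_groups, limited_groups=LIMITED_GROUPS):
    """Return the first limited group the user belongs to, or None."""
    members = set(user_groups)
    for name in limited_groups:
        if name in members:
            return name
    return None


def main():
    e = pbs.event()
    job = e.job
    # Assumption: an unset group_list reads as None inside the hook.
    if job.group_list is None:
        chosen = pick_limit_group(group_names_for(e.requestor))
        if chosen is not None:
            job.group_list = pbs.group_list(chosen)
    e.accept()


if pbs is not None:
    main()
```

If a user is a member of several limited groups, this sketch simply takes the first match in `LIMITED_GROUPS`; a real site would probably want an explicit rule for that case. A group the user sets explicitly with -W group_list is left untouched.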

Gabe

Hi Gabe

thank you for that. I had hoped that I would not have to write a hook :wink: But if that’s the solution, I will try it. I am still wondering, though, why secondary groups are evaluated when acl_groups is involved but then “ignored” when it comes to resource limits.

But for now I will try your solution and report back once I have had success… or not.

Regarding the qsub option I will have a closer look at the man page.

Thank you for your help

Nico

Hi Gabe,

thanks again. The workaround with -W group_list=… seems to work (though I still think OpenPBS should honor all the groups it uses for granting access to a queue via acl_groups). Thanks for the hint. I’ll keep an eye on it and hope my users don’t notice that I aliased qsub to use the group_list option.
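For readers who want to try the same workaround, here is a rough sketch of what such a qsub wrapper could look like. It is not the script used in this thread. The path in `REAL_QSUB` is an assumption about the install location, `LIMITED_GROUPS` mirrors the groups from the queue configuration, and `build_qsub_argv` is a hypothetical helper.

```python
import grp
import os

# Assumption: location of the real qsub binary; adjust to the local install.
REAL_QSUB = "/opt/pbs/bin/qsub"

# Assumption: the groups that carry per-group limits on the queue.
LIMITED_GROUPS = ("groupA", "groupB", "groupC", "groupD")


def build_qsub_argv(user_args, group, qsub=REAL_QSUB):
    """Prepend -W group_list=<group> unless the user already set a group_list."""
    already_set = any("group_list=" in arg for arg in user_args)
    if group is None or already_set:
        return [qsub, *user_args]
    return [qsub, "-W", "group_list=" + group, *user_args]


def current_group_names():
    """Names of the groups the calling process belongs to."""
    names = []
    for gid in os.getgroups():
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            continue  # gid without a resolvable name, skip it
    return names


def main(argv):
    """Replace this process with the real qsub, adding the group option."""
    names = current_group_names()
    group = next((g for g in LIMITED_GROUPS if g in names), None)
    os.execv(REAL_QSUB, build_qsub_argv(argv, group))
```

To use it as a wrapper, the file would end with a call such as `main(sys.argv[1:])` (plus `import sys`) and be placed ahead of the real qsub in the users’ PATH. Calling the real binary by absolute path avoids the wrapper invoking itself.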

Thanks

Nico

Hi,

just to let you know: it worked. Thanks. I compiled the wrapper script with shc to obfuscate it.
If somebody is interested, I can share the wrapper script.

I don’t know how to do it, but this thread can be marked as solved.

Kind regards

Nico