Jobs fail: No Group Entry for Group -default-

Old PBS admin, new OpenPBS admin. Hello. I’ve converted a small, working PBS system (SGI ICE) to OpenPBS, and now jobs won’t start. Different users in different default groups all get the same mom error: “No Group Entry for Group -default-”. The output of “id” for me is the same on the login node as on the compute node, and my default group is known on the mom node.

Thanks, Chris

An strace of the mom shows it printing the group error right after stat’ing my $HOME. The stat output for my $HOME is the same on the login node as on the mom node. Forcing a different group with -W group_list=staff didn’t change the outcome.

From a quick look at the code, it looks as if the server is setting the group to “-default-”. It does that if getpwnam() fails on the server, that is, if the server host cannot look up the job owner’s account.
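One way to test that theory is to repeat the lookup on the server host itself. Below is a minimal Python sketch, assuming Python 3 is installed there. `lookup_account` is a hypothetical helper name, not part of PBS. Python's `pwd.getpwnam()` and `grp.getgrgid()` wrap the same C library lookups, so a failure here suggests the server's own getpwnam() call would fail too.

```python
import grp
import pwd
import sys


def lookup_account(username):
    """Return (uid, gid, group_name) for username, or None if the lookup fails.

    group_name is None when the user resolves but the primary gid has no
    group entry on this host.
    """
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        # Python raises KeyError where C getpwnam() would return NULL.
        return None
    try:
        group_name = grp.getgrgid(entry.pw_gid).gr_name
    except KeyError:
        group_name = None
    return entry.pw_uid, entry.pw_gid, group_name


if __name__ == "__main__":
    # Pass one or more usernames on the command line.
    for name in sys.argv[1:]:
        result = lookup_account(name)
        if result is None:
            print(f"{name}: no passwd entry on this host")
        else:
            uid, gid, group = result
            print(f"{name}: uid={uid} gid={gid} group={group or 'NO GROUP ENTRY'}")
```

Run it on the server host, the login node and a compute node with the same username. If only the server host reports no passwd entry, that points at the server rather than the mom.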

Do you have flatuid set?

Could you submit a job with the -h (hold) flag and then qstat -f the job? The interesting fields are Job_Owner, euser, and egroup.
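If it helps, here is a rough Python sketch for pulling just those three fields out of `qstat -f`. It assumes `qstat` is on the PATH, that attributes are printed as `name = value`, and that wrapped values continue on lines starting with a tab. `parse_qstat_full` and `identity_fields` are hypothetical helper names, not part of PBS.

```python
import subprocess


def parse_qstat_full(text):
    """Parse `qstat -f` text into a dict of attribute name -> value."""
    attrs = {}
    last_key = None
    for line in text.splitlines():
        if line.startswith("\t") and last_key is not None:
            # Assumed: wrapped values continue on lines that start with a tab.
            attrs[last_key] += line.strip()
            continue
        key, sep, value = line.strip().partition(" = ")
        if sep:
            attrs[key] = value
            last_key = key
    return attrs


def identity_fields(job_id):
    """Run `qstat -f <job_id>` and return only the three fields of interest."""
    out = subprocess.run(
        ["qstat", "-f", job_id], capture_output=True, text=True, check=True
    ).stdout
    attrs = parse_qstat_full(out)
    # A missing attribute comes back as None rather than raising KeyError.
    return {name: attrs.get(name) for name in ("Job_Owner", "euser", "egroup")}
```

Call `identity_fields()` with the job ID that qsub prints for the held job. A value of None for euser or egroup means that line was absent from the output.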

In fact, the strace of the mom shows that all calls to getuid and getgid return 0. So although the error is about the group, I’m also an unknown user on the mom node as far as PBS is concerned. But I can su to myself on the compute node and run “id”, which works correctly.

flatuid is true. A held job shows me as the owner, but the qstat -f output has no lines containing “user” or “group”.

Our admin found the solution. The passwd/group/shadow files have to be propagated to the PBS primary/secondary nodes. Once a user exists there, they can run jobs. This was not required even in PBS 2020.1.4.