This is the design for specifying the scheduler user, that is, for running the scheduler as a non-root user: https://pbspro.atlassian.net/wiki/spaces/PD/pages/1692598273/Specify+scheduler+user
In your design you mention…
If PBS_SCHEDULER_USER is changed after installation, the admin must change the ownership of the files/directories manually
Will there be a scheduler log message to inform the administrator what went wrong if the PBS_SCHEDULER_USER is changed between restarts?
What happens if PBS_SCHEDULER_USER is changed from foo to root, or if the setting is removed from the configuration file between restarts? The root user will be able to read/write files in sched_priv and sched_logs, but the ownership might become an issue. If PBS_SCHEDULER_USER is changed back to foo and the scheduler restarted, it may not be able to write to the log if it was created as root.
If --with-scheduler-user is supplied to configure and specified as non-root, does this become the default even when PBS_SCHEDULER_USER is unset in the config file? This could be confusing to the admin if they didn’t build PBS themselves. At the very least we need to note it in the log file. I’m questioning whether we really need --with-scheduler-user. There is precedent, since we do this for postgres, but that is an external component. It would be easier to document if --with-scheduler-user were avoided.
You’re right, I will remove this interface from configure, and manually put root as the default user.
I can put a log message when the scheduler starts that checks if sched_priv is owned by PBS_SCHEDULER_USER. I think the scheduler should die if it isn’t. Thoughts?
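A minimal sketch of the startup check proposed here, written in Python for brevity rather than as scheduler code. The function names `owned_by` and `check_sched_priv` are hypothetical, and whether the scheduler should exit on failure is still the open question in this thread:

```python
import os
import pwd


def owned_by(path, username):
    """Return True if `path` is owned by the account named `username`."""
    expected_uid = pwd.getpwnam(username).pw_uid  # KeyError if user is unknown
    return os.stat(path).st_uid == expected_uid   # OSError if path is unusable


def check_sched_priv(sched_priv, sched_user):
    """Return an error string if the ownership check fails, else None.

    `sched_priv` is the path to the scheduler's private directory and
    `sched_user` is the value of PBS_SCHEDULER_USER (both passed in by the
    caller; nothing is read from pbs.conf in this sketch).
    """
    try:
        if owned_by(sched_priv, sched_user):
            return None
    except KeyError:
        return f"unknown user {sched_user!r}"
    except OSError as exc:
        return f"cannot stat {sched_priv}: {exc}"
    return f"{sched_priv} is not owned by {sched_user}"
```

A caller would print the returned string and exit with a non-zero status when the result is not `None`.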
We could go a couple ways with this…
- We could continue starting the scheduler as root and perform a setuid/setgid to PBS_SCHEDULER_USER after some sanity checks on file and directory permissions. In this case, we can try and fix things before switching users and bail if it fails.
- The init script could start the scheduler with sudo/su as the PBS_SCHEDULER_USER. In this case, we have no choice but to bail if the scheduler can’t read/write the files it needs.
Either way, we need to log some messages (if possible) for use by the admins and testing. Those messages will be an interface and should be listed in the design doc.
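To make the two options above concrete, here is a rough Python sketch of the startup decision and of the setgid/setuid step. The names `Action`, `startup_action` and `drop_privileges` are hypothetical, and the rule that root always runs when root is the configured user is an assumption of this sketch, not something the design states:

```python
import os
import pwd
from enum import Enum


class Action(Enum):
    RUN = "run as the current user"
    SWITCH = "switch to the scheduler user"
    FIX_THEN_SWITCH = "fix ownership, then switch to the scheduler user"
    BAIL = "print an error and exit"


def startup_action(euid, sched_uid, files_ok):
    """Decide what to do at startup.

    euid      -- effective uid the scheduler was started with
    sched_uid -- uid of the configured scheduler user
    files_ok  -- True if sched_priv/sched_logs are usable by sched_uid
    """
    if euid == 0:
        if sched_uid == 0:
            # Assumption: root can read and write regardless of ownership.
            return Action.RUN
        # Started as root for a non-root scheduler user (first option):
        # root can repair ownership before giving up its privileges.
        return Action.SWITCH if files_ok else Action.FIX_THEN_SWITCH
    if euid == sched_uid:
        # Started directly as the scheduler user (second option):
        # nothing can be repaired, so bail if the files are wrong.
        return Action.RUN if files_ok else Action.BAIL
    # Any other user may not run the scheduler.
    return Action.BAIL


def drop_privileges(username):
    """Permanently switch from root to `username`."""
    if os.geteuid() != 0:
        raise PermissionError("only root can switch to another user")
    entry = pwd.getpwnam(username)
    os.initgroups(username, entry.pw_gid)  # supplementary groups first
    os.setgid(entry.pw_gid)                # group id while still root
    os.setuid(entry.pw_uid)                # user id last; cannot be undone
```

The order in `drop_privileges` matters: once the uid is no longer 0, the process can no longer change its groups.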
Let me ask a question related to requirements. If a user is logged in as PBS_SCHEDULER_USER, should they be able to start the scheduler directly? It would make a great deal of sense if this were the case for debugging. Otherwise, they would have to connect to the process after it was running. Not a big deal, but they would probably have to set scheduling to false in the server so they could set breakpoints before allowing the process to continue. Whatever the requirement is, it should probably be listed in the document.
I think PBS_SCHEDULER_USER should be able to start the scheduler directly. If I was concerned about security, it would feel more secure if it never runs as root at all. This means the scheduler will have to check the permissions and bail if they’re wrong.
I thought log messages were no longer required in a design document, but I can add them.
I also added a way to set PBS_SCHEDULER_USER at install time.
Thanks @vstumpf, it looks good. I didn’t mean to imply that the exact log message needs to be in the design, just the fact that a log message (content TBD) will be issued for the benefit of administrators and for use with testing.
Making this a log message is a catch-22. You are logging that the permissions are wrong, but if the permissions on the log directory are wrong, you won’t be able to log it. I’d suggest you print it to stderr before you fork() and then exit.
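The stderr-before-fork idea can be sketched as follows, again in Python only for illustration. The names `log_dir_problem` and `start`, the `run_scheduler` callback, and the message text are all placeholders, not existing scheduler interfaces:

```python
import os
import sys


def log_dir_problem(log_dir):
    """Return a description of what is wrong with `log_dir`, or None."""
    if not os.path.isdir(log_dir):
        return f"{log_dir} does not exist or is not a directory"
    # W_OK to create log files, X_OK to traverse the directory.
    # Note: os.access() checks against the real uid, not the effective uid.
    if not os.access(log_dir, os.W_OK | os.X_OK):
        return f"{log_dir} is not writable by uid {os.getuid()}"
    return None


def start(log_dir, run_scheduler):
    """Check permissions, then fork; return an exit status for the caller."""
    problem = log_dir_problem(log_dir)
    if problem is not None:
        # Still attached to the terminal here, so the admin sees the error
        # even though the log file cannot be written.
        print(f"pbs_sched: {problem}", file=sys.stderr)
        return 1
    if os.fork() > 0:
        return 0       # parent: hand control back to the shell
    run_scheduler()    # child: carry on as the background scheduler
    os._exit(0)
```

Doing the check before `fork()` means the failure shows up in the exit status of the command the admin typed, not in a background process.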
If it’s a tty, it’ll also log errors to stderr.
I still think you shouldn’t bother to log. There is already a message in the scheduler, printed to stderr, that says you need to run the scheduler as root. I think you should just replace that message with one saying you need to run the scheduler as the scheduler user.
I have updated the design due to comments from a design discussion.
Thanks @vstumpf, one question/correction, though: The design now says that smp_cluster_dist will no longer work, but my understanding is that the impact of running scheduler as a non-root user is specific to the “lowest_load” value of smp_cluster_dist, not smp_cluster_dist as a whole (so “round_robin” and “pack” values are unaffected). Is that correct?
Sorry, you are correct, only smp_cluster_dist: lowest_load will no longer work. I’ll update the document.
The commercial packages ship pbs_sched with permissions 700. If an admin sets PBS_DAEMON_SERVICE_USER, the user needs to be able to execute pbs_sched.
1. Automatic: I would like to change the permissions of pbs_sched to 755 in the rpm spec file for all installations.
2. Manual: If the admin wants to set PBS_DAEMON_SERVICE_USER (PDSU), they must also set the permissions of pbs_sched to 755. pbs_probe will check whether PDSU is set, and check/fix the permissions as necessary.
3. Manual: If the admin wants to set PDSU, they must also change the owner of pbs_sched to PDSU. pbs_probe will check whether PDSU is set, and check/fix the owner as necessary.
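For reference, the permission check and fix described in the options above amounts to something like the Python sketch below. It is not pbs_probe code. The helper names are hypothetical and the path to pbs_sched is supplied by the caller:

```python
import os
import stat


def mode_of(path):
    """Return only the permission bits of `path` (for example 0o755)."""
    return stat.S_IMODE(os.stat(path).st_mode)


def executable_by_others(mode):
    """True if users other than the owner and group may execute the file."""
    return bool(mode & stat.S_IXOTH)


def fix_mode(path, wanted=0o755):
    """Set `path` to `wanted` if it differs; return True if it was changed."""
    if mode_of(path) == wanted:
        return False
    os.chmod(path, wanted)
    return True
```

With mode 700 only the owner can execute the binary, which is why a non-root service user needs either mode 755 (options 1 and 2) or ownership of the file (option 3).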
How about we rename PBS_DAEMON_SERVICE_USER to PBS_SERVICE_USER to simplify it? After all, DAEMON and SERVICE are almost synonymous, so we do not need both words.