A coworker and I have been able to get OpenPBS 20.0.1 built within a Docker container based on Ubuntu 14.04, where our software development happens (we’ve posted a couple of other threads about different issues). This will allow our users to send a job from that environment to our head node (Ubuntu 18.04), which hosts the scheduler server. Here is one of the earlier threads: PBS 20.0.1 Within 32-bit Docker container - #4 by Chase
Now that OpenPBS is built in a Docker container with Ubuntu 14.04, there seems to be a PBS system variable that isn’t being recognized. OpenPBS won’t start and is giving this error.
This appears to be a compute node. You don’t need a pbs_comm on a compute node; one pbs_comm running on the server/scheduler head node is enough for a small to medium cluster. You could set PBS_START_COMM=0.
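As a rough sketch of that suggestion, an /etc/pbs.conf for a node that runs only a MoM might look like the fragment below. The hostname headnode.example.com is a placeholder, and /opt/pbs and /var/spool/pbs are assumed to be the install and home directories; adjust all three to your site.

```shell
# Sketch of /etc/pbs.conf on an execution (compute) node.
# ASSUMPTIONS: headnode.example.com is a placeholder hostname;
# /opt/pbs and /var/spool/pbs are assumed default locations.
PBS_SERVER=headnode.example.com
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
# Only the MoM is started on this node; server, scheduler and comm
# run on the head node.
PBS_START_SERVER=0
PBS_START_SCHED=0
PBS_START_COMM=0
PBS_START_MOM=1
```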
As for the pbs_mom, is it running after you start up? If not, can you run the pbs_mom daemon directly (/opt/pbs/sbin/pbs_mom)?
This instance of OpenPBS will be communicating with our head node, which hosts the scheduler server. To be clear, this OpenPBS instance lives in a Docker container (on users’ laptops), while the head node resides on a separate server. I was wondering exactly which daemons would be necessary for that. From my reading, I thought I would need both the comm and mom daemons running.
With that in mind, do you think I still don’t need the pbs_comm?
For users to “send a job” to the head node, you would expect the container to hold either a server of its own (which could indeed send jobs over) or just the client commands.
A MoM would be needed only to run PBSPro jobs in the container itself, that is, jobs that others outside the container submit to a server that knows the MoM (and that is quite the can of worms if NAT gateways are involved). That doesn’t sound like what you are trying to achieve.
Client commands just need an /etc/pbs.conf, and they really only use PBS_SERVER and PBS_EXEC from it. No daemons need to be running in the container to use e.g. “qsub” to submit jobs to the head node.
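For illustration, a client-only /etc/pbs.conf could be as small as the following sketch. The hostname is a placeholder and /opt/pbs is an assumed install prefix; neither comes from this thread's actual setup.

```shell
# Sketch of a client-only /etc/pbs.conf (no daemons in the container).
# ASSUMPTIONS: placeholder hostname and assumed install prefix.
PBS_SERVER=headnode.example.com
PBS_EXEC=/opt/pbs
```

With that in place, commands such as qsub and qstat under $PBS_EXEC/bin should address the head node, subject to the authentication constraints discussed in the rest of the thread.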
If you are in a container and there is a NAT gateway (i.e. the container does not use host networking and so doesn’t have an extra IP address on the host’s ethernet interface) you’ll have to use munge authentication, though. resvport won’t work.
Thanks. I was beginning to realize that I was not correctly understanding what components would be necessary for this type of setup. I was initially thinking the pbs_comm and pbs_mom daemons would be needed to establish networking. After reading up more on multihomed setups I’m uncertain exactly what is needed. I read that interactive jobs need MoMs to establish a connection (interactive jobs are something my users will likely make use of).
Regarding munge authentication: I’m not familiar with it, so feel free to point me to some documentation. I have an LDAP server running on our cluster that authenticates users.
Nowhere do you need a daemon to run client commands. Client commands simply open a TCP connection to PBS_SERVER. But if you don’t use munge, they will also run pbs_iff, which vouches for a particular port/address combination, and that won’t work through a NAT gateway. So that forces you to run the container using host networking, and it will only work if the remote machine doesn’t have a NAT gateway between itself and the server.
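Before debugging authentication, it can help to confirm that the plain TCP connection works from inside the container. The helper below is only a sketch: can_reach is a made-up function name, it relies on bash's /dev/tcp feature and the coreutils timeout command, and it assumes the server listens on the default port 15001. It checks reachability only and says nothing about whether pbs_iff or munge authentication will succeed.

```shell
# Hypothetical helper: return 0 if a TCP connection to host:port can be
# opened within 5 seconds, non-zero otherwise.
# ASSUMPTIONS: bash (for /dev/tcp) and coreutils `timeout` are installed.
can_reach() {
    # Inside the child bash, $0 is the host and $1 is the port.
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

A call such as `can_reach headnode.example.com 15001 && echo reachable` (placeholder hostname) would then show whether the container can open the connection at all.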
If you want to be able to work just with a single TCP connection and in-band authentication (instead of pbs_iff) then you need Munge. It’s a daemon with a private key that allows a host to verify that someone pretending to be user X at least has authenticated as user X (through other mechanisms) on a host in the munge domain with the same key.
For that you run munged in the entire domain with the same key installed. That’s it. On the PBS side you add munge to PBS_SUPPORTED_AUTH_METHODS (if it’s not there then it’s silently just “resvport”) and add PBS_AUTH_METHOD=munge to all the machines where you want munge to be used instead of resvport (usually if you install munge you use it for everything within the munge domain, i.e. even for daemon-to-daemon authentication.) I’d suggest searching Google for PBS_SUPPORTED_AUTH_METHODS to find guides for its usage (often in PBS Professional documentation). For OpenPBS, see https://openpbs.atlassian.net/wiki/spaces/PD/pages/1510604832/PR-1505+Introduce+LibAuth+and+refactoring+in+DIS+TPP+support+routines
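As a sketch of the two settings just described, the pbs.conf additions could look like this. Keeping resvport in the supported list is an assumption made here so that hosts not yet switched to munge keep working; the lowercase values follow the spelling used above, so check the linked design page for the exact accepted values.

```shell
# Head node /etc/pbs.conf: accept munge in addition to the default.
# ASSUMPTION: resvport is kept so not-yet-migrated hosts still work.
PBS_SUPPORTED_AUTH_METHODS=munge,resvport

# /etc/pbs.conf on every machine (clients included) that should
# authenticate with munge instead of resvport.
PBS_AUTH_METHOD=munge
```

The daemons read pbs.conf at startup, so they would need a restart after the change, and munged must already be running with the shared key on each of these machines.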
But using a munge domain with domain members scattered on laptops roaming the planet is usually a security nightmare if that’s your only line of defence – it’s hard to ensure that your munge key will not be compromised one day.
Frankly: if you have users with laptops roaming the world there are so many security implications and configuration issues that you’d be mad not to consider a VPN solution to make the client connection appear on the same network as the server, with a very strong authentication mechanism for setting up the VPN channel.
Especially since that’s the only way you’re going to make qsub -I work: for an interactive job, qsub itself acts as a server, and the execution host needs to be able to connect to it as if both were on the same IP network (without any NAT in between).
Whether you use a VPN on the host and use host networking in the container or set up a VPN connection from within the container is something that I can’t decide for you.