Dear Community,
I am looking for a way to run OS update jobs in our environment. Before I state my question, let me explain what I am trying to achieve:
On a regular basis, every 4 weeks, we bring our Linux machines (CentOS) to the latest software state. For this we basically run a yum update and a reboot, with some scripts around them for extra checks and settings. The workflow so far is:
-> Close the queue, wait for the nodes to be empty, execute the update on the node, reboot, open the queue.
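The manual workflow above can be sketched roughly as follows. This is only an illustration under several assumptions that are not from our real scripts: the queue is closed with `qstop` and reopened with `qstart`, a busy node shows a `jobs = ...` line in the output of `pbsnodes <node>`, and the host running the script has root SSH access to the node. The helper names (`node_is_empty`, `sh`, `update_node`) are made up for this example.

```python
import subprocess
import time


def node_is_empty(pbsnodes_output):
    """True if the text printed by `pbsnodes <node>` lists no jobs."""
    for line in pbsnodes_output.splitlines():
        key, _, value = line.strip().partition("=")
        if key.strip() == "jobs" and value.strip():
            return False
    return True


def sh(cmd):
    """Run a command, raise on failure, and return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def update_node(queue, node, run=sh, poll_seconds=60):
    """Close the queue, wait for the node to drain, update, reboot, reopen."""
    run(["qstop", queue])  # close the queue
    while not node_is_empty(run(["pbsnodes", node])):
        time.sleep(poll_seconds)  # wait until no jobs are left on the node
    # assumption: root SSH access; the reboot is delayed by one minute so
    # that the SSH session can end cleanly before the node goes down
    run(["ssh", f"root@{node}", "yum -y update && shutdown -r +1"])
    run(["qstart", queue])  # reopen the queue
```

The `run` parameter exists only so that the sequence can be tried with a fake runner before it is pointed at a real cluster.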
This has been OK so far. However, the environment has grown and the nodes are more heavily utilized, so I would like to optimize and automate this into a fire-and-forget process.
There are several ways to tackle this. One way would be to run an Ansible-style script. This has the drawback that Ansible does not know about the node and job state, so that would have to be scripted. So I was looking for something more elegant. I studied the PBS Big Book and came across the provisioning section.
So my idea is the following: define two resource states, e.g. “patched” and “unpatched”. When there is a new OS update, set all the nodes to “unpatched” and submit a provisioning job that requests a node in state “patched”. From my point of view, the major advantage of this approach is that the scheduler takes care of running the update, so there is no need to close the queue or wait for nodes to be empty. Once the provisioning job is submitted, even if the nodes are busy, it is just postponed until there are free nodes, and then the environment is updated. (The updates are tested beforehand on a subset of nodes to avoid problems.)
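As a rough sketch of this idea, the two states could be modelled as AOE values, which is what PBS provisioning keys on. The function below only builds the commands for one update round and does not run them. Several things here are assumptions to be checked against the Big Book: that “patched” and “unpatched” are used as AOE names, that a Manager may set `current_aoe` by hand, and that one small job per node is submitted to trigger the provisioning. The function name `provisioning_commands` is made up for this example.

```python
def provisioning_commands(nodes, old="unpatched", new="patched"):
    """Build (not run) the commands for one update round, as argv lists."""
    cmds = []
    for node in nodes:
        # advertise both states on the node and allow PBS to provision it
        cmds.append(["qmgr", "-c",
                     f'set node {node} resources_available.aoe = "{old},{new}"'])
        cmds.append(["qmgr", "-c", f"set node {node} provision_enable = True"])
        # assumption: a Manager may set current_aoe by hand to mark the node
        cmds.append(["qmgr", "-c", f"set node {node} current_aoe = {old}"])
        # one tiny job per node; asking for the new state should make the
        # scheduler provision the node once it is free
        cmds.append(["qsub", "-l", f"select=1:vnode={node}:aoe={new}",
                     "--", "/bin/true"])
    return cmds
```

Building the commands as plain lists keeps the sketch harmless: they can be printed and reviewed first, and only later passed to something like `subprocess.run`.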
So now to my question: so far I can set the resource states and run a provision. However, I need a mechanism that allows me to run (in the simplest case) a yum update on the compute node as part of the script, and this needs to be done as “root”. I was looking at the PBS hooks, but I am struggling to understand how to implement such a hook.
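To make the question more concrete, here is a minimal sketch of what such a hook might look like, assuming a hook on the `provision` event. As far as I understand, a provisioning hook is executed on the server host rather than on the compute node, so the sketch reaches the node over SSH; passwordless root SSH from the server to the nodes is an assumption, and `update_command` is a made-up helper name. The `pbs` module only exists inside the PBS hook interpreter, hence the guarded import.

```python
import subprocess

try:
    import pbs  # provided by PBS inside the hook interpreter only
except ImportError:
    pbs = None  # lets the helper below be exercised outside PBS


def update_command(vnode):
    """Command the server host runs to patch and reboot one node."""
    # assumption: passwordless root SSH from the server to the node;
    # the reboot is delayed by one minute so the SSH session ends cleanly
    return ["ssh", "root@%s" % vnode, "yum -y update && shutdown -r +1"]


if pbs is not None:
    e = pbs.event()  # the provision event carries e.vnode and e.aoe
    ret = subprocess.call(update_command(e.vnode))
    if ret != 0:
        e.reject("OS update failed on %s" % e.vnode, ret)
    else:
        e.accept(0)  # 0: provisioning succeeded and the node reboots
```

If this is saved as `os_update.py` (a hypothetical file name), it would presumably be registered with `qmgr -c "create hook os_update event=provision"` followed by `qmgr -c "import hook os_update application/x-python default os_update.py"`.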
Apart from my question, I would be happy to hear how you run updates in your environment; maybe there are better solutions.
Thanks a lot …