Hello,
I’ve written a design document proposing changes that would enable resuming multi-node jobs. I believe this could be useful when you need to restart PBS processes in your infrastructure (for example, when a PBS update comes out) and don’t want to restart jobs from scratch or limit your users to single-node jobs in the period leading up to the planned restart.
I’ve already implemented a (hopefully) working solution, but I’d be glad for any feedback that would make it better!
Thanks