The need was to stop running (“R”) jobs during an emergency maintenance. I could set server scheduling to False, but “R” jobs will keep running (as expected). Scenario:
- Dedicated time queue between 10z and 16z; jobs are “Q” during that window.
- At 16z, X jobs kicked in and started running.
- At 16:15z, “set server scheduling = False” was issued.
I had to kill the running and exiting jobs. There were 400+ jobs between the two states, and PBS was sluggish.
How can this be addressed in an emergency? Would it be possible to hold/queue all of the X jobs?
I’m a little confused about what is going on in your scenario.
Dedicated time means two things. First, jobs in a dedicated time queue will run only during dedicated time, and only if they won’t spill over past the end of it. Second, jobs not in dedicated time queues won’t start if they would spill over into dedicated time. The only exception is when a conflicting dedicated time is added after a job has started running. This may be the issue in your case, since it was an emergency dedicated time.
If you set scheduling to False, scheduling cycles stop running. The only way to run jobs will then be through qrun.
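As a rough sketch of that step, the snippet below wraps the qmgr call from the scenario in a small function and then prints the attribute back for confirmation. The function name stop_scheduling is hypothetical, and the sketch assumes qmgr is on PATH and the caller has manager privilege.

```shell
#!/bin/sh
# Sketch only: stop_scheduling is a hypothetical helper name.
# Assumes qmgr is on PATH and the caller has PBS manager privilege.

stop_scheduling() {
    # Turn off scheduling cycles; jobs already running are not touched.
    qmgr -c "set server scheduling = False" || return 1
    # Print the current value so the operator can confirm the change.
    qmgr -c "list server scheduling"
}
```

After calling stop_scheduling, an individual job can still be started by hand with qrun followed by its job ID.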
As for your question about holding jobs, you can always do that. You can run qhold -h s $(qselect -s Q). This places a system hold on all queued jobs. If you are using dedicated time, this shouldn’t be necessary; the dedicated time functionality should handle all of this for you.
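The hold command above could be wrapped so that it does nothing when no jobs are queued. This is a minimal sketch: hold_queued_jobs and job_ids are hypothetical names, and it assumes qselect and qhold are on PATH and the caller is allowed to set system holds.

```shell
#!/bin/sh
# Sketch only: hold_queued_jobs is a hypothetical helper name.
# Assumes qselect and qhold are on PATH and the caller may set system holds.

hold_queued_jobs() {
    # qselect -s Q lists the IDs of all jobs currently in state Q.
    job_ids=$(qselect -s Q)
    if [ -z "$job_ids" ]; then
        echo "no queued jobs to hold"
        return 0
    fi
    # Word splitting is intentional here: each job ID becomes one argument.
    # shellcheck disable=SC2086
    qhold -h s $job_ids
}
```

Once maintenance is over, the same list of job IDs can be passed to qrls -h s to release the system holds.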