PP-829: Preemption via deletion

bhroam · June 24, 2017, 1:49am

I’ve added a design document for PP-829. The scheduler will now have the ability to preempt jobs by deleting them. The preempt_order sched_config option will be extended with the ‘D’ letter to signify deletion.
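
A minimal sketch of how the extended option might look in sched_config, assuming ‘D’ is simply appended as the last method the scheduler tries after suspend (S), checkpoint (C), and requeue (R). The ordering shown is an illustration only and is not taken from the design document.

```
preempt_order: "SCRD"
```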

arungrover · June 26, 2017, 5:13pm

@bhroam I do not have any comments on the design proposal. I have a small question though: does this mean that when the scheduler issues the delete command to PBS, it will use the “-W force” option? Without that option, I think there might be some delay in deleting the job, and the scheduler will have to wait.

bhroam · June 26, 2017, 10:52pm

Thanks for your comment @arungrover
You bring up a very interesting point. If we do a normal qdel, the job might take some time to be deleted. The scheduler might be trying to start the high priority job before the preempted jobs are deleted. Using qdel -Wforce sounds like a good solution to this issue, but I think it’s just sweeping the same issue under the rug. Before the qdel -Wforce returns to the scheduler, the job will be purged from the server’s database. The server shows the resources as immediately being free, but the mom is still doing end of job processing.

I see three issues:
First is cleanup hooks. If we start a new job before the cleanup hooks are finished, the new job might be cleaned up.
Second is begin hooks. If we have a begin hook that makes sure the status of the node is prep’d for the job, it might clean up the old job. This might not be too bad. It would be worse if the cleanup and the begin hooks clash.
Last is the Cray/cpuset machines, or machines running cgroups. We previously told the operating system to carve out part of the machine for a job. Part of end of job processing is to release those resources back to the machine. If the scheduler runs the high priority job before the resources are returned, the new OS request will be rejected (e.g., ALPS reservation). This by itself is bad, but it gets worse. We’ve just deleted jobs to run our high priority job. The runjob of the high priority job fails. The newly freed resources will likely be filled by new jobs. On subsequent cycles, the whole process will start again.

The only way I can see to avoid falling into any of these traps is to wait for the deletes to end before running the high priority job. This unfortunately slows the scheduler down.

Bhroam
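
The ordering described in the post above (delete, wait for end of job processing, only then run) can be sketched roughly as follows. This is an illustration, not scheduler code: `delete_job`, `job_exists`, and `run_job` are hypothetical callables standing in for whatever the scheduler would actually use, and the timeout values are arbitrary.

```python
import time


def delete_and_wait(job_id, delete_job, job_exists, timeout_s=30.0, poll_s=0.5,
                    sleep=time.sleep, clock=time.monotonic):
    """Delete one job and block until it is gone. Returns False on timeout."""
    delete_job(job_id)                 # hypothetical: issue a normal delete
    deadline = clock() + timeout_s
    while job_exists(job_id):          # hypothetical: is the job still known?
        if clock() >= deadline:
            return False
        sleep(poll_s)
    return True


def preempt_then_run(high_prio_job, victims, delete_job, job_exists, run_job,
                     **wait_opts):
    """Run high_prio_job only after every victim has finished being deleted."""
    for victim in victims:
        if not delete_and_wait(victim, delete_job, job_exists, **wait_opts):
            return False               # a victim is still cleaning up; do not run
    run_job(high_prio_job)             # hypothetical: start the high priority job
    return True
```

Because each victim is deleted and waited on in turn, the total delay is the sum of the individual delete times, which is the slowdown the post mentions.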

scc · June 27, 2017, 12:57pm

Hi Bhroam, thanks for putting this together!

I had always thought that qsub -c n only had an impact on periodic checkpointing, but I see that it does indeed prevent checkpoint_abort as well. Extending that line of thinking from qsub -r n and qsub -c n, what are your thoughts on introducing a new interface (possibly as part of a separate project) to mark jobs as non-suspendable? Once this work is finished, all of the other methods can be allowed to fail acceptably, since the admin can always have “preempt by delete” as a backup.

bhroam · June 28, 2017, 11:47pm

Hey @scc
Extending PBS to not allow jobs to be suspended is doable. I view it as a separate (but related) RFE. Another separate but related RFE is to parallelize preemption. If PBS is going to have to wait for jobs to be deleted, it will be best to submit all the delete requests at once and then poll the whole lot. It would require some server and IFL work, but the same idea could be applied to checkpointing and requeuing jobs. Of course, waiting for a preemption method to complete like this will cause problems with our error detection. Do we wait for a job to requeue before we delete it? When do we give up and move on to our next preemption method?

Any or all of these could be clubbed together and implemented at the same time.

Bhroam
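
The "submit all the delete requests at once and then poll the whole lot" idea from the post above might look something like this. It is a rough sketch only: `delete_job` and `job_exists` are hypothetical callables, not IFL functions, and the single shared deadline is just one possible answer to the "when do we give up" question.

```python
import time


def delete_all_then_poll(job_ids, delete_job, job_exists, timeout_s=30.0,
                         poll_s=0.5, sleep=time.sleep, clock=time.monotonic):
    """Issue every delete first, then poll the whole set.

    Returns (gone, still_pending). still_pending is empty when every job
    finished being deleted before the shared deadline.
    """
    pending = list(job_ids)
    for job_id in pending:
        delete_job(job_id)             # hypothetical: non-blocking delete request

    deadline = clock() + timeout_s     # one deadline for the whole batch
    gone = []
    while True:
        still = []
        for job_id in pending:
            # hypothetical: ask whether the job is still known
            (still if job_exists(job_id) else gone).append(job_id)
        pending = still
        if not pending or clock() >= deadline:
            return gone, pending
        sleep(poll_s)
```

With this shape the wait is bounded by the slowest job in the batch rather than the sum of all of them, and the caller can decide what to do with any job IDs left in `still_pending`, for example falling back to the next preemption method.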
