We have a user who racked up 150k+ queued jobs in a routing queue. PBS commands became sluggish (read: unusable). What is the ideal way to delete all of these jobs? I tried the usual (qdel + qselect in chunks), but I started seeing errors such as:
-bash: /opt/pbs/default/bin/qdel: Argument list too long
max_queued was set on the execution queue but not on the routing queue. It is mind-boggling that a scheduler would let you submit 150k+ jobs but not delete them. FWIW, pbs_server.bin is crawling. Is there a cycle/timeout we can decrease to speed things up?
I also see gaps of many minutes where qdel is not doing anything.
To speed up the deletion of jobs, you can try turning scheduling off; that might help. Other than that, the qdel operation is, unfortunately, just slow. I submitted 100k jobs, turned scheduling off, and ran qselect | xargs qdel -W force. It took:
time qselect | xargs qdel -W force
real 13m0.662s
user 2m40.102s
sys 6m43.559s
just to give you an idea of how long it might take.
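In case it helps, here is a rough sketch of the same approach with an explicit batch size, so that no single qdel invocation runs into "Argument list too long". The helper name delete_in_chunks, the batch size of 1000, and the queue/user names in the usage comments are placeholders of mine, not anything PBS provides; adjust them for your site.

```shell
# Hypothetical helper: reads job IDs on stdin and hands them to a delete
# command in fixed-size batches, keeping each argument list small.
delete_in_chunks() {
    chunk_size="${1:-1000}"
    # Drop the batch size argument if one was given.
    if [ "$#" -gt 0 ]; then
        shift
    fi
    # Remaining arguments are the command to run; default to a forced qdel.
    if [ "$#" -eq 0 ]; then
        set -- qdel -W force
    fi
    xargs -n "$chunk_size" "$@"
}

# Example usage (routeq and alice are placeholder names):
#   qmgr -c "set server scheduling = false"
#   qselect -q routeq -u alice | delete_in_chunks 1000
#   qmgr -c "set server scheduling = true"
```

Passing a different command as the trailing arguments (for example echo) lets you dry-run the batching before deleting anything for real.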
“FWIW, pbs_server.bin is crawling”
The server was responsive for me; I was able to run qstat and qselect. A qselect of 100k jobs took around 4 seconds, so I'm not sure why it's crawling for you. It might again be because I had scheduling turned off.
Fixing qdel might be fairly simple. You could bundle the job IDs into one call to the server, and that would remove most of the sluggishness with qdel. If you make such a change, do feel free to submit a pull request.