Decreasing resources of a running job

Hello,

I wonder why it isn’t possible with qalter to decrease the resources of a running job.

I believe it would be very useful in certain scenarios. Say my typical jobs are multithreaded, working in parallel on similar data chunks. However, due to the statistical spread of the data, each thread takes a different amount of time to finish. So quite often, while the majority of the threads finish more or less synchronously, a few (out of tens) continue massaging the bits for quite some time. The result is a mostly idle node. The same goes for the allocated memory. So, ideally, the job itself should be able to signal to PBS that some resources are no longer needed and can be used for other jobs. Or, at least, this could be done by the user or an outside monitoring tool.

Best,
Evgeny

The system would become embarrassingly dynamic in nature, and the scheduler (or multiple schedulers) would be busy figuring out where each job could run from cycle to cycle.

There are some tools and configurations available for this, like:

  • pbs_release_nodes_on_stageout, which releases the sister nodes of a multi-node (MPI) job (see the sketch after this list).
  • The runone feature, which is analogous to an OR functionality with respect to qsub requests.
  • Also, most application users do a pre-run on their input decks (profiling them) to estimate the resources required to run their job; this helps them make more optimized resource requests and usage.
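For concreteness, here is a minimal sketch of how the node-release mechanism is usually driven from the command line. The job ID, script name and vnode names below are made up, and the exact availability of the attribute and command depends on your PBS Pro/OpenPBS version:

```bash
# Release sister nodes automatically when stage-out begins
# (names and select statement below are just examples).
qsub -l select=4:ncpus=32 -W release_nodes_on_stageout=true job.sh

# Or release specific sister vnodes of a running job by hand, e.g. once the
# ranks on them have finished (job owner or admin):
pbs_release_nodes -j 1234.pbs-server node03 node04

# Or release all vnodes except those on the primary execution host:
pbs_release_nodes -j 1234.pbs-server -a
```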

+1. Taking this further, it would be useful if the user could submit a job without asking for any resources, and PBS would figure out how similar jobs ran in the past (successful or failed), assign the resource request based on what it learnt from that history, and then schedule the job.

That sounds like you’re angling to add something like AI into PBS. Hopefully that won’t occur :slight_smile:

Some external tools might do it: GenAI :slight_smile:

This is a common question from some of the application specialists: why do we have to request resources at all? It ( :slight_smile: ) should figure that out and submit the jobs.

Thanks for the link, Adarsh. From a brief read, it looks like a very thorough bit of work. The conclusion says “This specific prediction task still remains challenging, as only partially successful results have been achieved on the collected data.” Hence it’s not likely to be seen soon. I still think the researcher is the one best placed to understand their job characteristics and make an appropriate split-up of their jobs and resources. They just need to be motivated to do that, with help from good post-job analytics.
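As a small, hedged example of such post-job analytics: if job history is enabled on the server (job_history_enable=True), a user can compare what a finished job asked for with what it actually consumed. The job ID below is made up:

```bash
# Show requested vs. consumed resources for a finished job.
qstat -x -f 1234.pbs-server | grep -E 'Resource_List|resources_used'
```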


+1 @speleolinux – second that

However, I would like to point out that you COULD do a qalter (shrink only) on some resources during the running phase of a job, e.g. walltime.
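A minimal sketch of what that shrink-only qalter looks like, assuming the job was submitted with a larger walltime. The job ID and values are made up, and which resources may be altered on a running job, and in which direction, depends on your PBS version and server settings:

```bash
# Reduce the walltime of a running job from e.g. 24h down to 6h.
qalter -l walltime=06:00:00 1234.pbs-server

# Increasing the walltime of a running job is normally rejected by the
# server unless you have elevated (manager/operator) privileges.
```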