Can the server/scheduler force jobs to be multi-node depending on core count?

We have a small cluster of 13 machines: 10 of them have 8 cores and 3 of them have 32 cores.

I have issues with users submitting 8-10 core jobs and the 3 fast nodes getting hammered constantly, with the 10 smaller nodes only taking smaller jobs.

I know you can request a multi-node job by using

qsub -l select=x

but is there any way for me, at the scheduler, to detect whether a job requests more than x cores and, if so, force it to be multi-node? If anyone has better suggestions on how to manage a cluster with varying resources, I'm open to any ideas.

Have you tried setting the priority of the skinny nodes higher than that of the fat nodes? This should encourage the 8 core jobs to prefer the skinny nodes.

Or, from the Admin Guide section 4.4.8 or thereabouts:

• If you want to place jobs on the vnodes with the fewest CPUs first, saving bigger vnodes for larger jobs:
• Sort vnodes so that those with fewer CPUs come first: node_sort_key: "ncpus LOW"
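To illustrate what the sort order changes, here is a toy first-fit model in Python. It is only a sketch and not how the PBS scheduler is actually implemented: the node names, job sizes, and the `place` function are made up for the example, and real placement also depends on priorities, queues, and other policy.

```python
# Toy model only: first-fit placement over a sorted node list.
# Not PBS scheduler code; node names and job sizes are hypothetical.

def place(jobs, nodes, reverse=False):
    """Place each job on the first node, in sorted order, with enough free cores."""
    free = dict(nodes)  # node name -> free cores
    # Sort by total cores: ascending mimics "ncpus LOW", descending "ncpus HIGH".
    order = sorted(nodes, key=lambda name: nodes[name], reverse=reverse)
    placement = []
    for ncpus in jobs:
        target = next((name for name in order if free[name] >= ncpus), None)
        if target is not None:
            free[target] -= ncpus
        placement.append((ncpus, target))  # target is None if nothing fits
    return placement

nodes = {"fat1": 32, "skinny1": 8, "skinny2": 8}
jobs = [8, 4, 16]

low = place(jobs, nodes)                 # fewest cores first
high = place(jobs, nodes, reverse=True)  # most cores first
print(low)
print(high)
```

In this toy run, the ascending order sends the 8 and 4 core jobs to the skinny nodes and leaves the fat node for the 16 core job, while the descending order puts all three jobs on the fat node.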

Do you know whether the 8-10 core jobs are using MPI or GNU parallel? If not, trying to split them across nodes will likely cause them to fail.

Thanks. I guess my solution would be to create some sort of job routing that sends jobs under a certain resource limit to the skinny nodes and anything above it to the fat nodes. I currently have the sort set to ncpus HIGH, which targets the fat nodes, but as a result larger jobs get held up while they wait for smaller jobs to finish.

ncpus LOW would mean that an 8 core job (which in our environment is considered a large job) would take up an entire node.

  1. In a queuejob hook, you can get the requested number of cores, reconstruct the select statement based on that to suit your needs or profiles, and accept the job.

or

  2. In a runjob hook, get the number of cores and find the destination compute node. If it does not match your requirement, reject the job, which will then be re-queued.
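As a rough sketch of the first option, the core of such a hook could be a small function that turns a core count into a select string. The function name `rewrite_select` and the 8 core chunk limit are assumptions for illustration, not anything PBS provides. The commented-out wiring at the bottom shows how it might be called from a queuejob hook; how the requested core count is read depends on how your users submit (`-l ncpus=` versus `-l select=`), so treat that part as a starting point only.

```python
# Sketch only: build a select statement that splits a core request into
# chunks of at most max_chunk cores. rewrite_select and the default of 8
# are hypothetical choices for this example.

def rewrite_select(ncpus, max_chunk=8):
    """Return a select string with chunks no larger than max_chunk cores."""
    if ncpus <= max_chunk:
        return "1:ncpus=%d" % ncpus
    full, rest = divmod(ncpus, max_chunk)
    parts = ["%d:ncpus=%d" % (full, max_chunk)]
    if rest:
        parts.append("1:ncpus=%d" % rest)
    return "+".join(parts)

print(rewrite_select(4))
print(rewrite_select(10))
print(rewrite_select(16))

# Possible wiring inside a queuejob hook (assumes the job was submitted
# with -l ncpus=N; untested, adapt to your site):
#   import pbs
#   e = pbs.event()
#   n = e.job.Resource_List["ncpus"]
#   if n is not None:
#       e.job.Resource_List["select"] = pbs.select(rewrite_select(int(n)))
#   e.accept()
```

Note that multiple chunks can still land on the same host unless the job also requests something like place=scatter, and that splitting a job which is not MPI-aware will likely break it, as mentioned above.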

Have you tried node priority as mentioned above? If the skinny nodes have higher priority, then jobs that fit will be placed there first, saving the fat nodes for jobs that need more cores.

If you split the nodes into separate sets, then you could end up with small jobs sitting in the queue even though there are free cores on fat nodes. (Which might be okay; it depends on the relative importance of small vs. wide jobs.)