Running large jobs efficiently on a cluster with mixed node sizes

We have a cluster that mixes nodes with different CPU counts.
With the current settings, we found that smaller jobs (those requesting fewer CPUs) end up occupying the nodes.
So when a larger job is queued, the nodes do have free CPUs, but no single node has enough free CPUs to run the larger job, which keeps it waiting in the queue.
Is there a general way to improve waiting time and efficiency in such cases?
For example, give the nodes with fewer CPUs higher priority, leaving the larger nodes free for the larger jobs?
Or give the larger jobs higher priority?



  1. All jobs should request a walltime; without accurate walltimes the scheduler cannot make good ordering and backfill decisions.
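  As a sketch, a walltime can be requested at submission time; the resource values and script name below are placeholders, not recommendations:

  ```shell
  # Request 16 CPUs on a single node and a 2-hour walltime (example values)
  qsub -l select=1:ncpus=16 -l walltime=02:00:00 my_job_script.sh
  ```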

  2. The smaller jobs get backfilled in front of the larger jobs and push their start times further into the future. If all jobs have walltimes defined and you use strict ordering with a backfill depth, this can be avoided.

  3. Please check the help_starving_jobs option in sched_config; it raises the priority of jobs that have waited longer than the max_starve threshold.
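  In PBS Professional, help_starving_jobs and its max_starve threshold are set in the scheduler's sched_config file; the 24-hour threshold below is an example value, and the scheduler must be restarted or HUPed for it to take effect:

  ```
  # In $PBS_HOME/sched_priv/sched_config
  help_starving_jobs: true    ALL
  max_starve: 24:00:00
  ```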

  4. Please enable strict_ordering with backfill_depth set to 4, so jobs are started in order while a few smaller jobs can still be backfilled around the top job without delaying its start.

  5. Set eligible_time_enable to true and implement a job_sort_formula, so that a job's priority can grow with the time it has been eligible to run.
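The strict_ordering, backfill, and eligible_time suggestions above could be applied roughly as follows on PBS Professional; the backfill depth and the formula weights are illustrative assumptions, not tuned recommendations:

```shell
# strict_ordering lives in the scheduler config, not qmgr:
#   in $PBS_HOME/sched_priv/sched_config set
#     strict_ordering: true    ALL
#   then restart (or HUP) the scheduler.

# Server attributes, set via qmgr as the PBS administrator:
qmgr -c "set server backfill_depth = 4"
qmgr -c "set server eligible_time_enable = True"

# Example formula (weights are illustrative): larger jobs sort higher,
# and priority grows the longer a job has been eligible to run.
qmgr -c 'set server job_sort_formula = "ncpus + 0.01 * eligible_time"'
```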