Qsub select packing questions

I am fairly new to PBS and I have a series of questions about how I can control process distribution in PBS. For all of these questions, let’s assume I have a 16-node cluster in which each node has 32 cores.

For the first set of questions, let’s assume I want to run qsub -l select=17:ncpus=4 -l place=<WHAT GOES HERE>. If the answer isn’t place=, how do I get what I want?

Scenario 1: I want the minimum possible set of nodes, which in this case would be 2 complete nodes plus a 3rd node running one 4-core process. I tried place=pack, but AFAICT that only works within a single node, so I would call this packN or min_nodes.

Feb 19: Updated Scenario 2 because I realized my description ended up with 33 processes, not 17 as it should have.

Scenario 2: I want the maximum possible set of nodes, which would be one 4-core process on each of the 16 nodes, with one of the 16 also getting a 2nd 4-core process. I tried place=scatter, but the job never runs because it says there are not enough nodes. I would call this scatterN or max_nodes.

Once I have the above, I want to start adding in mpiprocs, and I want to control the order of hosts in the node file ($PBS_NODEFILE).

AFAICT it will always pack them, so in Scenario 1 above, assuming I got nodes 1-3, I would get n1 on the first 8 lines, n2 on the next 8, and then one line with n3. That is often what we want, but I would also like to be able to get round-robin order: n1,n2,n3,n1,n2,n1,…,n2.

In Scenario 2 I would get n1,n1,n2,n3,…,n16. Again, that is often fine, but I would also like to be able to get n1,n2,…,n16,n1.

Maybe I am missing something obvious or maybe this just can’t be done, but I would appreciate any ideas or input you all may have.

Thanks,

Bill

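You cannot do what you want easily. When you want special placement, you need to spell out the layout yourself, chunk by chunk. place=pack puts all chunks on a single host and place=scatter puts each chunk on a different host. That is why place=pack cannot span the three nodes Scenario 1 needs, and why 17 separate 4-core chunks can never be scattered across 16 nodes.

Scenario 1 could be accomplished with

-l place=scatter -l select=2:ncpus=32:mpiprocs=8:ompthreads=4+1:ncpus=4:mpiprocs=1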
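The arithmetic behind this layout is 32 / 4 = 8 ranks per node, so 17 ranks need ⌈17/8⌉ = 3 nodes: two full nodes plus one leftover rank. Below is a minimal Python sketch that builds the select= chunk list for other sizes, assuming every rank uses the same ncpus and every node has the same core count. min_nodes_select is a hypothetical helper, not anything PBS provides.

```python
def min_nodes_select(nranks: int, ncpus: int, cores_per_node: int) -> str:
    """Hypothetical helper: chunk list packing ranks onto as few nodes as possible.

    Each rank needs `ncpus` cores. Every full node becomes one chunk (the same
    chunk type repeated), and any leftover ranks go into one smaller chunk.
    """
    if nranks < 1 or ncpus < 1 or ncpus > cores_per_node:
        raise ValueError("need nranks >= 1 and 1 <= ncpus <= cores_per_node")
    per_node = cores_per_node // ncpus  # ranks that fit on one node
    full_nodes, leftover = divmod(nranks, per_node)
    parts = []
    if full_nodes:
        parts.append(f"{full_nodes}:ncpus={per_node * ncpus}"
                     f":mpiprocs={per_node}:ompthreads={ncpus}")
    if leftover:
        parts.append(f"1:ncpus={leftover * ncpus}"
                     f":mpiprocs={leftover}:ompthreads={ncpus}")
    return "+".join(parts)


print("-l select=" + min_nodes_select(17, 4, 32))
# -l select=2:ncpus=32:mpiprocs=8:ompthreads=4+1:ncpus=4:mpiprocs=1:ompthreads=4
```

The sketch also spells out ompthreads on the leftover chunk; that is harmless and makes every chunk ask for the same thread count per rank.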

Scenario 2 could be accomplished with

-l place=scatter -l select=15:ncpus=4:mpiprocs=1+1:ncpus=8:mpiprocs=2:ompthreads=4
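Because place=scatter allows only one chunk per host, the ranks have to be grouped into at most 16 chunks. Below is a minimal Python sketch of that grouping, assuming identical nodes and identical ranks; max_nodes_select is a hypothetical helper. For the 17-rank Scenario 2 it yields fifteen 1-rank chunks plus one 2-rank chunk, and for the earlier 33-rank version of Scenario 2 it gives fifteen 2-rank chunks plus one 3-rank chunk.

```python
def max_nodes_select(nranks: int, ncpus: int, nnodes: int, cores_per_node: int) -> str:
    """Hypothetical helper: chunk list spreading ranks over as many nodes as possible.

    With place=scatter every chunk lands on a different host, so the ranks are
    grouped into at most `nnodes` chunks: each chunk gets `base` ranks and
    `extra` of those chunks get one more.
    """
    if nranks < 1 or ncpus < 1 or nnodes < 1:
        raise ValueError("nranks, ncpus and nnodes must be positive")
    used = min(nranks, nnodes)  # one chunk per node actually used
    base, extra = divmod(nranks, used)
    if (base + (1 if extra else 0)) * ncpus > cores_per_node:
        raise ValueError("ranks do not fit on the available nodes")
    # extra < used always holds, so there is at least one base-sized chunk
    parts = [f"{used - extra}:ncpus={base * ncpus}"
             f":mpiprocs={base}:ompthreads={ncpus}"]
    if extra:
        parts.append(f"{extra}:ncpus={(base + 1) * ncpus}"
                     f":mpiprocs={base + 1}:ompthreads={ncpus}")
    return "+".join(parts)


print("-l place=scatter -l select=" + max_nodes_select(17, 4, 16, 32))
# -l place=scatter -l select=15:ncpus=4:mpiprocs=1:ompthreads=4+1:ncpus=8:mpiprocs=2:ompthreads=4
```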

(Untested. Might need to tweak somewhat.)

As to changing the MPI rank assignment, the usual trick is to read in $PBS_NODEFILE; write it out, re-ordered, to a local file; and then point PBS_NODEFILE to that new file before launching your MPI process.
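A minimal Python sketch of that trick, producing the round-robin order asked for above (n1,n2,n3,n1,… instead of n1,n1,…,n2,n2,…). It assumes the node file has one hostname per line; round_robin, reorder_nodefile and the script name rr_nodefile.py are hypothetical. Whether your launcher actually honors a redirected PBS_NODEFILE, or needs the new file passed explicitly on its command line, depends on the MPI implementation.

```python
import os
import sys
import tempfile
from typing import List


def round_robin(hosts: List[str]) -> List[str]:
    """Reorder a packed host list (n1,n1,n2,n2,...) into round-robin order."""
    remaining = {}  # host -> slots left, kept in first-seen order
    for host in hosts:
        remaining[host] = remaining.get(host, 0) + 1
    ordered = []
    while len(ordered) < len(hosts):
        for host in remaining:  # each pass gives every host with slots left one slot
            if remaining[host]:
                ordered.append(host)
                remaining[host] -= 1
    return ordered


def reorder_nodefile(src: str) -> str:
    """Write a round-robin copy of the node file `src` and return its path."""
    with open(src) as f:
        hosts = [line.strip() for line in f if line.strip()]
    # mkstemp uses $TMPDIR when it is set, so the copy is local to the job
    fd, dst = tempfile.mkstemp(prefix="nodefile.", suffix=".rr")
    with os.fdopen(fd, "w") as f:
        f.writelines(host + "\n" for host in round_robin(hosts))
    return dst


if __name__ == "__main__":
    if "PBS_NODEFILE" in os.environ:
        print(reorder_nodefile(os.environ["PBS_NODEFILE"]))
    else:
        print("PBS_NODEFILE is not set; run this inside a PBS job", file=sys.stderr)
```

In a job script this would be used as export PBS_NODEFILE=$(python3 rr_nodefile.py) just before the MPI launch line.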

Thanks for the ideas. I will give them a try. What you said about MPI rank assignment confirmed what I suspected. My one-sentence summary of your suggestion is that the scheduler can’t do this on its own; the user has to have some knowledge and do some “scheduling logic” of their own, for instance handling the oddball chunk as a separate chunk specification. That isn’t a complaint; I’m just parroting back what I think I understand from your answer.

We are having an interesting internal discussion about this. There is a contingent that argues the scheduler should just “provision resources”, and that the node file only needs to list each host once; they will handle all the process distribution themselves.
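For that “each host once” view, the existing node file can already be collapsed without any scheduler change. A small sketch, assuming one hostname per line; unique_hosts is a hypothetical helper:

```python
from typing import List


def unique_hosts(nodefile: str) -> List[str]:
    """Return each host in a PBS node file once, in first-seen order."""
    with open(nodefile) as f:
        names = (line.strip() for line in f)
        # dict.fromkeys drops repeats while keeping insertion order (Python 3.7+)
        return list(dict.fromkeys(name for name in names if name))
```

Inside a job, unique_hosts(os.environ["PBS_NODEFILE"]) (with os imported) would return the provisioned hosts, one entry per node.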

Thanks for taking the time to respond.

Bill