Qsub: submit jobs with 17 procs on 5 nodes

Hi there,

New to OpenPBS (and this list), I have a problem starting jobs via qsub with, for example, 17 procs on 5 nodes.

What I have:

A running setup on headnode “i1” with some nodes in it:

in12
queue = parallel-19
in17
queue = parallel-19
in20
queue = parallel-19
in22
queue = parallel-19
in46
queue = parallel-19

The default queue is parallel-19 and works fine.

I am able to start jobs on these machines.

Now I’m looking for the correct way to qsub a job as described in the topic.

Testing quickly with interactive jobs, I found that running, e.g.:

qsub -I -l select=5:ncpus=5 -l place=scatter

qsub: waiting for job 115.i1 to start
qsub: job 115.i1 ready

cat $PBS_NODEFILE ; echo $NCPUS

in12
in17
in20
in22
in46
5

logout
qsub: job 115.i1 completed

But with this approach, requesting more chunks than there are nodes is impossible, of course.

What I want:
Start a job via qsub with 17 procs on the 5 available servers, preferring the least-used servers. Topology does not matter (in this case).

Maybe it is just a matter of the correct syntax, or of a queue capability?

Any help welcome.

If you don’t care where the tasks are placed, then you could use:

select=17:ncpus=1

By default, the PBS scheduler should fill up a node before scheduling jobs on the next node (smp_cluster_dist: pack in the sched_config).
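For a batch job, the same request can go into the script header; a minimal sketch (the mpirun flags and the ./my_app binary are placeholders for your MPI stack):

    #!/bin/bash
    #PBS -l select=17:ncpus=1
    cd $PBS_O_WORKDIR
    # many PBS-aware MPI stacks pick up $PBS_NODEFILE automatically;
    # passing it explicitly works with MPICH/Open MPI style launchers
    mpirun -np 17 -machinefile $PBS_NODEFILE ./my_app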

Gabe

Hi Gabe,

thanks for your response. A bit of progress, but unfortunately not complete. Here is my result:

qsub -I -l select=17:ncpus=1

qsub: waiting for job 133.i1 to start
qsub: job 133.i1 ready

udo@in12: /home/udo

cat $PBS_NODEFILE ; wc -l $PBS_NODEFILE ; echo $NCPUS

in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
17 /var/spool/pbs/aux/133.i1
1

logout

So, I have what I wanted regarding a “good” machinefile: one line for every proc.
But: all of the lines are filled with the first node, in12, not a “mixture” of nodes.

Regarding the file “sched_config”, I see two of them:

/opt/pbs/etc/pbs_sched_config
and
/var/spool/pbs/sched_priv/sched_config

Which is the one needed? I suppose “/opt/pbs/etc/pbs_sched_config”:

rpm -qf /opt/pbs/etc/pbs_sched_config

openpbs-server-20.0.1-0.x86_64

rpm -qf /var/spool/pbs/sched_priv/sched_config

file /var/spool/pbs/sched_priv/sched_config is not owned by any package

(on CentOS 8 … )

I changed smp_cluster_dist: to “round_robin” in both of the files and restarted PBS (and the server too); it doesn’t change the behaviour.
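(For the record: a HUP should also make the running scheduler reread its config without a full restart, assuming OpenPBS behaves like PBS Pro here and the live file is the one under sched_priv:)

    # edit the live copy, then signal the scheduler to reread it
    vi /var/spool/pbs/sched_priv/sched_config
    kill -HUP $(pgrep pbs_sched)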

grep smp_cluster_dist /opt/pbs/etc/pbs_sched_config /var/spool/pbs/sched_priv/sched_config | grep -v "#"

/opt/pbs/etc/pbs_sched_config:smp_cluster_dist: round_robin
/var/spool/pbs/sched_priv/sched_config:smp_cluster_dist: round_robin

Any further idea? Here is my config (ucat is cat without comments):

ucat /opt/pbs/etc/pbs_sched_config

round_robin: False all
by_queue: True prime
by_queue: True non_prime
strict_ordering: false ALL
help_starving_jobs: true ALL
max_starve: 24:00:00
backfill_prime: false ALL
prime_exempt_anytime_queues: false
primetime_prefix: p_
nonprimetime_prefix: np_
node_sort_key: "sort_priority HIGH" ALL
provision_policy: "aggressive_provision"
resources: "ncpus, mem, arch, host, vnode, aoe, eoe"
load_balancing: true ALL
smp_cluster_dist: round_robin
fair_share: false ALL
fairshare_usage_res: cput
fairshare_entity: euser
fairshare_decay_time: 24:00:00
fairshare_decay_factor: 0.5
preemptive_sched: true ALL
dedicated_prefix: ded

Any further ideas?

Please try either of the below:

qsub -l select=17:ncpus=1 -l place=pack -I
Note: the last -I after “pack” is a capital I (as in Ice Cream); you should have a node with at least 17 cores here.

qsub -l select=17:ncpus=1:host=nodename -I
Note: the last -I after “nodename” is a capital I (as in Ice Cream); you should have a node with at least 17 cores here.

Hi adarsh,

I tried both of them. Here are the results:

qsub -I -l select=17:ncpus=1 -l place=pack -I

qsub: waiting for job 135.i1 to start
qsub: job 135.i1 ready

cat $PBS_NODEFILE ; wc -l $PBS_NODEFILE ; echo $NCPUS

in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
in12
17 /var/spool/pbs/aux/135.i1
1

Again: 17 procs on in12, no other nodes.

qsub -l select=17:ncpus=1:host=nodename -I^C

It won’t start … aborted with Ctrl-C.

logout

qsub: job 135.i1 completed

So - no luck.

But anyway:
Thanks,
Udo

Sorry, “nodename” is the hostname of the compute node (pbs_mom); you can find it by running the command below:

pbsnodes -av | grep host
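For example, with the node names from your earlier output (assuming in12 has at least 17 cores, which your select=17:ncpus=1 run suggests):

    qsub -l select=17:ncpus=1:host=in12 -I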

OK, that works, I can select dedicated nodes.
But I don’t want to use only one node; I want to spread my computation over all (in this case 5) nodes, and I want PBS to produce a well-balanced hostfile with the nodes defined in the default queue, e.g.:
in12
in17
in20
in22
in46
in12
in17
in20
in22
in46
in12
in17
in20
in22
in46
in12
in17

Udo

Take a look at Chapter 5, “Multiprocessor Jobs”, in the PBS Professional User’s Guide.


Please try and adjust the qsub line below based on your requirements (mpiprocs controls how many times each host appears in $PBS_NODEFILE):

qsub -l select=5:ncpus=10:mpiprocs=10 -l place=scatter

After again a lot of time spent searching/reading:

@agurban

In chapter 5 of the 2020.1 User’s Guide (which hopefully corresponds to my installed “openpbs-server-20.0.1-0.x86_64” rpm) I find a lot of examples (as everywhere on the net). All good if you have a lot of nodes and need a symmetric set of processes.

But nowhere do I see a solution to my problem: starting an odd number of procs on fewer nodes than processes, with a load-balanced machinefile.

Why do I need this ?

I have 4 machines dedicated to an MPI-based application which needs, for example, 17 procs.
I cannot always specify 4 machines, because one may be down.
And 17 is a prime number, so there is no way to arrange it “in a matrix”.

@adarsh
Same here: how do I realize a “prime number” of procs, with the number of really usable nodes being “dynamic”?

No simple solution?

Ideal:
qsub -l ncpus=17 -l place=scatter -I

but:
qsub: "-lresource=" cannot be used with "select" or "place", resource is: ncpus

You can specify the number of processes per host, and specify a different host for each chunk:
-l select=3:ncpus=3:mpiprocs=3+2:ncpus=4:mpiprocs=4 -lplace=scatter.

Note that ncpus is a host-level resource, so you use the select statement, not “-l ncpus”. PBS uses “-l” for job-wide resources such as walltime, not host-level resources. See the User’s Guide, chapter 4, especially 4.3, “Requesting Resources”, and 4.3.3, “Requesting Resources in Chunks”.
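A batch-script sketch of the same request (the mpirun invocation and ./my_app binary are placeholders for your MPI stack):

    #!/bin/bash
    #PBS -l select=3:ncpus=3:mpiprocs=3+2:ncpus=4:mpiprocs=4
    #PBS -l place=scatter
    # 3 hosts x 3 ranks + 2 hosts x 4 ranks = 17 MPI ranks;
    # mpiprocs controls how often each host appears in $PBS_NODEFILE
    cd $PBS_O_WORKDIR
    mpirun -np 17 -machinefile $PBS_NODEFILE ./my_app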


You can sort vnodes according to the number of available CPUs:
node_sort_key: "ncpus HIGH unused" all
The default vnode sort is according to priority, which by default is not set. See the Administrator’s Guide, section 4.9.50, “Sorting Vnodes on a Key”.
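A sketch of the change, assuming the default PBS_HOME of /var/spool/pbs:

    # in /var/spool/pbs/sched_priv/sched_config:
    node_sort_key: "ncpus HIGH unused" all
    # then HUP the scheduler so it rereads the file:
    kill -HUP $(pgrep pbs_sched)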

Ah, finally this works exactly as I wanted:

qsub -I -l select=3:ncpus=3:mpiprocs=3+2:ncpus=4:mpiprocs=4 -lplace=scatter

qsub: waiting for job 369.i1 to start
qsub: job 369.i1 ready

udo@in20: /home/udo

wc -l $PBS_NODEFILE ; echo $NCPUS

17 /var/spool/pbs/aux/369.i1
3
udo@in20: /home/udo

cat $PBS_NODEFILE | sort | uniq -c

  3 in12
  4 in17
  3 in20
  3 in22
  4 in46

I suppose I have to dig deeper into the concepts of vnodes and chunks. The value of $NCPUS is … a bit irritating (apparently it reflects only the chunk my shell landed on).

Anyway, you may agree, it seems to be a config-syntax nightmare. I miss the old Linux KISS principle.

I will play around with these parameters and hope to find solutions for other unusual configs.

So thank you very much. The problem is solved in this case. Have a nice time.

Udo

So, I did some work to dynamically produce the string agurban proposed. Just as a proof of concept.

    #!/bin/bash

    # name : generate_qsub_string

    # proof of concept : produce a hopefully well-balanced hostfile via pbs
    # Will produce the string
    # -l select=<NODES_AVAILABLE_MINUS_1>...+1...

    # Input :
    # p1 : nodes available in queue
    # p2 : cores wanted

    # Output :
    # string to add to qsub

    # let's go : ( no checks performed )

    # Nodes available
    NODES_AVAILABLE=$1

    # Nodes available minus 1
    NODES_AVAILABLE_MINUS_1=$(( NODES_AVAILABLE - 1 ))

    # Number of cores wanted
    CORES=$2

    if [ "$NODES_AVAILABLE" -ge "$CORES" ]
    then
            # enough nodes available : just one proc per node
            # (note : this requests a chunk on every available node,
            #  which may be more chunks than cores wanted)
            STRING="-l select=$NODES_AVAILABLE:ncpus=1:mpiprocs=1 -lplace=scatter"
            echo $STRING "# =>" $NODES_AVAILABLE times 1 core
    else
            # do some computing ...
            AVERAGE_REAL=$( echo "$CORES / $NODES_AVAILABLE" | bc -l )
            AVERAGE_INT=$( echo "$CORES / $NODES_AVAILABLE" | bc )
            AVERAGE_REST=$( echo "$AVERAGE_REAL - $AVERAGE_INT" | bc -l )

            # ... with a little bit of rounding correction
            ROUND=$( echo "if ($AVERAGE_REST > 0.5) { print 1 } else { print 0 }" | bc -l )

            # COUNT1 procs on each of the first N-1 nodes, COUNT2 on the last one
            COUNT1=$(( AVERAGE_INT + ROUND ))
            COUNT2=$(( CORES - ( NODES_AVAILABLE_MINUS_1 * COUNT1 ) ))

            STRING="-l select=$NODES_AVAILABLE_MINUS_1:ncpus=${COUNT1}:mpiprocs=${COUNT1}+1:ncpus=${COUNT2}:mpiprocs=${COUNT2} -lplace=scatter"
            echo $STRING "# =>" $NODES_AVAILABLE_MINUS_1 times ${COUNT1} cores, one time ${COUNT2} cores
    fi

Produces:

    # for available_hosts in {2..6} 18 ; do echo -n $available_hosts hosts " : " ; ./generate_qsub_string $available_hosts 17  ; done 2>&1 
    2 hosts  : -l select=1:ncpus=8:mpiprocs=8+1:ncpus=9:mpiprocs=9 -lplace=scatter # => 1 times 8 cores, one time 9 cores
    3 hosts  : -l select=2:ncpus=6:mpiprocs=6+1:ncpus=5:mpiprocs=5 -lplace=scatter # => 2 times 6 cores, one time 5 cores
    4 hosts  : -l select=3:ncpus=4:mpiprocs=4+1:ncpus=5:mpiprocs=5 -lplace=scatter # => 3 times 4 cores, one time 5 cores
    5 hosts  : -l select=4:ncpus=3:mpiprocs=3+1:ncpus=5:mpiprocs=5 -lplace=scatter # => 4 times 3 cores, one time 5 cores
    6 hosts  : -l select=5:ncpus=3:mpiprocs=3+1:ncpus=2:mpiprocs=2 -lplace=scatter # => 5 times 3 cores, one time 2 cores
    18 hosts  : -l select=18:ncpus=1:mpiprocs=1 -lplace=scatter # => 18 times 1 core

So, this script is still not very sophisticated and could be improved, but at first glance it will be OK for my needs.
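Stripping the trailing comment, the output can be fed straight to qsub, e.g.:

    qsub $(./generate_qsub_string 5 17 | sed 's/ #.*//') -I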

Thanks again for all the help

Udo
