Hello,
I set "strict_ordering: True ALL" in sched_config and restarted PBS.
I then ran the same job-mix test again, this time with a qmove'd job.
The result is the same as before: the qmove'd job was overtaken by a later job.
“strict_ordering: True ALL”
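For reference, the change was applied roughly like this (default PBS_HOME assumed; sending SIGHUP to pbs_sched should also make it re-read sched_config):
[root@share0A ~]# grep "^strict_ordering:" /var/spool/pbs/sched_priv/sched_config
strict_ordering: True ALL
[root@share0A ~]# systemctl restart pbs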
Note: every job in this test uses the same job script:
#PBS -l select=1:ncpus=6:mem=8192MB:mpiprocs=6
#PBS -l walltime=02:00:00
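Each JOBQ* script is essentially the sketch below; the JOBW* scripts differ only in the target queue (workq, the default), and the sleep body is just a stand-in for the real workload:
#!/bin/bash
#PBS -l select=1:ncpus=6:mem=8192MB:mpiprocs=6
#PBS -l walltime=02:00:00
#PBS -q que2
sleep 180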
(1) Before qmoving JID.33
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
31.share0A user009 workq JOBW1 22350 1 6 8192m 02:00 R 00:00
32.share0A user009 que2 JOBQ1 22370 1 6 8192m 02:00 R 00:00
33.share0A user009 workq JOBW2 -- 1 6 8192m 02:00 Q --
34.share0A user009 que2 JOBQ2 -- 1 6 8192m 02:00 Q --
35.share0A user009 workq JOBW3 -- 1 6 8192m 02:00 Q --
36.share0A user009 que2 JOBQ3 -- 1 6 8192m 02:00 Q --
Six identical jobs were submitted to queues workq and que2:
'workq': JID 31, 33, 35
'que2' : JID 32, 34, 36
(2) After qmoving JID.33 from workq -> que2
[user009@share0A ~]$ qmove que2 33.share0A ; qstat -a
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
33.share0A user009 que2 JOBW2 -- 1 6 8192m 02:00 Q --
31.share0A user009 workq JOBW1 22350 1 6 8192m 02:00 R 00:00
32.share0A user009 que2 JOBQ1 22370 1 6 8192m 02:00 R 00:00
34.share0A user009 que2 JOBQ2 -- 1 6 8192m 02:00 Q --
35.share0A user009 workq JOBW3 -- 1 6 8192m 02:00 Q --
36.share0A user009 que2 JOBQ3 -- 1 6 8192m 02:00 Q --
- JID.33 has moved from workq to que2.
- qstat -a now lists JID.33 at the top.
(3) Submit another job (JID.37) to que2
[user009@share0A ~]$ qsub jobq4 ; qstat -a
37.share0A
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
33.share0A user009 que2 JOBW2 -- 1 6 8192m 02:00 Q --
31.share0A user009 workq JOBW1 22350 1 6 8192m 02:00 R 00:01
32.share0A user009 que2 JOBQ1 22370 1 6 8192m 02:00 R 00:01
34.share0A user009 que2 JOBQ2 -- 1 6 8192m 02:00 Q --
35.share0A user009 workq JOBW3 -- 1 6 8192m 02:00 Q --
36.share0A user009 que2 JOBQ3 -- 1 6 8192m 02:00 Q --
37.share0A user009 que2 JOBQ4 -- 1 6 8192m 02:00 Q -- << NEW JOB>>
Note: JID.37 uses the same job script as the others.
(4) Job’s qtime
As the qtime values show, JID.37 was queued later than JID.33.
[user009@share0A ~]$ qstat -f | egrep "^Job|qtime"
Job Id: 33.share0A
qtime = Tue Sep 10 15:21:50 2019
Job Id: 31.share0A
qtime = Tue Sep 10 15:20:45 2019
Job Id: 32.share0A
qtime = Tue Sep 10 15:20:45 2019
Job Id: 34.share0A
qtime = Tue Sep 10 15:20:55 2019
Job Id: 35.share0A
qtime = Tue Sep 10 15:21:05 2019
Job Id: 36.share0A
qtime = Tue Sep 10 15:21:05 2019
Job Id: 37.share0A
qtime = Tue Sep 10 15:22:44 2019
Based on these qtime values, the jobs should start in this order:
31, 32, 34, 35, 36, 33, 37.
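To confirm the order in which the scheduler actually considered and ran the jobs, the scheduler log for that day can be grepped (default PBS_HOME assumed; the exact message strings may differ by version):
[root@share0A ~]# egrep "Considering job to run|Job run" /var/spool/pbs/sched_logs/20190910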
(5) Some time later, the last job (JID.37) started before JID.33
JID.37 overtakes JID.33 (!)
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
33.share0A user009 que2 JOBW2 -- 1 6 8192m 02:00 Q --
36.share0A user009 que2 JOBQ3 22890 1 6 8192m 02:00 R 00:00
37.share0A user009 que2 JOBQ4 22910 1 6 8192m 02:00 R 00:00
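The scheduler's reason for passing over JID.33 at this point should also be visible in the job's comment attribute, e.g.:
[user009@share0A ~]$ qstat -f 33.share0A | grep comment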
(6) Output files
JID.33 was the last job to finish.
-rw------- 1 user009 1572 Sep 10 15:23 JOBW1.e31
-rw------- 1 user009 1691 Sep 10 15:23 JOBQ1.e32
-rw------- 1 user009 1694 Sep 10 15:26 JOBW3.e35
-rw------- 1 user009 1691 Sep 10 15:26 JOBQ2.e34
-rw------- 1 user009 1691 Sep 10 15:29 JOBQ4.e37
-rw------- 1 user009 1691 Sep 10 15:29 JOBQ3.e36
-rw------- 1 user009 1570 Sep 10 15:32 JOBW2.e33
sched_config:
[root@share0A ~]# egrep "strict_ordering|round_robin|job_sort_key|fair_share|backfill_depth|job_sort_formula" /var/spool/pbs/sched_priv/sched_config
# round_robin
round_robin: False all
# Run jobs by queues. If both round_robin and by_queue are not set,
# strict_ordering
# while adhering to site policy. Note that strict_ordering can result
strict_ordering: True ALL
# Use job_sort_formula specifying eligible_time to establish equivalent behavior.
# to the backfill_depth server parameter.
# job_sort_key
# job_sort_key allows jobs to be sorted by any resource. This
# Usage: job_sort_key: "resource_name HIGH | LOW"
# fair_share_perc, preempt_priority, job_priority
# job_sort_key: "ncpus HIGH"
# job_sort_key: "mem LOW"
#job_sort_key: "cput LOW" ALL
# node_sort_key is similar to job_sort_key but works for nodes.
# round_robin - run one job on each node in a cycle
# fair_share
fair_share: false ALL
# Uncomment this line (and turn on fair_share above)
# SEE: fair_share
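For reference, the job_sort_formula/eligible_time hint in the comments above would correspond to something like the qmgr commands below; they were not in effect for this test (eligible_time_enable is False in the qstat -Bf output further down):
[root@share0A ~]# qmgr -c "set server eligible_time_enable = True"
[root@share0A ~]# qmgr -c "set server job_sort_formula = eligible_time"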
[root@share0A ~]# pbsnodes -av
o610f001
Mom = o610f001
Port = 15002
pbs_version = 19.1.2
ntype = PBS
state = free
pcpus = 12
resources_available.arch = linux
resources_available.host = o610f001
resources_available.mem = 263542952kb
resources_available.ncpus = 12
resources_available.vnode = o610f001
resources_assigned.accelerator_memory = 0kb
resources_assigned.hbmem = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
last_state_change_time = Tue Sep 10 15:29:47 2019
last_used_time = Tue Sep 10 15:32:48 2019
[root@share0A ~]# qstat -Bf
Server: share0A
server_state = Active
server_host = share0a
scheduling = True
total_jobs = 0
state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0 Begun
:0
default_queue = workq
log_events = 511
mail_from = adm
query_other_jobs = True
resources_default.ncpus = 1
default_chunk.ncpus = 1
resources_assigned.mem = 0mb
resources_assigned.mpiprocs = 0
resources_assigned.ncpus = 0
resources_assigned.nodect = 0
scheduler_iteration = 600
FLicenses = 20000000
resv_enable = True
node_fail_requeue = 310
max_array_size = 10000
pbs_license_min = 0
pbs_license_max = 2147483647
pbs_license_linger_time = 31536000
license_count = Avail_Global:10000000 Avail_Local:10000000 Used:0 High_Use:
0
pbs_version = 19.1.2
eligible_time_enable = False
max_concurrent_provision = 5
power_provisioning = False
max_job_sequence_id = 9999999
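backfill_depth and job_sort_formula do not appear in the qstat -Bf output above, so they appear to be unset on the server; this can be double-checked with:
[root@share0A ~]# qmgr -c "print server" | egrep "backfill_depth|job_sort_formula"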