Get Queue Waiting Time

I need to get the queue waiting time of a job. Which attribute do I have to enable, and how do I extract the value from the report? Could anyone please advise?

The queue time is collected by default.

qstat -fx <job_id> | grep -i time
e.g.,
qstat -fx 582 | grep -i time

I got the following result from the command you gave:
[root@rnarasimha ~]# qstat -fx 9029 | grep time
resources_used.walltime = 06:02:36
ctime = Sat Jan 22 17:39:43 2022
mtime = Sat Jan 22 23:44:36 2022
qtime = Sat Jan 22 17:39:43 2022
Resource_List.walltime = 24:00:00
stime = Sat Jan 22 17:41:56 2022
etime = Sat Jan 22 17:41:33 2022
history_timestamp = 1642875276

In this result, what are mtime, ctime, qtime, stime, and etime?

mtime - time that the job was last modified, changed state or changed locations.
Please go through the reference guide: https://www.altair.com/pdfs/pbsworks/PBSReferenceGuide2021.1.pdf

Another option is to set 'eligible_time_enable' on the server. This is a "smart" wait time. It only counts time when you are not getting in your own way (like hitting one of your own limits). You will see eligible time set on jobs after that.

Bhroam


It would be overkill if all you want is the queue waiting time, but I have a python version of qstat that allows you to display only the values you want and to customize how they are displayed. For example,

./nas_qstat -x -W o=jobid,queue,jobname,s,lifetime,eligtime,eligstart
                               Life  Elig Eligible
JobID         Queue Jobname S  time  time start
------------- ----- ------- - ----- ----- --------------
13819.server2 workq STDIN   F 00:51 00:44 22-02-07/10:10
13820.server2 workq job2    F 00:27 00:26 22-02-07/10:34

Here we display the job lifetime (how long from when it was queued until it exited), the eligible time as mentioned by Bhroam, and when it was first eligible to be scheduled.

Now, if you want to know how long the job sat in the queue, that would be the time from when it was queued (qtime) until the time it started (stime). You can grab those values using nas_qstat and feed them to awk to compute the difference:

 ./nas_qstat -x -W noheader -W o=jobid,A_qtime,A_stime | awk '{print $1, $3 - $2 }'
13819.server2 2981
13820.server2 1538
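
If nas_qstat is not installed, the same qtime-to-stime difference can be computed from plain `qstat -fx` output. The following is only a minimal sketch: it assumes the timestamps use the format shown in the qstat output earlier in this thread, that the system locale prints English day and month names, and that both times fall in the same UTC offset (a daylight-saving change between them would skew the result). The function names are made up for this example.

```python
import re
from datetime import datetime

# Timestamp layout as printed by qstat -f in this thread,
# e.g. "Sat Jan 22 17:39:43 2022" (assumes an English/C locale).
QSTAT_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"
TIME_ATTRS = ("ctime", "qtime", "etime", "stime", "mtime")


def parse_times(qstat_text):
    """Collect the *time attributes from 'name = value' lines."""
    times = {}
    for line in qstat_text.splitlines():
        m = re.match(r"\s*(\w+)\s*=\s*(.+?)\s*$", line)
        if m and m.group(1) in TIME_ATTRS:
            times[m.group(1)] = datetime.strptime(m.group(2), QSTAT_TIME_FORMAT)
    return times


def queue_wait_seconds(times):
    """Seconds between qtime and stime, or None if the job never started."""
    if "qtime" not in times or "stime" not in times:
        return None
    return int((times["stime"] - times["qtime"]).total_seconds())


# Output of "qstat -fx 9029 | grep time" quoted earlier in this thread.
SAMPLE = """\
resources_used.walltime = 06:02:36
ctime = Sat Jan 22 17:39:43 2022
mtime = Sat Jan 22 23:44:36 2022
qtime = Sat Jan 22 17:39:43 2022
Resource_List.walltime = 24:00:00
stime = Sat Jan 22 17:41:56 2022
etime = Sat Jan 22 17:41:33 2022
history_timestamp = 1642875276
"""

print(queue_wait_seconds(parse_times(SAMPLE)))  # 133 seconds (2 min 13 s)
```

In practice the text would come from running `qstat -fx <job_id>` and passing its output to `parse_times`.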

If you're going to want this waiting time often, you can define a new field using a $HOME/.qstat_userexits file like so:

if args.B:
    pass
elif args.Q:
    pass
else:
    def my_add_fields(gbl, lcl):
        def fmt_qwait(fi, info):
            from nas_field_format import safeint, secstoclock
            qtime = safeint(info.get('qtime', '--'))
            stime = safeint(info.get('stime', '--'))
            if qtime == 0 or stime == 0:
                return '--'
            return secstoclock(stime - qtime)
        setattr(nas_field_format, 'fmt_qwait', fmt_qwait)
        fmtr = lcl.get('fmtr')
        t = gen_field('qwait', 'QWait', {'rj':'r'}, 'fmt_qwait', 'qtime stime')
        fmtr.known_fields.append(t)

    userexit_add_fields = stack_userexit(userexit_add_fields, my_add_fields)

Then, display that value via:

 ./nas_qstat -x -W o=jobid,queue,jobname,s,qwait
JobID         Queue Jobname S QWait
------------- ----- ------- - -----
13819.server2 workq STDIN   F 00:50
13820.server2 workq job2    F 00:26

(The fmt_qwait function should be more complicated to take into account jobs that are still queued or were deleted before they started.)
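
As a rough illustration of what that extra logic might look like, here is a standalone sketch that does not use any nas_qstat helpers. It assumes qtime and stime are available as epoch seconds and that the job state is the single letter shown in the S column; the function name `queue_wait` and the choice to return None for jobs that never started are assumptions made for this example.

```python
import time


def queue_wait(qtime, stime=None, state="F", now=None):
    """Return the queue wait in seconds, or None if it is undefined.

    qtime, stime: epoch seconds (stime is None or 0 if the job never started)
    state:        job state letter, e.g. "Q" (queued) or "F" (finished)
    now:          current epoch seconds; defaults to time.time()
    """
    if not qtime:
        return None
    if stime:
        # Job started: wait is fixed at stime - qtime.
        return max(0, stime - qtime)
    if state == "Q":
        # Still queued: report how long it has waited so far.
        if now is None:
            now = time.time()
        return max(0, int(now) - qtime)
    # No stime and not queued, e.g. deleted before it started.
    return None


print(queue_wait(1000, 1133))                  # started job
print(queue_wait(1000, None, "Q", now=1060))   # still queued
print(queue_wait(1000, None, "F"))             # never started
```

The same branching could be folded into fmt_qwait, returning '--' where this sketch returns None.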

For more information, see the github site: https://github.com/drtoss/pyqs


Hi Bhroam,
I've enabled 'eligible_time_enable'. Since enabling it, submitted jobs run but do not generate any output or error files. Before I enabled 'eligible_time_enable', everything was running fine. Please help me fix this.

This is not limited to one job; it happens for all submitted jobs.

The MoM log in /var/spool/pbs/mom_logs/ on the execution host should give you more information about why you aren't getting output.

When the job finishes you should see messages like:

02/07/2022 12:01:01;0008;pbs_mom;Job;13820.server2;no active tasks
02/07/2022 12:01:01;0100;pbs_mom;Job;13820.server2;Obit sent
02/07/2022 12:01:02;0080;pbs_mom;Job;13820.server2;copy file request received
02/07/2022 12:01:02;0100;pbs_mom;Job;13820.server2;Staged 1/1 items out over 0:00:00
02/07/2022 12:01:02;0008;pbs_mom;Job;13820.server2;no active tasks
02/07/2022 12:01:02;0080;pbs_mom;Job;13820.server2;delete job request received
02/07/2022 12:01:02;0008;pbs_mom;Job;13820.server2;no active tasks

If something is wrong, you should see explanatory messages after the "copy file request received" line.
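
To pull out just those messages for one job, something like the following sketch could be used. It assumes each log line has six semicolon-separated fields, as in the excerpt above; the field names (timestamp, code, daemon, object type, object name, message) and the function names are labels chosen for this example.

```python
def parse_mom_line(line):
    """Split one MoM log line into its six semicolon-separated fields."""
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None  # not a regular log record
    keys = ("timestamp", "code", "daemon", "obj_type", "obj_name", "message")
    return dict(zip(keys, parts))


def messages_after_copy_request(lines, job_id):
    """Return messages logged for job_id after 'copy file request received'."""
    seen_copy_request = False
    messages = []
    for line in lines:
        rec = parse_mom_line(line)
        if rec is None or rec["obj_name"] != job_id:
            continue
        if seen_copy_request:
            messages.append(rec["message"])
        elif rec["message"] == "copy file request received":
            seen_copy_request = True
    return messages


# Lines copied from the excerpt above.
SAMPLE_LOG = """\
02/07/2022 12:01:01;0008;pbs_mom;Job;13820.server2;no active tasks
02/07/2022 12:01:01;0100;pbs_mom;Job;13820.server2;Obit sent
02/07/2022 12:01:02;0080;pbs_mom;Job;13820.server2;copy file request received
02/07/2022 12:01:02;0100;pbs_mom;Job;13820.server2;Staged 1/1 items out over 0:00:00
02/07/2022 12:01:02;0008;pbs_mom;Job;13820.server2;no active tasks
02/07/2022 12:01:02;0080;pbs_mom;Job;13820.server2;delete job request received
02/07/2022 12:01:02;0008;pbs_mom;Job;13820.server2;no active tasks
""".splitlines()

for msg in messages_after_copy_request(SAMPLE_LOG, "13820.server2"):
    print(msg)
```

For a real job, read the lines from the relevant file under /var/spool/pbs/mom_logs/ on the execution host instead of the sample list.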

When you turn on eligible_time_enable, it just turns on the counting of eligible_time. It's just another number held by the job. By default, it does nothing other than be reported in qstat -f. If you want it to be used in scheduling, consider looking at the job_sort_formula. You can add eligible_time as a factor to your formula and have jobs that have waited longer have higher priority than jobs more recently submitted.
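
To show the effect of adding eligible_time as a factor, here is a toy model in plain Python. It is not PBS formula syntax and not how the scheduler is implemented; the job records, the `base_priority` field, and the weight are all hypothetical values chosen for illustration.

```python
def sort_value(job, weight=0.01):
    """Toy priority: a base value plus a weighted eligible wait (seconds)."""
    return job["base_priority"] + weight * job["eligible_time"]


# Hypothetical jobs: same base priority, different eligible wait times.
jobs = [
    {"id": "101", "base_priority": 10, "eligible_time": 60},
    {"id": "102", "base_priority": 10, "eligible_time": 3600},
    {"id": "103", "base_priority": 10, "eligible_time": 600},
]

# Highest value first: the job that has been eligible longest comes out on top.
ranked = sorted(jobs, key=sort_value, reverse=True)
print([job["id"] for job in ranked])
```

With equal base priorities, the ordering is decided entirely by how long each job has been eligible, which is the behavior described above.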