Resources_used.cput = 00:00:00

Bert · May 28, 2024, 12:27pm

Hello,

I am setting up an HPC cluster on Azure with OpenPBS 20.0.1 and CycleCloud.
When I run a PBS python script to execute abaqus jobs, cput stays at 0.
At the end of the task cput is ~00:00:02, which is the stageout time during the Exiting phase.

The script runs the abaqus job (os.system(abq job=…)), monitors it while reporting progress in the task status, and then does the post-processing (subprocess.run(abaqus viewer noGUI=my_post_process.py)).
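The launch-and-monitor pattern described above could be sketched roughly as follows. This is a minimal sketch, not the actual job script: run_and_monitor, its parameters, and the stand-in command are hypothetical names used for illustration, and the real abaqus command line is not reproduced here.

```python
import subprocess
import sys
import time


def run_and_monitor(cmd, poll_seconds=1.0, report=print):
    """Start cmd as a direct child, report progress until it exits,
    and return its exit code."""
    proc = subprocess.Popen(cmd)
    polls = 0
    # poll() returns None while the child is still running.
    while proc.poll() is None:
        polls += 1
        report(f"still running (poll {polls})")
        time.sleep(poll_seconds)
    return proc.returncode


if __name__ == "__main__":
    # Hypothetical stand-in for the solver: a child that sleeps briefly.
    stand_in = [sys.executable, "-c", "import time; time.sleep(0.5)"]
    print("exit code:", run_and_monitor(stand_in, poll_seconds=0.2))
```

Because the child is started with Popen and polled, it stays a direct child of the job script until it exits.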

The script executes abaqus with os.system and abaqus viewer (for the post-processing) with subprocess: one is non-blocking, the other creates a child process. It looks like PBS is monitoring only the python process, which sleeps while waiting for the abaqus applications to end.
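One way to check this from inside the job script is to compare the CPU time of the script itself with the CPU time of the children it has waited for. A minimal sketch, assuming a POSIX node; cpu_split and the busy stand-in command are hypothetical, and this only shows what the operating system attributes to waited-for children, not how the MoM computes cput:

```python
import os
import subprocess
import sys


def cpu_split(cmd):
    """Run cmd to completion and return (self_cpu, children_cpu) in seconds."""
    before = os.times()
    subprocess.run(cmd, check=True)
    after = os.times()
    self_cpu = (after.user - before.user) + (after.system - before.system)
    # children_* only includes children that have exited and been waited for.
    child_cpu = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return self_cpu, child_cpu


if __name__ == "__main__":
    # Hypothetical stand-in for the solver: a child that burns some CPU.
    busy = [sys.executable, "-c", "sum(i * i for i in range(5_000_000))"]
    mine, kids = cpu_split(busy)
    print(f"script cpu={mine:.3f}s  children cpu={kids:.3f}s")
```

A process that detaches and is never waited for by the script would not show up in the children figure.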

The same script under PBSPro on a Windows node makes cput increase normally and in proportion to ncpus, so there must be a way to make OpenPBS do the same.

When I execute a very simple pbs python script for testing, cput is increasing normally:

#!/shared/apps/Python3.12/bin/python3.12
#PBS -W sandbox=private
#PBS -W stagein='.@10.88.46.132:/shared/home/saf112092/test/'
#PBS -W stageout='@10.88.46.132:/shared/home/saf112092/test/'
#PBS -A saf112092
#PBS -q workq
#PBS -l vm_size=Standard_D2ds_v5

import os
print('running abaqus')
os.system('abq2022 job=my_job inter')

The ‘inter’ suffix makes the abaqus command run in blocking mode.

A long introduction for a simple question.
How can I make OpenPBS count cput according to:

The applications executed by the PBS python script (os.system and subprocess)
The number of cpus used by those applications (the same number of cpus is requested in qsub and passed to abaqus)

Regards

adarsh · May 28, 2024, 1:50pm

Could you please try this script and check the cput:

#!/bin/bash
#PBS -W sandbox=private
#PBS -W stagein='.@10.88.46.132:/shared/home/saf112092/test/'
#PBS -W stageout=' @10.88.46.132:/shared/home/saf112092/test/'
#PBS -A saf112092
#PBS -q workq
#PBS -l vm_size=Standard_D2ds_v5
cd $PBS_O_WORKDIR
/absolute/path/to/abq2022 job=my_job inter

or

Try this stress application to check whether cput is updated.

Bert · May 28, 2024, 1:57pm

I was able to get cput counted by enabling the cgroups hook:

qmgr: set hook pbs_cgroups enabled = true

The Standard_D2ds_v5 VM in Azure has one hyperthreaded physical cpu, so 2 virtual cpus. I have to configure abaqus with an abaqus_v6.env file to force multiprocessing:
import os
import socket

mp_host_list=[[socket.gethostname(), 2],]
os.environ['ABA_CPUS_LOGICAL']='1'
os.environ['ABA_BATCH_OVERRIDE']="1"

But now, when I submit a job with #PBS -l select=1:ncpus=2:mem=2gb:vm_size=Standard_D2ds_v5, the MoM rejects it, whereas with #PBS -l select=1:ncpus=1:mem=2gb:vm_size=Standard_D2ds_v5 the job runs and cput has the correct value.

Now I need to make the nodes accept:
#PBS -l select=1:ncpus=2:mem=2gb:vm_size=Standard_D2ds_v5
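
When comparing the ncpus=1 and ncpus=2 cases, it can help to see from inside a running job how many logical cpus the node has and how many the job is actually allowed to run on. A minimal sketch using only the Python standard library; cpu_view is a hypothetical helper name, and the result says nothing about the ncpus value PBS advertises for the node, which is configured separately:

```python
import os


def cpu_view():
    """Return (logical cpus on the node, cpus this process may run on)."""
    total = os.cpu_count() or 0
    # sched_getaffinity is only available on some platforms (e.g. Linux);
    # fall back to the total where it is missing.
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = total
    return total, usable


if __name__ == "__main__":
    total, usable = cpu_view()
    print(f"node logical cpus: {total}, usable by this process: {usable}")
```

Run inside a job, the second number reflects any cpu restriction applied to the job's processes.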

Bert · May 28, 2024, 2:34pm

I read your post ‘MPI job shows running but with 00:00:00 time’ yesterday, but unfortunately yum install stress -y does not work.

Your script works, as does my test script, whether pbs_cgroups is enabled or disabled.

Bert · May 28, 2024, 2:49pm

I was finally able to install stress.

It seems that EPEL must be installed first:

sudo yum install -y epel-release
sudo yum install -y stress

I have installed this on both the pbs server and the node and restarted pbs, but when I run the “normal” python script, which executes the applications in child processes, cput stays at 00:00:00.

adarsh · May 28, 2024, 4:11pm

This is the experiment I did, and it works for me:


[pbsdata@rhel9 ~]$ date ; qsub -l select=1:ncpus=1   -- /bin/stress --cpu 2 --timeout 30
Tue 28 May 17:08:59 BST 2024
3.rhel9

[pbsdata@rhel9 ~]$ date ; qstat -answ1
Tue 28 May 17:09:06 BST 2024

rhel9: 
                                                                                                   Req'd  Req'd   Elap
Job ID                         Username        Queue           Jobname         SessID   NDS  TSK   Memory Time  S Time
------------------------------ --------------- --------------- --------------- -------- ---- ----- ------ ----- - -----
3.rhel9                        pbsdata         workq           STDIN              45022    1     1    --    --  R 00:00:00 rhel9/0
   Job run at Tue May 28 at 17:08 on (rhel9:ncpus=1)

[pbsdata@rhel9 ~]$ qstat

[pbsdata@rhel9 ~]$ qstat -fx 3 | grep cput
    resources_used.cput = 00:01:00

[pbsdata@rhel9 ~]$ qstat -fx 3 | grep cpu
    resources_used.cpupercent = 109
    resources_used.cput = 00:01:00
    resources_used.ncpus = 1
    exec_vnode = (rhel9:ncpus=1)
    Resource_List.ncpus = 1
    Resource_List.select = 1:ncpus=1
    comment = Job run at Tue May 28 at 17:08 on (rhel9:ncpus=1) and finished
    Submit_arguments = -l select=1:ncpus=1 -- /bin/stress --cpu 2 --timeout 30
    argument_list = <jsdl-hpcpa:Argument>--cpu</jsdl-hpcpa:Argument><jsdl-hpcpa
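
The 00:01:00 reported above appears consistent with two stress workers (--cpu 2) each spinning for the 30 second timeout, since cput adds up CPU time across the job's processes. A small sketch of that arithmetic; cput_seconds and expected_cput are hypothetical helper names, and the input line is copied from the qstat output above:

```python
def cput_seconds(line):
    """Convert a 'resources_used.cput = HH:MM:SS' line to seconds."""
    value = line.split("=", 1)[1].strip()
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def expected_cput(workers, timeout_seconds):
    """CPU seconds if every worker is busy for the whole timeout."""
    return workers * timeout_seconds


reported = cput_seconds("    resources_used.cput = 00:01:00")
print(reported, expected_cput(2, 30))  # 60 60
```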

Bert · July 1, 2025, 2:28pm

Hello Adarsh,

One year later I am back to configuring the HPC.

Thanks to your post I am able to reproduce your experiment with a pbs_script:
RG-238:

 -- <executable> [<arguments to executable>]

…
If a script or executable is specified, it must be the last argument to qsub. The arguments to an executable must follow the name of the executable.

qsub -- /usr/bin/stress -cpu 2 FeaJob.py
10.xxxx
qstat -xf -F json 10 | grep cpu
                "ncpus":1,
                "select":"1:ncpus=1:ungrouped=false",
            "schedselect":"1:ncpus=1:ungrouped=false",
            "Submit_arguments":"-N FeaJob.py -- /usr/bin/stress -cpu 2",
            "argument_list":"<jsdl-hpcpa:Argument>-cpu</jsdl-hpcpa:Argument><jsdl-hpcpa:Argument>2</jsdl-hpcpa:Argument>",

Bert · July 2, 2025, 3:39pm

Hello,

While qsub -- /usr/bin/stress -cpu 2 FeaJob.py submits a task correctly, the execution node is not able to run the job. It seems that stress thinks the pbs script is an option. How, then, should I use the stress command? I also tried adding #PBS --/usr/bin/stress -cpu 2 to the pbs script, but that does not work either.

Here is the stderr:

stress: FAIL: [8950] (244) unrecognized option: FeaJob.py

Here is the mom log:

07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;create_job: Creating directory /sys/fs/cgroup/systemd/pbs_jobs.service/jobid/24.ip-0A582E84/
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;create_job: Creating directory /sys/fs/cgroup/cpu,cpuacct/pbs_jobs.service/jobid/24.ip-0A582E84/
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;create_job: Creating directory /sys/fs/cgroup/cpuset/pbs_jobs.service/jobid/24.ip-0A582E84/
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;create_job: Creating directory /sys/fs/cgroup/memory/pbs_jobs.service/jobid/24.ip-0A582E84/
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;configure_job: mem not requested, assigning 268435456 to cgroup
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;Assigned resources: {'cpuset.cpus': [0], 'cpuset.mems': [0, 0], 'mem': 268435456}
07/02/2025 15:34:43;0008;pbs_python;Job;24.ip-0A582E84;update_job_usage: CPU percent: 0
07/02/2025 15:34:43;0008;pbs_python;Job;24.ip-0A582E84;update_job_usage: CPU usage: 0.000 secs
07/02/2025 15:34:43;0008;pbs_python;Job;24.ip-0A582E84;update_job_usage: Memory usage: mem=0b
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;Hook ended: pbs_cgroups, job ID 24.ip-0A582E84, event_type 64 (elapsed time: 0.0271)
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;main: Event type is execjob_launch, job ID is 24.ip-0A582E84
07/02/2025 15:34:43;0080;pbs_python;Hook;pbs_python;discover_gpus set to False because devices subsystem is disabled
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;Hook ended: pbs_cgroups, job ID 24.ip-0A582E84, event_type 2048 (elapsed time: 0.0263)
07/02/2025 15:34:43;0008;pbs_mom;Job;24.ip-0A582E84;Started, pid = 8950
07/02/2025 15:34:43;0080;pbs_mom;Job;24.ip-0A582E84;task 00000001 terminated
07/02/2025 15:34:43;0008;pbs_mom;Job;24.ip-0A582E84;Terminated
07/02/2025 15:34:43;0100;pbs_mom;Job;24.ip-0A582E84;task 00000001 cput=00:00:00
07/02/2025 15:34:43;0008;pbs_mom;Job;24.ip-0A582E84;kill_job
07/02/2025 15:34:43;0100;pbs_mom;Job;24.ip-0A582E84;ip-0A582E85 cput=00:00:00 mem=0kb
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;main: Event type is execjob_epilogue, job ID is 24.ip-0A582E84
07/02/2025 15:34:43;0080;pbs_python;Hook;pbs_python;discover_gpus set to False because devices subsystem is disabled
07/02/2025 15:34:43;0008;pbs_python;Job;24.ip-0A582E84;update_job_usage: CPU percent: 0
07/02/2025 15:34:43;0008;pbs_python;Job;24.ip-0A582E84;update_job_usage: CPU usage: 0.002 secs
07/02/2025 15:34:43;0008;pbs_python;Job;24.ip-0A582E84;update_job_usage: Memory usage: mem=308kb
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;_remove_cgroup: Removing directory /sys/fs/cgroup/systemd/pbs_jobs.service/jobid/24.ip-0A582E84
07/02/2025 15:34:44;0100;pbs_python;Hook;pbs_python;_remove_cgroup: Removing directory /sys/fs/cgroup/memory/pbs_jobs.service/jobid/24.ip-0A582E84
07/02/2025 15:34:44;0100;pbs_python;Hook;pbs_python;_remove_cgroup: Removing directory /sys/fs/cgroup/cpuset/pbs_jobs.service/jobid/24.ip-0A582E84
07/02/2025 15:34:44;0100;pbs_python;Hook;pbs_python;_remove_cgroup: Removing directory /sys/fs/cgroup/cpu,cpuacct/pbs_jobs.service/jobid/24.ip-0A582E84
07/02/2025 15:34:44;0100;pbs_python;Hook;pbs_python;Hook ended: pbs_cgroups, job ID 24.ip-0A582E84, event_type 256 (elapsed time: 0.4179)
07/02/2025 15:34:44;0008;pbs_mom;Job;24.ip-0A582E84;no active tasks
07/02/2025 15:34:44;0100;pbs_mom;Job;24.ip-0A582E84;Obit sent
07/02/2025 15:34:45;0100;pbs_mom;Req;;Type 54 request received from root@10.88.46.132:15001, sock=0
07/02/2025 15:34:45;0080;pbs_mom;Job;24.ip-0A582E84;copy file request received
07/02/2025 15:34:45;0100;pbs_mom;Job;24.ip-0A582E84;Staged 2/2 items out over 0:00:00
07/02/2025 15:34:45;0008;pbs_mom;Job;24.ip-0A582E84;no active tasks
07/02/2025 15:34:45;0100;pbs_mom;Req;;Type 6 request received from root@10.88.46.132:15001, sock=0
07/02/2025 15:34:45;0080;pbs_mom;Job;24.ip-0A582E84;delete job request received
07/02/2025 15:34:45;0100;pbs_python;Hook;pbs_python;main: Event type is execjob_end, job ID is 24.ip-0A582E84
07/02/2025 15:34:45;0080;pbs_python;Hook;pbs_python;discover_gpus set to False because devices subsystem is disabled
07/02/2025 15:34:45;0100;pbs_python;Hook;pbs_python;Hook ended: pbs_cgroups, job ID 24.ip-0A582E84, event_type 512 (elapsed time: 0.0082)
07/02/2025 15:34:45;0008;pbs_mom;Job;24.ip-0A582E84;no active tasks
07/02/2025 15:34:46;0008;pbs_mom;Job;24.ip-0A582E84;kill_job
07/02/2025 15:35:47;0080;pbs_python;Hook;pbs_python;discover_gpus set to False because devices subsystem is disabled

adarsh · July 2, 2025, 4:24pm

The stress command does not take a python script as an input argument.
Please refer to: Linux stress command With Examples - GeeksforGeeks
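
To make the point concrete: with the qsub -- form, everything after the -- is the executable followed by its own arguments, so a job script name placed there is handed to stress as an option. A minimal sketch of assembling such a command line; qsub_exec_argv is a hypothetical helper, and the options are the ones from the experiment earlier in this thread:

```python
import shlex


def qsub_exec_argv(qsub_options, executable, exe_args):
    """Build a qsub command line that runs an executable directly.

    Everything after "--" is the executable and its own arguments,
    so no job script name may appear there.
    """
    return ["qsub", *qsub_options, "--", executable, *exe_args]


argv = qsub_exec_argv(
    ["-l", "select=1:ncpus=1"],
    "/bin/stress",
    ["--cpu", "2", "--timeout", "30"],
)
print(shlex.join(argv))
# qsub -l select=1:ncpus=1 -- /bin/stress --cpu 2 --timeout 30
```

To exercise the cpu from a job script instead, the script itself would have to call stress, rather than being passed to it as an argument.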
