Resources_used.cput = 00:00:00

Hello,

I am setting up an HPC cluster on Azure with OpenPBS 20.0.1 and CycleCloud.
When I run a PBS Python script that executes Abaqus jobs, cput stays at 0.
At the end of the job, cput is about 00:00:02, which is the stage-out time during the Exiting phase.

The script runs the Abaqus job (os.system(abq job=…)), monitors it while reporting progress in the job status, and then does the post-processing (subprocess.run(abaqus viewer noGUI=my_post_process.py)).
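The workflow just described (start the solver, report progress while it runs, then post-process) can be sketched with subprocess alone. This is a minimal sketch, not the actual script: run_with_progress is a hypothetical name, and the commands passed in are stand-ins for the real Abaqus calls. Polling only makes sense if the solver command stays in the foreground (for Abaqus, with the interactive option); a launcher that returns immediately would end the loop at once.

```python
import subprocess
import sys
import time

def run_with_progress(solver_cmd, post_cmd, poll_seconds=5.0):
    """Start the solver, report while it runs, then run post-processing.

    Both commands are argument lists. They are placeholders here; the real
    Abaqus commands would go in their place.
    """
    solver = subprocess.Popen(solver_cmd)
    while solver.poll() is None:
        # A real script might parse the Abaqus .sta file here to report progress.
        print("solver running...", flush=True)
        time.sleep(poll_seconds)
    if solver.returncode != 0:
        raise RuntimeError(f"solver exited with code {solver.returncode}")
    # subprocess.run blocks until the post-processing command has finished.
    post = subprocess.run(post_cmd, check=True)
    return solver.returncode, post.returncode

if __name__ == "__main__":
    # Stand-in children so the sketch runs without Abaqus installed.
    py = sys.executable
    print(run_with_progress([py, "-c", "import time; time.sleep(1)"],
                            [py, "-c", "print('post-processing done')"],
                            poll_seconds=0.2))
```

Both the solver and the post-processor run as children of the Python interpreter, which is exactly the situation where CPU accounting of the parent alone would show almost nothing.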

The script executes Abaqus with os.system and the Abaqus viewer for post-processing with subprocess. One call returns immediately (the Abaqus launcher puts the analysis in the background unless the interactive option is used) and the other creates a child process. It looks like PBS only monitors the Python process, which sleeps while waiting for the Abaqus applications to finish.
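The suspicion above can be checked outside PBS. A minimal sketch using Python's resource module (Unix only): the parent waits while a child burns CPU, and getrusage shows that the parent's own CPU time stays near zero while the total for its children grows. Any accounting that looks only at the parent process would therefore report roughly 0. The BURN workload and the function names are illustrative, not part of PBS.

```python
import resource
import subprocess
import sys

# A small CPU-bound workload run in a separate Python child process.
BURN = "x = 0\nfor i in range(5_000_000):\n    x += i"

def cpu_seconds(who):
    """User + system CPU time for RUSAGE_SELF or RUSAGE_CHILDREN."""
    usage = resource.getrusage(who)
    return usage.ru_utime + usage.ru_stime

def measure_child_burn():
    """Return (parent CPU delta, children CPU delta) around one child run."""
    self_before = cpu_seconds(resource.RUSAGE_SELF)
    children_before = cpu_seconds(resource.RUSAGE_CHILDREN)
    # The parent only waits; the CPU work happens in the child process.
    # RUSAGE_CHILDREN counts children that have terminated and been waited for.
    subprocess.run([sys.executable, "-c", BURN], check=True)
    return (cpu_seconds(resource.RUSAGE_SELF) - self_before,
            cpu_seconds(resource.RUSAGE_CHILDREN) - children_before)

if __name__ == "__main__":
    parent, children = measure_child_burn()
    print(f"parent CPU: {parent:.2f} s, children CPU: {children:.2f} s")
```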

The same script under PBS Pro on a Windows node makes cput increase normally, in proportion to ncpus, so there must be a way to make OpenPBS do the same.

When I run a very simple PBS Python script for testing, cput increases normally:

#!/shared/apps/Python3.12/bin/python3.12
#PBS -W sandbox=private
#PBS -W stagein='.@10.88.46.132:/shared/home/saf112092/test/'
#PBS -W stageout='@10.88.46.132:/shared/home/saf112092/test/'
#PBS -A saf112092
#PBS -q workq
#PBS -l vm_size=Standard_D2ds_v5

import os
print('running abaqus')
os.system('abq2022 job=my_job inter')

The inter option (short for interactive) makes the Abaqus command block until the analysis finishes.

Long introduction for a simple question.
How can I make OpenPBS count cput according to:

  • The applications executed by the PBS Python script (os.system and subprocess)
  • The number of CPUs used by those applications (of course, the same number of CPUs is requested in qsub and passed to Abaqus)

Regards

Could you please try this script and check the cput:

#!/bin/bash
#PBS -W sandbox=private
#PBS -W stagein='.@10.88.46.132:/shared/home/saf112092/test/'
#PBS -W stageout='@10.88.46.132:/shared/home/saf112092/test/'
#PBS -A saf112092
#PBS -q workq
#PBS -l vm_size=Standard_D2ds_v5
cd "$PBS_O_WORKDIR"
/absolute/path/to/abq2022 job=my_job inter

or

try this stress application to check whether cput is updated.
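Where stress cannot be installed, a rough stand-in can be written with Python child processes. This is a minimal sketch, not a reimplementation of stress: SPIN and burn are hypothetical names, and each child simply busy-loops for the given wall time. If each child gets its own CPU, the expected cput is roughly workers × seconds.

```python
import subprocess
import sys

# Busy-loop for the number of wall-clock seconds passed as the first argument.
SPIN = (
    "import sys, time\n"
    "end = time.monotonic() + float(sys.argv[1])\n"
    "while time.monotonic() < end:\n"
    "    pass"
)

def burn(workers=2, seconds=30.0):
    """Rough stand-in for `stress --cpu N --timeout T`: N busy child processes."""
    procs = [subprocess.Popen([sys.executable, "-c", SPIN, str(seconds)])
             for _ in range(workers)]
    # Wait for every child and return their exit codes.
    return [p.wait() for p in procs]
```

Calling burn(workers=2, seconds=30) inside a job script reproduces the "work happens in child processes" pattern of the Abaqus script, which makes it a useful test of whether cput follows children.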

I was able to get cput counted by enabling the cgroups hook:

qmgr: set hook pbs_cgroups enabled = true

The Azure VM Standard_D2ds_v5 has one physical core with hyperthreading, i.e. 2 virtual CPUs. I had to configure Abaqus with an abaqus_v6.env file to force multiprocessing:
import os
import socket

mp_host_list = [[socket.gethostname(), 2]]
os.environ['ABA_CPUS_LOGICAL'] = '1'
os.environ['ABA_BATCH_OVERRIDE'] = '1'

But now, when I submit a job with #PBS -l select=1:ncpus=2:mem=2gb:vm_size=Standard_D2ds_v5, the MoM rejects it, whereas with #PBS -l select=1:ncpus=1:mem=2gb:vm_size=Standard_D2ds_v5 the job runs and cput has the correct value.

Now I need to make the nodes accept:
#PBS -l select=1:ncpus=2:mem=2gb:vm_size=Standard_D2ds_v5
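To compare what the node itself reports with the 2 CPUs being requested, the logical and physical core counts can be read from Linux /proc/cpuinfo. A minimal sketch, assuming an x86 layout where each processor block has "physical id" and "core id" fields; count_cpus is a hypothetical helper. The MoM's resources_available.ncpus and the cgroup hook's configuration may count CPUs differently, so this is only a sanity check.

```python
import os

def count_cpus(cpuinfo_text):
    """Return (logical, physical) CPU counts parsed from /proc/cpuinfo text.

    Physical cores are counted as distinct (physical id, core id) pairs.
    If those fields are missing (some VMs), the physical count will be 0.
    """
    logical = 0
    cores = set()
    phys_id = core_id = None
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one processor block.
            if phys_id is not None and core_id is not None:
                cores.add((phys_id, core_id))
            phys_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            phys_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return logical, len(cores)

if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print(count_cpus(f.read()))
```

On a one-core, two-thread VM as described above, this would be expected to print (2, 1).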


I read your post ‘MPI job shows running but with 00:00:00 time’ yesterday, but unfortunately yum install stress -y does not work.

Your script works, as does my test script, with the pbs_cgroups hook both enabled and disabled.

I was finally able to install stress.

It seems EPEL must be installed first:

sudo yum install -y epel-release
sudo yum install -y stress

I installed it on both the PBS server and the node and restarted PBS, but when I run the "normal" Python script, which executes applications in child processes, cput stays at 00:00:00.

This is the experiment I did, and it works for me:


[pbsdata@rhel9 ~]$ date ; qsub -l select=1:ncpus=1   -- /bin/stress --cpu 2 --timeout 30
Tue 28 May 17:08:59 BST 2024
3.rhel9

[pbsdata@rhel9 ~]$ date ; qstat -answ1
Tue 28 May 17:09:06 BST 2024

rhel9: 
                                                                                                   Req'd  Req'd   Elap
Job ID                         Username        Queue           Jobname         SessID   NDS  TSK   Memory Time  S Time
------------------------------ --------------- --------------- --------------- -------- ---- ----- ------ ----- - -----
3.rhel9                        pbsdata         workq           STDIN              45022    1     1    --    --  R 00:00:00 rhel9/0
   Job run at Tue May 28 at 17:08 on (rhel9:ncpus=1)

[pbsdata@rhel9 ~]$ qstat

[pbsdata@rhel9 ~]$ qstat -fx 3 | grep cput
    resources_used.cput = 00:01:00

[pbsdata@rhel9 ~]$ qstat -fx 3 | grep cpu
    resources_used.cpupercent = 109
    resources_used.cput = 00:01:00
    resources_used.ncpus = 1
    exec_vnode = (rhel9:ncpus=1)
    Resource_List.ncpus = 1
    Resource_List.select = 1:ncpus=1
    comment = Job run at Tue May 28 at 17:08 on (rhel9:ncpus=1) and finished
    Submit_arguments = -l select=1:ncpus=1 -- /bin/stress --cpu 2 --timeout 30
    argument_list = <jsdl-hpcpa:Argument>--cpu</jsdl-hpcpa:Argument><jsdl-hpcpa
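The numbers above are consistent with a simple check: stress ran 2 CPU workers for 30 s, so the expected CPU time is 2 × 30 s = 60 s = 00:01:00, which matches resources_used.cput. A minimal sketch that extracts cput from `qstat -f` text and converts the HH:MM:SS value to seconds for that comparison; the helper names are hypothetical.

```python
import re

def hms_to_seconds(hms):
    """Convert a PBS duration like '00:01:00' (HH:MM:SS) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def used_cput_seconds(qstat_f_text):
    """Extract resources_used.cput from `qstat -f` text output, in seconds."""
    match = re.search(r"resources_used\.cput\s*=\s*(\d+:\d{2}:\d{2})", qstat_f_text)
    if match is None:
        raise ValueError("resources_used.cput not found")
    return hms_to_seconds(match.group(1))

if __name__ == "__main__":
    sample = "    resources_used.cpupercent = 109\n    resources_used.cput = 00:01:00\n"
    workers, timeout = 2, 30  # from: stress --cpu 2 --timeout 30
    print(used_cput_seconds(sample), workers * timeout)  # both 60
```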

Hello Adarsh,

One year later, I am back to configuring the HPC.

Thanks to your post, I was able to reproduce your experiment with a PBS script:
RG-238:

 -- <executable> [<arguments to executable>]


If a script or executable is specified, it must be the last argument to qsub. The arguments to an executable must follow the name of the executable.

qsub -- /usr/bin/stress -cpu 2 FeaJob.py
10.xxxx
qstat -xf -F json 10 | grep cpu
                "ncpus":1,
                "select":"1:ncpus=1:ungrouped=false",
            "schedselect":"1:ncpus=1:ungrouped=false",
            "Submit_arguments":"-N FeaJob.py -- /usr/bin/stress -cpu 2",
            "argument_list":"<jsdl-hpcpa:Argument>-cpu</jsdl-hpcpa:Argument><jsdl-hpcpa:Argument>2</jsdl-hpcpa:Argument>",
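Instead of grepping the `-F json` output, it can be parsed so that a missing cput is explicit; in the grep output above no cput line appears at all. A minimal sketch, assuming the layout of `qstat -f -F json` output (a top-level "Jobs" object keyed by job ID, each job with a nested "resources_used" object); cput_by_job is a hypothetical helper.

```python
import json

def cput_by_job(qstat_json_text):
    """Map job ID -> resources_used.cput from `qstat -f -F json` output.

    Jobs that have not reported any usage yet map to None.
    """
    data = json.loads(qstat_json_text)
    return {job_id: job.get("resources_used", {}).get("cput")
            for job_id, job in data.get("Jobs", {}).items()}
```

The input would typically come from subprocess.run(["qstat", "-xf", "-F", "json", "10"], capture_output=True, text=True).stdout.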

Hello,

While qsub -- /usr/bin/stress -cpu 2 FeaJob.py submits a job correctly, the execution node is not able to run it. It seems stress treats the PBS script as an option. How can I use the stress command, then? I also tried adding #PBS --/usr/bin/stress -cpu 2 to the PBS script, but that does not work either.

Here is the stderr:

stress: FAIL: [8950] (244) unrecognized option: FeaJob.py

Here is the mom log:

07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;create_job: Creating directory /sys/fs/cgroup/systemd/pbs_jobs.service/jobid/24.ip-0A582E84/
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;create_job: Creating directory /sys/fs/cgroup/cpu,cpuacct/pbs_jobs.service/jobid/24.ip-0A582E84/
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;create_job: Creating directory /sys/fs/cgroup/cpuset/pbs_jobs.service/jobid/24.ip-0A582E84/
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;create_job: Creating directory /sys/fs/cgroup/memory/pbs_jobs.service/jobid/24.ip-0A582E84/
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;configure_job: mem not requested, assigning 268435456 to cgroup
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;Assigned resources: {'cpuset.cpus': [0], 'cpuset.mems': [0, 0], 'mem': 268435456}
07/02/2025 15:34:43;0008;pbs_python;Job;24.ip-0A582E84;update_job_usage: CPU percent: 0
07/02/2025 15:34:43;0008;pbs_python;Job;24.ip-0A582E84;update_job_usage: CPU usage: 0.000 secs
07/02/2025 15:34:43;0008;pbs_python;Job;24.ip-0A582E84;update_job_usage: Memory usage: mem=0b
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;Hook ended: pbs_cgroups, job ID 24.ip-0A582E84, event_type 64 (elapsed time: 0.0271)
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;main: Event type is execjob_launch, job ID is 24.ip-0A582E84
07/02/2025 15:34:43;0080;pbs_python;Hook;pbs_python;discover_gpus set to False because devices subsystem is disabled
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;Hook ended: pbs_cgroups, job ID 24.ip-0A582E84, event_type 2048 (elapsed time: 0.0263)
07/02/2025 15:34:43;0008;pbs_mom;Job;24.ip-0A582E84;Started, pid = 8950
07/02/2025 15:34:43;0080;pbs_mom;Job;24.ip-0A582E84;task 00000001 terminated
07/02/2025 15:34:43;0008;pbs_mom;Job;24.ip-0A582E84;Terminated
07/02/2025 15:34:43;0100;pbs_mom;Job;24.ip-0A582E84;task 00000001 cput=00:00:00
07/02/2025 15:34:43;0008;pbs_mom;Job;24.ip-0A582E84;kill_job
07/02/2025 15:34:43;0100;pbs_mom;Job;24.ip-0A582E84;ip-0A582E85 cput=00:00:00 mem=0kb
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;main: Event type is execjob_epilogue, job ID is 24.ip-0A582E84
07/02/2025 15:34:43;0080;pbs_python;Hook;pbs_python;discover_gpus set to False because devices subsystem is disabled
07/02/2025 15:34:43;0008;pbs_python;Job;24.ip-0A582E84;update_job_usage: CPU percent: 0
07/02/2025 15:34:43;0008;pbs_python;Job;24.ip-0A582E84;update_job_usage: CPU usage: 0.002 secs
07/02/2025 15:34:43;0008;pbs_python;Job;24.ip-0A582E84;update_job_usage: Memory usage: mem=308kb
07/02/2025 15:34:43;0100;pbs_python;Hook;pbs_python;_remove_cgroup: Removing directory /sys/fs/cgroup/systemd/pbs_jobs.service/jobid/24.ip-0A582E84
07/02/2025 15:34:44;0100;pbs_python;Hook;pbs_python;_remove_cgroup: Removing directory /sys/fs/cgroup/memory/pbs_jobs.service/jobid/24.ip-0A582E84
07/02/2025 15:34:44;0100;pbs_python;Hook;pbs_python;_remove_cgroup: Removing directory /sys/fs/cgroup/cpuset/pbs_jobs.service/jobid/24.ip-0A582E84
07/02/2025 15:34:44;0100;pbs_python;Hook;pbs_python;_remove_cgroup: Removing directory /sys/fs/cgroup/cpu,cpuacct/pbs_jobs.service/jobid/24.ip-0A582E84
07/02/2025 15:34:44;0100;pbs_python;Hook;pbs_python;Hook ended: pbs_cgroups, job ID 24.ip-0A582E84, event_type 256 (elapsed time: 0.4179)
07/02/2025 15:34:44;0008;pbs_mom;Job;24.ip-0A582E84;no active tasks
07/02/2025 15:34:44;0100;pbs_mom;Job;24.ip-0A582E84;Obit sent
07/02/2025 15:34:45;0100;pbs_mom;Req;;Type 54 request received from root@10.88.46.132:15001, sock=0
07/02/2025 15:34:45;0080;pbs_mom;Job;24.ip-0A582E84;copy file request received
07/02/2025 15:34:45;0100;pbs_mom;Job;24.ip-0A582E84;Staged 2/2 items out over 0:00:00
07/02/2025 15:34:45;0008;pbs_mom;Job;24.ip-0A582E84;no active tasks
07/02/2025 15:34:45;0100;pbs_mom;Req;;Type 6 request received from root@10.88.46.132:15001, sock=0
07/02/2025 15:34:45;0080;pbs_mom;Job;24.ip-0A582E84;delete job request received
07/02/2025 15:34:45;0100;pbs_python;Hook;pbs_python;main: Event type is execjob_end, job ID is 24.ip-0A582E84
07/02/2025 15:34:45;0080;pbs_python;Hook;pbs_python;discover_gpus set to False because devices subsystem is disabled
07/02/2025 15:34:45;0100;pbs_python;Hook;pbs_python;Hook ended: pbs_cgroups, job ID 24.ip-0A582E84, event_type 512 (elapsed time: 0.0082)
07/02/2025 15:34:45;0008;pbs_mom;Job;24.ip-0A582E84;no active tasks
07/02/2025 15:34:46;0008;pbs_mom;Job;24.ip-0A582E84;kill_job
07/02/2025 15:35:47;0080;pbs_python;Hook;pbs_python;discover_gpus set to False because devices subsystem is disabled
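The log shows the hook creating a per-job cpuacct cgroup and reading CPU usage from it. With cgroup v1, that controller's cpuacct.usage file holds the cumulative CPU time, in nanoseconds, of every task in the cgroup, including child processes, which would explain why enabling the hook made cput follow children. A minimal sketch, assuming the directory layout seen in the log above; the helper names are hypothetical, and the directory only exists while the job runs because the hook removes it at the end.

```python
from pathlib import Path

# cgroup v1 cpuacct directory seen in the MoM log above (assumed layout).
CPUACCT_ROOT = Path("/sys/fs/cgroup/cpu,cpuacct/pbs_jobs.service/jobid")

def parse_cpuacct_usage(text):
    """cpuacct.usage holds cumulative CPU time in nanoseconds; return seconds."""
    return int(text.strip()) / 1e9

def job_cpu_seconds(job_id, root=CPUACCT_ROOT):
    """CPU time charged to all tasks in the job's cgroup, children included."""
    usage_file = Path(root) / job_id / "cpuacct.usage"
    return parse_cpuacct_usage(usage_file.read_text())
```

For example, job_cpu_seconds("24.ip-0A582E84") on the execution node while job 24 is still running.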

The stress command does not take a Python script as an input argument.
Please refer to: Linux stress command With Examples - GeeksforGeeks
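One way to combine stress with a Python job script is to launch stress from inside the script rather than after `--`, so it runs as a child of the job, like Abaqus does. A minimal sketch: the #PBS line, the STRESS path (taken from the commands above) and the helper names are assumptions, and the script would be submitted with qsub FeaJob.py. stress is only launched when the PBS_JOBID environment variable, which PBS sets inside a job, is present.

```python
#!/usr/bin/env python3
#PBS -l select=1:ncpus=1
import os
import subprocess

STRESS = "/usr/bin/stress"  # path used earlier in this thread

def stress_command(workers=2, seconds=30, binary=STRESS):
    """Build the argument list for `stress --cpu N --timeout T`."""
    return [binary, "--cpu", str(workers), "--timeout", str(seconds)]

def run_child(cmd):
    """Run cmd as a child of this job script and wait for it to finish."""
    return subprocess.run(cmd, check=True).returncode

if __name__ == "__main__" and os.environ.get("PBS_JOBID"):
    # Only launch stress when actually running inside a PBS job.
    print("running stress", flush=True)
    run_child(stress_command())
```

If cput grows to about 00:01:00 for this job, child processes are being accounted; if it stays at 00:00:00, the problem lies in how the job's processes are tracked rather than in Abaqus.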