Resources_used.cput = 00:00:00

Hello,

I am setting up an HPC cluster on Azure with OpenPBS 20.0.1 and CycleCloud.
When I run a PBS Python script to execute abaqus jobs, cput remains at 0.
At the end of the job, cput is about 00:00:02, which corresponds to the stage-out time during the Exiting phase.

The script runs the abaqus job (os.system(abq job=…)), monitors it while reporting progress in the task status, and then does the post-processing (subprocess.run(abaqus viewer noGUI=my_post_process.py)).

The script runs abaqus with os.system and the abaqus viewer post-processing with subprocess: one is non-blocking, the other creates a child process. It looks like PBS monitors only the Python process, which sleeps while waiting for the abaqus applications to finish.
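
As a side note on the blocking behaviour described above: os.system and subprocess.run both wait for the command they launch, so what matters is whether the launched command itself stays in the foreground. The following is a minimal sketch, not the script from this post, showing one way to confirm from Python that a blocking child really consumed CPU time. The helper name run_blocking and the CPU-burning one-liner are hypothetical stand-ins for the abaqus call, and the resource module is Unix-only.

```python
import resource
import subprocess
import sys


def run_blocking(cmd):
    """Run cmd, wait for it to exit, and return (returncode, child CPU seconds).

    RUSAGE_CHILDREN only covers children that have been waited for,
    so the difference before/after is the CPU time of this command
    (plus any descendants that it waited for itself).
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    completed = subprocess.run(cmd, check=False)  # blocks until cmd exits
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu_seconds = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return completed.returncode, cpu_seconds


if __name__ == "__main__":
    # Hypothetical stand-in for the real application: burn a little CPU.
    burn = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]
    returncode, cpu_seconds = run_blocking(burn)
    print(f"exit code {returncode}, child CPU time {cpu_seconds:.2f}s")
```

If the reported child CPU time stays near zero for a long solver run, that may suggest the launcher returned before the real work finished.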

The same script under PBSPro on a Windows node makes cput increase normally and in proportion to ncpus, so there must be a way to make OpenPBS do the same.

When I run a very simple PBS Python script for testing, cput increases normally:

#!/shared/apps/Python3.12/bin/python3.12
#PBS -W sandbox=private
#PBS -W stagein='.@10.88.46.132:/shared/home/saf112092/test/*'
#PBS -W stageout='*@10.88.46.132:/shared/home/saf112092/test/'
#PBS -A saf112092
#PBS -q workq
#PBS -l vm_size=Standard_D2ds_v5

import os
print('running abaqus')
os.system('abq2022 job=my_job inter')

The 'inter' suffix makes the abaqus command run in blocking mode.

A long introduction for a simple question: how can I make OpenPBS count cput according to:

  • the applications executed by the PBS Python script (os.system and subprocess)
  • the number of CPUs used by the applications (of course, the number of CPUs requested in qsub is the same as the number requested from abaqus)
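
To illustrate the second point, here is a minimal sketch of keeping the abaqus CPU count in step with the PBS request. It assumes that PBS exports an NCPUS variable in the job environment (worth verifying by printing the environment inside a job) and that the abaqus launcher accepts cpus= and interactive on the command line; the helper name build_abaqus_command is hypothetical.

```python
import os


def build_abaqus_command(job_name, env=None, executable="abq2022"):
    """Build a blocking abaqus command line matching the PBS CPU request.

    Assumption: PBS sets NCPUS in the job environment; fall back to 1
    when the variable is absent (for example when run outside a job).
    """
    env = os.environ if env is None else env
    ncpus = int(env.get("NCPUS", "1"))
    return [executable, f"job={job_name}", f"cpus={ncpus}", "interactive"]


if __name__ == "__main__":
    # Prints the command only; pass the list to subprocess.run to execute it.
    print(" ".join(build_abaqus_command("my_job")))
```

Returning a list rather than a single string lets the command be passed to subprocess.run without going through a shell.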

Regards

Could you please try this script and check the cput:

#!/bin/bash
#PBS -W sandbox=private
#PBS -W stagein='.@10.88.46.132:/shared/home/saf112092/test/*'
#PBS -W stageout='*@10.88.46.132:/shared/home/saf112092/test/'
#PBS -A saf112092
#PBS -q workq
#PBS -l vm_size=Standard_D2ds_v5
cd $PBS_O_WORKDIR
/absolute/path/to/abq2022 job=my_job inter

or

try this stress application to check whether cput time is updated

I was able to get cput counted by enabling the cgroups hook:

qmgr: set hook pbs_cgroups enabled = true

The Standard_D2ds_v5 VM in Azure has one hyperthreaded physical CPU, so 2 virtual CPUs. I have to configure abaqus with an abaqus_v6.env file to force multiprocessing:
import os
import socket

mp_host_list=[[socket.gethostname(), 2],]
os.environ['ABA_CPUS_LOGICAL']='1'
os.environ['ABA_BATCH_OVERRIDE']='1'

But now, when I submit a job with #PBS -l select=1:ncpus=2:mem=2gb:vm_size=Standard_D2ds_v5, the MoM rejects it, whereas with #PBS -l select=1:ncpus=1:mem=2gb:vm_size=Standard_D2ds_v5 the job runs and cput has the correct value.

Now I need to make the nodes accept:
#PBS -l select=1:ncpus=2:mem=2gb:vm_size=Standard_D2ds_v5


I read your post 'MPI job shows running but with 00:00:00 time' yesterday, but unfortunately yum install stress -y does not work.

Your script will work, just as my test script does, whether pbs_cgroups is enabled or disabled.

I was finally able to install stress.

It seems that epel must be installed first:

sudo yum install -y epel-release
sudo yum install -y stress

I have installed it on both the PBS server and the node and restarted PBS, but when running the "normal" Python script, which executes the applications in child processes, cput stays at 00:00:00.
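
One thing that might be worth checking here, on the assumption (not confirmed in this thread) that without the cgroups hook the MoM only accounts for processes that remain in the job's session: whether the child processes stay in the same session as the Python script. Below is a minimal sketch; the helper name same_session_as_parent is hypothetical and the sleeping child stands in for the real application.

```python
import os
import subprocess
import sys


def same_session_as_parent(child):
    """Return True if the running child shares this process's session ID."""
    return os.getsid(child.pid) == os.getsid(0)


if __name__ == "__main__":
    # Hypothetical stand-in for the real application.
    sleeper = [sys.executable, "-c", "import time; time.sleep(1)"]

    inherited = subprocess.Popen(sleeper)                          # stays in our session
    detached = subprocess.Popen(sleeper, start_new_session=True)   # calls setsid() in the child

    print("inherited child in our session:", same_session_as_parent(inherited))
    print("detached child in our session: ", same_session_as_parent(detached))

    for child in (inherited, detached):
        child.wait()
```

A launcher that daemonizes or starts its own session would behave like the detached case, which is one possible explanation for CPU time not being attributed to the job.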

This is the experiment I ran, and it works for me:


[pbsdata@rhel9 ~]$ date ; qsub -l select=1:ncpus=1   -- /bin/stress --cpu 2 --timeout 30
Tue 28 May 17:08:59 BST 2024
3.rhel9

[pbsdata@rhel9 ~]$ date ; qstat -answ1
Tue 28 May 17:09:06 BST 2024

rhel9: 
                                                                                                   Req'd  Req'd   Elap
Job ID                         Username        Queue           Jobname         SessID   NDS  TSK   Memory Time  S Time
------------------------------ --------------- --------------- --------------- -------- ---- ----- ------ ----- - -----
3.rhel9                        pbsdata         workq           STDIN              45022    1     1    --    --  R 00:00:00 rhel9/0
   Job run at Tue May 28 at 17:08 on (rhel9:ncpus=1)

[pbsdata@rhel9 ~]$ qstat

[pbsdata@rhel9 ~]$ qstat -fx 3 | grep cput
    resources_used.cput = 00:01:00

[pbsdata@rhel9 ~]$ qstat -fx 3 | grep cpu
    resources_used.cpupercent = 109
    resources_used.cput = 00:01:00
    resources_used.ncpus = 1
    exec_vnode = (rhel9:ncpus=1)
    Resource_List.ncpus = 1
    Resource_List.select = 1:ncpus=1
    comment = Job run at Tue May 28 at 17:08 on (rhel9:ncpus=1) and finished
    Submit_arguments = -l select=1:ncpus=1 -- /bin/stress --cpu 2 --timeout 30
    argument_list = <jsdl-hpcpa:Argument>--cpu</jsdl-hpcpa:Argument><jsdl-hpcpa