I am setting up an HPC cluster on Azure with OpenPBS 20.0.1 and CycleCloud.
When I run a PBS Python script that executes Abaqus jobs, cput stays at 0 while the job is running.
At the end of the task, cput is about 00:00:02, which is the stage-out time during the Exiting phase.
The script runs the Abaqus job (os.system(abq job=…)), monitors it and reports progress in the task status, then does the post-processing (subprocess.run(abaqus viewer noGUI=my_post_process.py)).
The script launches Abaqus with os.system and the Abaqus viewer with subprocess; one is non-blocking and the other creates a child process. It looks like PBS is monitoring only the Python process, which is sleeping while it waits for the Abaqus applications to finish.
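To illustrate the parent/child split I suspect here, below is a minimal sketch (not my actual job script) that runs a CPU-bound child and compares the CPU time charged to the Python parent with the CPU time charged to its children. The helper name cpu_split and the "burn" command are made up for the example; os.times() is standard Python, and its children_* fields are only populated on Unix-like systems after the child has been waited for.

```python
import os
import subprocess
import sys


def cpu_split(cmd):
    """Run cmd to completion and return (parent_cpu, children_cpu) in seconds.

    parent_cpu   : user+system CPU time used by this Python process itself
    children_cpu : user+system CPU time used by child processes that were waited for
    """
    before = os.times()
    subprocess.run(cmd, check=True)  # blocks until the child exits
    after = os.times()
    parent = (after.user - before.user) + (after.system - before.system)
    children = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return parent, children


if __name__ == "__main__":
    # Hypothetical stand-in for the solver: a child that just burns CPU.
    burn = [sys.executable, "-c", "sum(i * i for i in range(5_000_000))"]
    parent, children = cpu_split(burn)
    print(f"parent CPU:   {parent:.2f} s")
    print(f"children CPU: {children:.2f} s")
```

On a Linux node the parent figure should stay close to zero while the children figure carries the real work, which matches the idea that a scheduler looking only at the launching process would see almost no CPU usage.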
The same script under PBS Pro on a Windows node makes cput increase normally and in proportion to ncpus, so there must be a way to make OpenPBS do the same.
When I execute a very simple PBS Python script for testing, cput increases normally:
I was able to get cput counted by enabling the cgroups hook:
qmgr: set hook pbs_cgroups enabled = true
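To check whether the change had an effect, cput can be read from the full job status. The sketch below parses the resources_used.cput line as printed by qstat -f and converts it to seconds. The function names are mine and the job id passed to it would be whatever qsub returned; I am assuming qstat is on the PATH of the machine where this runs.

```python
import re
import subprocess


def parse_cput(qstat_full_output):
    """Return resources_used.cput from `qstat -f` text as seconds, or None if absent."""
    match = re.search(
        r"resources_used\.cput\s*=\s*(\d+):(\d{2}):(\d{2})", qstat_full_output
    )
    if match is None:
        return None
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def job_cput(job_id):
    """Query a running job with `qstat -f <job_id>` and return its cput in seconds."""
    result = subprocess.run(
        ["qstat", "-f", job_id], capture_output=True, text=True, check=True
    )
    return parse_cput(result.stdout)
```

Polling job_cput every few seconds while the Abaqus job is running shows whether the value is growing or stuck at 0.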
The Standard_D2ds_v5 VM in Azure has one physical hyperthreaded core, so 2 virtual CPUs. I have to configure Abaqus with an abaqus_v6.env file to force multiprocessing:
import os
import socket
But now, when I submit a job with #PBS -l select=1:ncpus=2:mem=2gb:vm_size=Standard_D2ds_v5, the MoM rejects it, whereas with #PBS -l select=1:ncpus=1:mem=2gb:vm_size=Standard_D2ds_v5 the job runs and cput has the correct value.
Now I need to make the nodes accept: #PBS -l select=1:ncpus=2:mem=2gb:vm_size=Standard_D2ds_v5
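As a diagnostic while looking into the ncpus=2 rejection, a small script like the one below can be run inside an ncpus=1 job to see how many logical CPUs the node reports and how many the job's process is actually allowed to use. This is only a sketch: visible_cpus is a name I made up, os.sched_getaffinity is Linux-only, and I am assuming (not asserting) that the gap between the two numbers is related to what the node advertises once the cgroups hook is enabled.

```python
import os


def visible_cpus():
    """Return logical CPUs on the node and CPUs this process may run on."""
    online = os.cpu_count() or 1
    if hasattr(os, "sched_getaffinity"):
        # Linux only: the set of CPU ids this process is allowed to be scheduled on.
        allowed = len(os.sched_getaffinity(0))
    else:
        allowed = online
    return {"online": online, "allowed": allowed}


if __name__ == "__main__":
    info = visible_cpus()
    print(f"logical CPUs on node:  {info['online']}")
    print(f"CPUs allowed for job:  {info['allowed']}")
```

Comparing this with resources_available.ncpus in the pbsnodes -a output for the same node should show whether PBS believes the node has 1 or 2 CPUs.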
I have installed this on both the PBS server and the node and restarted PBS, but when I run the “normal” Python script, which executes the applications in child processes, cput stays at 00:00:00.