Jobs running on only 1 cpu per node

Current Torque user, looking at using PBSPro as part of the OpenHPC stack on a new HPC. I have PBSPro 14.1.2 installed on head node with 4 separate compute nodes.

I don’t typically run parallel processing jobs; instead I run many separate Monte Carlo iterations concurrently, one job per core. When I attempt to run jobs this way with PBSPro, I have a problem. Everything appears OK in the monitoring tools (please see output below): the queue is receiving jobs, jobs are running, each job appears to be allocated a separate CPU, output is produced, etc. However, my job sets are taking a very long time to run. Looking into the long run time, I realized that if I ssh to a compute node and use top to monitor CPU usage, all jobs are running serially on only 1 CPU (all other CPUs are at 0%).

I’m not sure if this is a problem with my PBS setup or perhaps the way I am requesting resources? I have tried many combinations but currently I have settled on #PBS -l select=1:ncpus=1 to run jobs. I have also tried adding -l place=scatter thinking it could be a placement issue, but that didn’t help. I’m confused since PBSPro seemingly sees all 48 cpus per node and reports that it has allocated 12 cpus (1 for each of 12 jobs in this example case), but is really only running the jobs on 1 cpu.

The following outputs are from a case where I am trying to run 12 of these 1-core jobs concurrently on one node using the default workq:

Output of qstat -Q:

Queue              Max   Tot Ena Str   Que   Run   Hld   Wat   Trn   Ext Type
---------------- ----- ----- --- --- ----- ----- ----- ----- ----- ----- ----
workq                0    12 yes yes     0    12     0     0     0     0 Exec

Output of qstat -fB:

Server: athena
    server_state = Active
    server_host = athena
    scheduling = True
    total_jobs = 5862
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:12 Exiting:0 Begu
    default_queue = workq
    log_events = 511
    mail_from = adm
    query_other_jobs = True
    resources_default.ncpus = 1
    resources_default.place = scatter
    default_chunk.ncpus = 1
    resources_assigned.ncpus = 12
    resources_assigned.nodect = 12
    scheduler_iteration = 600
    FLicenses = 2000000
    resv_enable = True
    node_fail_requeue = 310
    max_array_size = 10000
    default_qsub_arguments = -V
    pbs_license_min = 0
    pbs_license_max = 2147483647
    pbs_license_linger_time = 31536000
    license_count = Avail_Global:1000000 Avail_Local:1000000 Used:0 High_Use:0 
	Avail_Sockets:1000000 Unused_Sockets:1000000
    pbs_version = 14.1.2
    eligible_time_enable = False
    job_history_enable = True
    max_concurrent_provision = 5

Could this be related to my resource/vnode definitions? I simply created nodes with “create node n001” in qmgr.

Output of pbsnodes n001:

     Mom = n001.localdomain
     Port = 15002
     pbs_version = 14.1.2
     ntype = PBS
     state = free
     pcpus = 48
     jobs = 5891.athena/0, 5892.athena/1, 5893.athena/2, 5894.athena/3, 
5895.athena/4, 5896.athena/5, 5897.athena/6, 5898.athena/7, 5899.athena/8, 
5900.athena/9, 5901.athena/10, 5902.athena/11
     resources_available.arch = linux
     resources_available.host = n001
     resources_available.mem = 528278028kb
     resources_available.ncpus = 48
     resources_available.vnode = n001
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.mem = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.ncpus = 12
     resources_assigned.netwins = 0
     resources_assigned.vmem = 0kb
     resv_enable = True
     sharing = default_shared

I would be grateful for any insight into this, Thanks.

Your configuration and settings look correct. Please try the following; you should see 48 single-core jobs running on the compute node:

  • for i in {1..48}; do qsub -l select=1:ncpus=1:mem=1mb -- /bin/sleep 1000; done
  • qstat -answ1

  1. Node n001 has 48 physical cores (pcpus = 48), and resources_available.ncpus = 48 is the number of cores available on that node.

  2. resources_assigned.ncpus = 12 means that 12 cores are in use (0-11), matching the job listing for that node.

If your 1-CPU jobs are not CPU-intensive and are short-lived, you would not see any CPU load on the compute node. The same would be true if you ran those 1-CPU jobs manually on that compute node without PBS Pro. For a 1-CPU job you do not need -l place=scatter; use it only for multi-node, multi-CPU jobs (such as MPI jobs) when you want chunks from different nodes.
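To make that comparison without PBS Pro, one option is to start several copies of the program by hand on the compute node and watch top in a second terminal. This is only a minimal sketch: run_copies and ./my_mc_sim are hypothetical names, not anything from this thread.

```shell
#!/bin/sh
# Start N copies of a command in the background, then wait for all of them.
# run_copies is a hypothetical helper name, not a PBS Pro command.
run_copies() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &            # launch one copy in the background
        i=$((i + 1))
    done
    wait                  # block until every copy has finished
}

# Example (hypothetical executable): 12 concurrent single-core runs.
# run_copies 12 ./my_mc_sim
```

If the copies spread across cores when started this way but not when submitted through PBS Pro, the difference is more likely in how the job script launches the program than in the node itself.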

For Monte Carlo simulations, consider using job arrays.
Please refer to page UG-147 of the PBS Users Guide.

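Example: submit job arrays with qsub -J, and monitor them with qstat -t, qstat -J, or qstat -p.
qsub -J 1-10   -- /bin/sleep 1000    # indices 1, 2, ..., 10
qsub -J 1-10:2 -- /bin/sleep 1000    # indices 1, 3, 5, 7, 9
qsub -J 2-10:2 -- /bin/sleep 1000    # indices 2, 4, 6, 8, 10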

Monte Carlo simulations, rendering, and bioscience applications commonly use job arrays.
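As a rough sketch of how a job array could replace many separate submissions, the script below derives a per-run seed from PBS_ARRAY_INDEX, which PBS Pro sets for each subjob. The array range, the seed formula, the helper name mc_seed, and the executable ./my_mc_sim with its --seed flag are all assumptions for illustration, not taken from this thread.

```shell
#!/bin/sh
#PBS -N mc_array
#PBS -l select=1:ncpus=1
#PBS -J 1-48

# Hypothetical helper: map a subjob index to a distinct random seed.
mc_seed() {
    echo $((1000 + $1))
}

# PBS Pro sets PBS_ARRAY_INDEX in each subjob; default to 1 for a dry run outside PBS.
idx="${PBS_ARRAY_INDEX:-1}"
seed="$(mc_seed "$idx")"

# PBS_O_WORKDIR is the directory qsub was run from; fall back to the current one.
cd "${PBS_O_WORKDIR:-.}" || exit 1

echo "subjob ${idx}: seed ${seed}"
# ./my_mc_sim --seed "${seed}" > "run_${idx}.out"   # hypothetical executable and flag
```

Submitting this once with qsub would queue all 48 subjobs, each requesting a single core.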

Thank you for the reply. Your suggestions very much helped me figure this out. My jobs are CPU-intensive and usually run for 5-10 minutes, so I should see CPU load in the monitoring tools. If I submit jobs via qsub on the command line as you suggested, they do run on all CPUs on the compute nodes as expected. However, when I submitted jobs with my scripts and .pbs files, they were still running on only 1 CPU (actually 2: the first physical CPU and its corresponding logical CPU). On closer examination of the .pbs files, I noticed that the calls to the executable were preceded by mpiexec -np 1. If I take that part out, the .pbs jobs run on all CPUs concurrently. The mpiexec prefix is a holdover from scripts developed in the Torque environment, so I suspect it is not necessary, especially for 1-CPU jobs.
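One way to check for this kind of pinning, assuming Linux compute nodes, is to read the Cpus_allowed_list field that the kernel exposes in /proc/<pid>/status. The sketch below is illustrative only: allowed_cpus is a hypothetical helper name, and it is an assumption (not verified in this thread) that mpiexec was restricting every process to the same core.

```shell
#!/bin/sh
# Print the list of CPUs a process is allowed to run on (Linux only).
# allowed_cpus is a hypothetical helper name, not a PBS Pro command.
allowed_cpus() {
    pid="${1:-$$}"
    # Cpus_allowed_list is a range list such as "0-47" (all cores) or "0" (one core).
    sed -n 's/^Cpus_allowed_list:[[:space:]]*//p' "/proc/${pid}/status"
}

# Inside a job script, report the affinity of the job shell itself:
echo "job shell may run on CPUs: $(allowed_cpus)"
```

Calling allowed_cpus with the PID of a running simulation process on the compute node would show whether that process is restricted to a narrow set of cores or free to use the whole node.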

Thank you also for the suggestion about using job arrays. I will experiment with them and post if I have any questions. Thank you again for the prompt reply and help.