Scheduler doesn't seem to be holding jobs

I have a cluster, 28 nodes, 40 cores per node for a total of 1120 cores.

User 1 submits a job that asks for 1000 cores.
User 2 submits a job that asks for 400 cores while User 1's job is running.
User 2's job doesn't get queued up; instead it tries to run and aborts.
Am I missing a setting, or should PBS queue up the job until resources are available?
Thanks.

Are you sure that hyper-threading (HT) is not enabled on these nodes?
Please share the below:

  1. job script or qsub statement used by User1 and User2
  2. qstat -fx <jobid of user1> and qstat -fx <jobid of user2>
  3. pbsnodes -aSj command output
  4. Submit User1 job and then when it is running, run the below scripts and share the output
    pbsnodes -av | grep resources_available.ncpus | cut -d'=' -f2 | awk '{ sum+=$1 } END { print sum }'
    pbsnodes -av | grep resources_assigned.ncpus | cut -d'=' -f2 | awk '{ sum+=$1 } END { print sum }'
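
For reference (illustrative numbers only, not actual output from this cluster): on a 28-node, 40-core cluster the first sum should come out to 1120 if hyper-threading is off, and roughly double that if it is on, while the second sum should roughly match the cores assigned to User1's running job, e.g.:

    pbsnodes -av | grep resources_available.ncpus | cut -d'=' -f2 | awk '{ sum+=$1 } END { print sum }'
    1120
    pbsnodes -av | grep resources_assigned.ncpus | cut -d'=' -f2 | awk '{ sum+=$1 } END { print sum }'
    1000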

Neither of the users is around today, so I can't get them to submit their jobs, but I noticed something in user2's job that looks odd:

#PBS -l select=20:ncpus=40:mpiprocs=40,walltime=1:00:00

User1 runs jobs at any time just fine, but I don't know where his job submission script is. User2's job runs fine if no one else is running, but if he submits it while others are running, it immediately aborts. User1 is using 1000 of the 1120 cores, so User2's job should be held, but I don't think he has constructed his submission script correctly.

Thank you. Please share the information whenever possible.

The job request is correct, user2 is requesting 800 cores with a job walltime of 1 hour.

Could you please share the PBS Pro OSS version you are running? (e.g., qstat --version)

Here is the version:

[ramos@sandy1 ~] qstat --version
pbs_version = 14.1.2
[ramos@sandy1 ~]

I was thinking the line should be:

#PBS -l select=20:ppn=40,walltime=1:00:00

Thank you

ppn=40 is old syntax that was used in PBS Pro version 9 and earlier.
select and ncpus are correct here.

PBS converts old-style resource requests to select and place statements.
See the 19.2.1 UG, section 4.8.3, “Conversion of Old Style to New”, on page UG-72
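
As a rough illustration of that conversion (the exact rules are in the UG section above), an old-style request such as

    qsub -l nodes=20:ppn=40 -l walltime=1:00:00 job.sh

would be converted by PBS into something along the lines of

    qsub -l select=20:ncpus=40:mpiprocs=40 -l place=scatter -l walltime=1:00:00 job.sh

so the line user2 is using is effectively already the new form of that request.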

Thank you. I want to rule out a format issue. It's still the weekend, so no users are in, but here is a job that immediately fails:

more job-15170
06/14/2019 15:26:34 L Considering job to run
06/14/2019 15:26:34 S Job Queued at request of viner@sandy1.local, owner = viner@sandy1.local, job name = ColdForecast.20, queue = workq
06/14/2019 15:26:34 S Job Run at request of Scheduler@smaster1 on exec_vnode (compute-0-13:ncpus=40)+(compute-0-14:ncpus=40)+(compute-0-15:ncpus=40)+(compute-0-16:ncpus=40)+(compute-0-17:ncpus=40)+(compute-0-8:ncpus=40)+(compute-0-18:ncpus=40)+(compute-0-19:ncpus=40)+(compute-0-20:ncpus=40)+(compute-0-21:ncpus=40)
06/14/2019 15:26:34 S Job Modified at request of Scheduler@smaster1
06/14/2019 15:26:34 L Job run
06/14/2019 15:26:34 S enqueuing into workq, state 1 hop 1
06/14/2019 15:26:34 S Obit received momhop:1 serverhop:1 state:4 substate:42
06/14/2019 15:26:34 A queue=workq
06/14/2019 15:26:34 A user=viner group=viner account="None" project=_pbs_project_default jobname=ColdForecast.20 queue=workq ctime=1560551194 qtime=1560551194 etime=1560551194 start=1560551194 exec_host=compute-0-13/0*40+compute-0-14/0*40+compute-0-15/0*40+compute-0-16/0*40+compute-0-17/0*40+compute-0-8/0*40+compute-0-18/0*40+compute-0-19/0*40+compute-0-20/0*40+compute-0-21/0*40 exec_vnode=(compute-0-13:ncpus=40)+(compute-0-14:ncpus=40)+(compute-0-15:ncpus=40)+(compute-0-16:ncpus=40)+(compute-0-17:ncpus=40)+(compute-0-8:ncpus=40)+(compute-0-18:ncpus=40)+(compute-0-19:ncpus=40)+(compute-0-20:ncpus=40)+(compute-0-21:ncpus=40) Resource_List.mpiprocs=400 Resource_List.ncpus=400 Resource_List.nodect=10 Resource_List.place=free Resource_List.select=10:ncpus=40:mpiprocs=40 Resource_List.walltime=01:00:00 resource_assigned.ncpus=400
06/14/2019 15:26:35 S Exit_status=1 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=412kb resources_used.ncpus=400 resources_used.vmem=12864kb resources_used.walltime=00:00:00
06/14/2019 15:26:35 A user=viner group=viner account="None" project=_pbs_project_default jobname=ColdForecast.20 queue=workq ctime=1560551194 qtime=1560551194 etime=1560551194 start=1560551194 exec_host=compute-0-13/0*40+compute-0-14/0*40+compute-0-15/0*40+compute-0-16/0*40+compute-0-17/0*40+compute-0-8/0*40+compute-0-18/0*40+compute-0-19/0*40+compute-0-20/0*40+compute-0-21/0*40 exec_vnode=(compute-0-13:ncpus=40)+(compute-0-14:ncpus=40)+(compute-0-15:ncpus=40)+(compute-0-16:ncpus=40)+(compute-0-17:ncpus=40)+(compute-0-8:ncpus=40)+(compute-0-18:ncpus=40)+(compute-0-19:ncpus=40)+(compute-0-20:ncpus=40)+(compute-0-21:ncpus=40) Resource_List.mpiprocs=400 Resource_List.ncpus=400 Resource_List.nodect=10 Resource_List.place=free Resource_List.select=10:ncpus=40:mpiprocs=40 Resource_List.walltime=01:00:00 session=121410 end=1560551195 Exit_status=1 resources_used.cpupercent=0 resources_used.cput=00:00:00 resources_used.mem=412kb resources_used.ncpus=400 resources_used.vmem=12864kb resources_used.walltime=00:00:00 run_count=1

[root@smaster1 tmp]#

Please note that the exit status is 1 (which means the application/batch command that the user intended to run via the PBS script failed), hence the job has exited. In this scenario, could you please test whether you can run the user's script without PBS Pro on the compute nodes and check whether it executes successfully?
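
For example, something along these lines (the script name here is a placeholder; note that PBS variables such as PBS_NODEFILE will not be set outside PBS, so any MPI launch line in the script may need a host list supplied by hand):

    ssh compute-0-13
    bash ./user2_job_script.sh    # placeholder for the user's actual script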

Thank you for sharing the information.

You can print the script the user has submitted by running the below command as the root user:
printjob -s <jobid>
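
For example, for the failing job shown earlier in this thread:

    printjob -s 15170.smaster1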

The script will run if it is the only job running; that is the root of the problem. It runs fine by itself, but not if the other user's job is running. Here is the MoM log from one of the compute nodes:

06/14/2019 15:26:34;0008;pbs_mom;Job;15170.smaster1;nprocs: 488, cantstat: 0, nomem: 0, skipped: 0, cached: 0, max excluded PID: 0
06/14/2019 15:26:34;0008;pbs_mom;Job;15170.smaster1;Started, pid = 121410
06/14/2019 15:26:34;0080;pbs_mom;Job;15170.smaster1;task 00000001 terminated
06/14/2019 15:26:34;0008;pbs_mom;Job;15170.smaster1;Terminated
06/14/2019 15:26:34;0100;pbs_mom;Job;15170.smaster1;task 00000001 cput= 0:00:00
06/14/2019 15:26:34;0008;pbs_mom;Job;15170.smaster1;kill_job
06/14/2019 15:26:34;0100;pbs_mom;Job;15170.smaster1;compute-0-13 cput= 0:00:00 mem=412kb
06/14/2019 15:26:34;0100;pbs_mom;Job;15170.smaster1;compute-0-14.local cput= 0:00:00 mem=0kb
06/14/2019 15:26:34;0100;pbs_mom;Job;15170.smaster1;compute-0-15.local cput= 0:00:00 mem=0kb
06/14/2019 15:26:34;0100;pbs_mom;Job;15170.smaster1;compute-0-16.local cput= 0:00:00 mem=0kb
06/14/2019 15:26:34;0100;pbs_mom;Job;15170.smaster1;compute-0-17.local cput= 0:00:00 mem=0kb
06/14/2019 15:26:34;0100;pbs_mom;Job;15170.smaster1;compute-0-8.local cput= 0:00:00 mem=0kb
06/14/2019 15:26:34;0100;pbs_mom;Job;15170.smaster1;compute-0-18.local cput= 0:00:00 mem=0kb
06/14/2019 15:26:34;0100;pbs_mom;Job;15170.smaster1;compute-0-19.local cput= 0:00:00 mem=0kb
06/14/2019 15:26:34;0100;pbs_mom;Job;15170.smaster1;compute-0-20.local cput= 0:00:00 mem=0kb
06/14/2019 15:26:34;0100;pbs_mom;Job;15170.smaster1;compute-0-21.local cput= 0:00:00 mem=0kb
06/14/2019 15:26:34;0008;pbs_mom;Job;15170.smaster1;no active tasks
06/14/2019 15:26:34;0100;pbs_mom;Job;15170.smaster1;Obit sent

Thank you for the MoM logs. They do not say much, as the job is not using any cput or walltime and is exiting immediately with Exit_status = 1.
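
One quick check worth making (assuming default output handling, with file names taken from the job name and id in the logs above) is the job's stdout/stderr files, which are copied back to the submission directory when the job ends and often show why the application exited with status 1:

    cat ColdForecast.20.o15170    # job stdout
    cat ColdForecast.20.e15170    # job stderr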

Could you please submit the below and share the output:

qsub -l select=20:ncpus=40    --  /bin/sleep 10000
qsub -l select=20:ncpus=40    --  /bin/sleep 10000
qsub -l select=20:ncpus=40    --  /bin/sleep 10000
qstat -answ1
pbsnodes -aSj

If this fails, then please share the pbs_diag output with us by running $PBS_EXEC/unsupported/pbs_diag -j <failed jobid>; the output of this command is a tar.gz file stored in the /root/ folder.

This ran as expected:

15223.smaster1 STDIN ramos 00:00:00 R workq
15224.smaster1 STDIN ramos 0 Q workq
15225.smaster1 STDIN ramos 0 Q workq

Thank you. There are no issues with the cluster configuration. The problem is with the job script the user is submitting, which is failing instantly. If you can share the failing job script (with anything classified removed), we can check it.
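
For reference, a minimal script matching the resource request seen in the logs would look something like the one below; the application launch line is only a placeholder for whatever the user actually runs:

    #!/bin/bash
    #PBS -N ColdForecast.20
    #PBS -q workq
    #PBS -l select=10:ncpus=40:mpiprocs=40
    #PBS -l walltime=01:00:00

    cd $PBS_O_WORKDIR
    mpirun -np 400 ./coldforecast_app    # placeholder application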