I recently installed OpenPBS version 20.0.0 and am trying to resolve a problem with scheduling my submitted jobs. I was able to create a queue (named batch) and submit jobs to it; however, the jobs stay queued unless I force them to run with the “qrun” command as the root user. I checked my system requirements, and the problem does not arise from them. I am looking for help in determining how to fix this issue. My overall goal is to send jobs to the queue and have the job scheduler run them as system resources become available. Please find below some details regarding my system.
If qrun without specifying nodes runs the job, then the job is run by a special scheduler iteration. That iteration lets the scheduler ignore the state of the queue (is it started?) and any limits.
As someone else said: crank up the log events and look at the scheduler logs, and also look at the job comment, which is normally set by the scheduler when it considers a job but does not run it.
If there is none, the scheduler hasn’t considered it… perhaps the server “scheduling” attribute is set to false.
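The checks suggested above could be run roughly as follows. This is only a sketch: `check_job` is a hypothetical helper name, and it assumes the standard PBS client commands `qstat` and `qmgr` are on the PATH and that the caller is allowed to query the server.

```shell
# check_job is a hypothetical helper; qstat and qmgr are the usual PBS client commands.
check_job() {
    job_id="$1"
    queue="$2"
    # The scheduler normally sets a comment on a job it considered but did not run.
    qstat -f "$job_id" | grep -i 'comment' || echo "no comment set on $job_id"
    # A queue that is not started accepts jobs but never has them scheduled.
    qstat -Qf "$queue" | grep -E 'enabled|started'
    # If scheduling is False, the server does not trigger scheduling cycles.
    qmgr -c "list server scheduling"
}

# Example, using the job ID and queue name from this thread:
# check_job 2537 batch
# To turn scheduling back on (as root or a PBS manager):
# qmgr -c "set server scheduling = True"
```

If the job has no comment at all, the scheduler most likely never looked at it, which points at the server or scheduler configuration rather than at the job's resource request.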
(base) [modeleval@cc-3dfr ~]$ tracejob 2537
Job: 2537.cc-3dfr
12/29/2022 15:05:51 S enqueuing into batch, state Q hop 1
12/29/2022 15:05:51 S Job Queued at request of modeleval@cc-3dfr, owner = modeleval@cc-3dfr, job name = STDIN, queue = batch
Additionally, here are the logs for the submitted job IDs (2537 and 2538):
Scheduler_log:
12/29/2022 14:59:12;0002;pbs_sched;Svr;Log;Log opened
12/29/2022 14:59:12;0002;pbs_sched;Svr;pbs_sched;pbs_version=20.0.0
12/29/2022 14:59:12;0002;pbs_sched;Svr;pbs_sched;pbs_build=mach=N/A:security=N/A:configure_args=N/A
12/29/2022 14:59:12;0002;pbs_sched;Svr;pbs_sched;hostname=cc-3dfr;pbs_leaf_name=N/A;pbs_mom_node_name=N/A
12/29/2022 14:59:12;0002;pbs_sched;Svr;pbs_sched;ipv4 interface lo: cc-3dfr.bcrc.local
12/29/2022 14:59:12;0002;pbs_sched;Svr;pbs_sched;ipv4 interface eno1: cc-3dfr.bcrc.local
12/29/2022 14:59:12;0002;pbs_sched;Svr;pbs_sched;ipv4 interface virbr0: cc-3dfr
12/29/2022 14:59:12;0002;pbs_sched;Svr;pbs_sched;ipv6 interface lo: cc-3dfr.bcrc.local
12/29/2022 14:59:12;0002;pbs_sched;Svr;pbs_sched;ipv6 interface eno1: cc-3dfr
12/29/2022 14:59:12;0002;pbs_sched;n/a;setup_env;read environment from /var/spool/pbs/pbs_environment
12/29/2022 14:59:12;0006;pbs_sched;Fil;pbs_sched;Version 20.0.0, started, initialization type = 0
12/29/2022 14:59:12;0002;pbs_sched;Svr;sched_main;/opt/pbs/sbin/pbs_sched startup pid 438795
12/29/2022 14:59:12;0040;pbs_sched;Fil;fairshare usage;Creating usage database for fairshare
12/29/2022 14:59:12;0080;pbs_sched;Req;;Launching 12 worker threads
12/29/2022 14:59:16;0001;pbs_sched;Svr;pbs_sched;Access from host not allowed, or unknown host (15008) in open_server_conns, Couldn’t register the scheduler default with connected server
Server_log:
12/29/2022 15:05:51;0100;Server@cc-3dfr;Job;2537.cc-3dfr;enqueuing into batch, state Q hop 1
12/29/2022 15:05:51;0008;Server@cc-3dfr;Job;2537.cc-3dfr;Job Queued at request of modeleval@cc-3dfr, owner = modeleval@cc-3dfr, job name = STDIN, queue = batch
12/29/2022 15:05:53;0100;Server@cc-3dfr;Req;;Type 0 request received from root@cc-3dfr, sock=19
12/29/2022 15:05:53;0100;Server@cc-3dfr;Req;;Type 95 request received from root@cc-3dfr, sock=20
12/29/2022 15:05:53;0100;Server@cc-3dfr;Req;;Type 98 request received from root@cc-3dfr, sock=19
12/29/2022 15:05:53;00a0;Server@cc-3dfr;Req;req_reject;Reject reply code=15008, aux=0, type=98, from root@cc-3dfr
12/29/2022 15:05:55;0100;Server@cc-3dfr;Req;;Type 0 request received from root@cc-3dfr, sock=19
12/29/2022 15:05:55;0100;Server@cc-3dfr;Req;;Type 1 request received from modeleval@cc-3dfr, sock=17
12/29/2022 15:05:55;0100;Server@cc-3dfr;Job;2538.cc-3dfr;enqueuing into batch, state Q hop 1
12/29/2022 15:05:55;0008;Server@cc-3dfr;Job;2538.cc-3dfr;Job Queued at request of modeleval@cc-3dfr, owner = modeleval@cc-3dfr, job name = STDIN, queue = batch
12/29/2022 15:05:55;0100;Server@cc-3dfr;Req;;Type 95 request received from root@cc-3dfr, sock=20
12/29/2022 15:05:55;0100;Server@cc-3dfr;Req;;Type 98 request received from root@cc-3dfr, sock=19
12/29/2022 15:05:55;00a0;Server@cc-3dfr;Req;req_reject;Reject reply code=15008, aux=0, type=98, from root@cc-3dfr
12/29/2022 15:05:57;0100;Server@cc-3dfr;Req;;Type 0 request received from root@cc-3dfr, sock=19
12/29/2022 15:05:57;0100;Server@cc-3dfr;Req;;Type 95 request received from root@cc-3dfr, sock=20
12/29/2022 15:05:57;0100;Server@cc-3dfr;Req;;Type 98 request received from root@cc-3dfr, sock=19
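Both logs above show error 15008 (“Access from host not allowed, or unknown host”): the scheduler reports it when it tries to register, and the server logs a matching reject for requests from root@cc-3dfr. A small sketch for counting those rejects in a server log is given here. `count_rejects` is a hypothetical helper, and the log path in the example is an assumption based on the PBS home of /var/spool/pbs that appears in the scheduler log.

```shell
# count_rejects is a hypothetical helper: it prints how many requests the
# server rejected with a given reply code (15008 by default) in one log file.
count_rejects() {
    log_file="$1"
    code="${2:-15008}"
    # Match the req_reject lines in the format shown in the server log above.
    grep -c "req_reject;Reject reply code=${code}," "$log_file"
}

# Example (assumed default location; server logs are usually named by date):
# count_rejects /var/spool/pbs/server_logs/20221229
```

A count that keeps growing every couple of seconds would be consistent with the scheduler repeatedly trying, and failing, to register with the server.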
Please note that we had to upgrade the system, and in the process I had to reinstall OpenPBS and PostgreSQL. Please see the new thread entitled “Difficulty with installation” and, if possible, advise on how to proceed from there.