polpo
May 25, 2020, 5:34am
1
Hi
Jobs sometimes fail to transition from the “Q” state to the “R” state.
When I submit several jobs, they run smoothly at first.
However, once jobs have built up in the queue to some extent, they no longer transition from “Q” to “R”.
The pbs_sched log says the following:
Job is invalid - ignoring for this cycle No Group
Restarting the service or rebooting the OS in this state does not make the jobs transition to “R”.
Also, when a new job is submitted after this happens, it overtakes the queued jobs and goes to “R”.
Can someone please tell me the cause and how to deal with it?
Please let me know if you need any more information.
OS: Windows 10
PBS: 18.1.4
adarsh
May 26, 2020, 11:19pm
2
Please share the server logs and the MoM logs from the compute node on which the job was supposed to run.
Please share the job script and the qsub command, arguments, and attributes used.
Are UAC and the firewall turned off in the Windows registry, and has the system been rebooted?
Please share the output of pbsnodes -av and qstat -ans after submitting the job.
Thank you
polpo
May 27, 2020, 6:12am
3
Thank you
Please share the server logs and the MoM logs from the compute node on which the job was supposed to run.
Is there somewhere I can upload them?
They are about 4,000 lines long, so I could not paste them here.
Please share the job script and the qsub command, arguments, and attributes used.
Sorry, I can’t attach the script due to circumstances.
qsub: qsub -q execN -N job_name -- "cmd.exe /c cd exec_dir && exec.bat"
Are UAC and the firewall turned off in the Windows registry, and has the system been rebooted?
No, I didn’t.
pbsnodes -av
xxxxxhostN
Mom = xxxxxhostN.xxxx.xxxx.xx.xx
Port = 15002
pbs_version = 18.1.4
ntype = PBS
state = free
pcpus = 4
resources_available.arch = windows
resources_available.host = xxxxhostN
resources_available.mem = 8388148kb
resources_available.ncpus = 4
resources_available.vnode = xxxxhostN
resources_assigned.accelerator_memory = 0kb
resources_assigned.hbmem = 0kb
resources_assigned.mem = 0kb
resources_assigned.naccelerators = 0
resources_assigned.ncpus = 0
resources_assigned.vmem = 0kb
resv_enable = True
sharing = default_shared
last_state_change_time = Wed May 27 12:17:49 2020
last_used_time = Wed May 27 13:14:27 2020
qstat -ans
xxxxxhostN:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
672.xxxxxhostN  xxxxxUs  execN    xxx999991_     --   1   1     --    -- Q    --
    --
    --
673.xxxxxhostN  xxxxxUs  execN    xxx999992_     --   1   1     --    -- Q    --
    --
    --
674.xxxxxhostN  xxxxxUs  execN    xxx999993_     --   1   1     --    -- Q    --
    --
    --
675.xxxxxhostN  xxxxxUs  execN    xxx999994_     --   1   1     --    -- Q    --
    --
    --
676.xxxxxhostN  xxxxxUs  execN    xxx999995_     --   1   1     --    -- Q    --
    --
    --
Thank you very much for sharing these details.
Please check the server and MoM logs and see whether you find anything unusual, such as errors.
Sorry, I understand that the size of the files makes uploading them a problem.
Please test a simple job by submitting the line below:
qsub -- pbs-sleep 10
Yes, UAC, the firewall, and permissions need to be checked. Please check the Windows prerequisites in this installation guide: https://www.altair.com/pdfs/pbsworks/PBSInstallGuide19.2.3.pdf
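Since the logs run to about 4,000 lines, one way to cut them down before posting is to keep only the lines that mention an affected job ID or the scheduler message. The following is a minimal sketch in Python, not a PBS tool: the function name `filter_log_lines` and the script name are made up for illustration, and the log path is something you pass in yourself because it depends on your installation.

```python
import sys


def filter_log_lines(lines, needles):
    """Return the lines that contain any of the needles (case-insensitive)."""
    lowered = [n.lower() for n in needles]
    return [
        line.rstrip("\r\n")
        for line in lines
        if any(n in line.lower() for n in lowered)
    ]


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hypothetical usage:
    #   python filter_pbs_log.py <path to log file> 672. "No Group"
    path, needles = sys.argv[1], sys.argv[2:]
    # errors="replace" avoids crashing on bytes that are not valid UTF-8.
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in filter_log_lines(fh, needles):
            print(line)
```

Running it once against the scheduler log and once against the server and MoM logs with the same job ID should give a much shorter excerpt that is easier to paste here.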
polpo
May 29, 2020, 12:28pm
5
Thank you.
I reviewed the settings, but the same phenomenon occurred again.
qsub -- pbs-sleep 10
With this job, the problem of not transitioning out of the “Q” state does not occur.
Is it possible that resources are not released, depending on the command that is submitted?
I have the impression that the phenomenon occurs when CPU and memory are tight.
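One way to check whether PBS itself still considers resources to be held is to compare the resources_assigned.* values against the resources_available.* values in the pbsnodes -av output while jobs are stuck. The following is a minimal sketch in Python, assuming the output has been saved as text in the "name = value" form shown earlier; `parse_pbsnodes` and `free_ncpus` are illustrative helper names, not part of PBS.

```python
def parse_pbsnodes(text):
    """Parse `pbsnodes -av` text into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if " = " in line:
            # Attribute line belonging to the most recent node name.
            if current is not None:
                key, value = line.split(" = ", 1)
                nodes[current][key.strip()] = value.strip()
        else:
            # A line without " = " is treated as a node name.
            current = line
            nodes[current] = {}
    return nodes


def free_ncpus(attrs):
    """CPUs PBS believes are unassigned on one node."""
    available = int(attrs.get("resources_available.ncpus", 0))
    assigned = int(attrs.get("resources_assigned.ncpus", 0))
    return available - assigned


# Trimmed from the pbsnodes -av output posted earlier in this thread.
SAMPLE = """xxxxxhostN
state = free
resources_available.ncpus = 4
resources_assigned.ncpus = 0
"""

if __name__ == "__main__":
    for name, attrs in parse_pbsnodes(SAMPLE).items():
        print(name, attrs.get("state"), "free ncpus:", free_ncpus(attrs))
```

In the output posted earlier, resources_assigned.ncpus and resources_assigned.mem were both 0, so at that moment PBS did not show any resources as held. This only reflects PBS accounting, not the actual CPU and memory load on the machine.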
Thank you very much.