Hello People,
I have a problem with job dependencies when the submission host (the host where qsub is executed) is not the same as the server host (the host where pbs_server is running).
I am using openPBS version 19.1.3, installed by openHPC version 1.3, on RHEL 7.7.
Test cases:
Test1: when I submit a job on the submission host without a dependency on another job, it works well.
Test2: when I submit jobs on the server host with a dependency such as afterany on a parent job, it works well.
Test3: when I submit jobs on the submission host with a dependency such as afterany on a parent job, it does not work. The child job remains in Hold status forever, even after the parent exits.
Question: Is there any known bug with this PBS version?
Thanks
Dependency issues are often resolved by using the full jobid in the -W depend= string. That is, when you qsub a job, qsub outputs the jobid. Copy/paste that exact string when forming your depend= values.
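As a minimal sketch of that tip, assuming hypothetical job scripts parent.sh and child.sh: capture the jobid that qsub prints and pass it unchanged to -W depend=. The depend_arg helper is only illustrative and is not part of PBS.

```shell
#!/bin/sh
# Sketch: chain a child job to a parent using the exact jobid printed by qsub.
# parent.sh and child.sh are hypothetical job scripts.

# Build the -W value from the jobid string exactly as qsub printed it.
depend_arg() {
    printf 'depend=afterany:%s' "$1"
}

# Only submit when qsub is actually available on this host.
if command -v qsub >/dev/null 2>&1; then
    parent_id=$(qsub parent.sh)    # full jobid, e.g. 19.serverhost
    qsub -W "$(depend_arg "$parent_id")" child.sh
fi
```

Using the captured string avoids retyping the jobid by hand and accidentally dropping or changing the server part.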
If this is not the fix, could you show the dependency information from an example pair of jobs?
qstat -f running_jobid held_jobid | grep depend
Hi dtalcott, your tip about using the full jobid fixed my problem.
- Jobid only:
[subhost: ~ ]$ qstat -f 17 18 | grep -i depend
depend = beforeany:18.serverhost@serverhost
depend = afterany:17.serverhost.fev.com@serverhost.fev.com
Submit_arguments = -I -l select=1:ncpus=1 -W depend=afterany:17
- Jobid plus the name of the host where the PBS server is running:
[subhost: ~ ]$ qstat -f 19 20 | grep -i depend
depend = beforeany:20.serverhost@serverhost
depend = afterany:19.serverhost@serverhost
Submit_arguments = -I -l select=1:ncpus=1 -W depend=afterany:19.serverhost
The difference is the domain name that is automatically appended in the child job's depend attribute. Note that if I also add the domain name in the afterany clause (afterany:19.serverhost.fev.com), then it does not work. What is the logic?
Thank you very much
You probably have a hostname/IP address/pbs.conf mismatch somewhere. Carefully go through the items listed under “Required Name Resolution” and following sections in the Installation and Upgrade Guide. Be sure to use the canonical names in the /etc/pbs.conf files.
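One rough way to start that check, as a sketch only: print the PBS_SERVER value from pbs.conf next to what the host itself resolves. It assumes the file is at /etc/pbs.conf unless PBS_CONF_FILE is set, and that getent and hostname -f are available (typical on RHEL). pbs_server_name is a hypothetical helper, not a PBS command.

```shell
#!/bin/sh
# Sketch: compare the server name in pbs.conf with this host's name resolution.

# Print the PBS_SERVER value from a pbs.conf-style file (last match wins).
pbs_server_name() {
    sed -n 's/^PBS_SERVER=//p' "$1" | tail -n 1
}

conf=${PBS_CONF_FILE:-/etc/pbs.conf}
if [ -r "$conf" ]; then
    server=$(pbs_server_name "$conf")
    echo "PBS_SERVER in $conf: $server"
    echo "FQDN of this host: $(hostname -f)"
    # Forward lookup of the server name as this host sees it.
    getent hosts "$server" || echo "cannot resolve $server"
fi
```

Run it on both the submission host and the server host; the server name and address reported on each should agree.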