I’ve been using OpenPBS 19.1.1 for a long time to build Azure HPC solutions without issues. Recently we started using an Azure DNS Private DNS zone that our scheduler VM belongs to. This has introduced communication problems that show up as errors like “invalid credential” when running qstat, or as errors when launching qmgr. This happens whether the commands are run from the scheduler VM or from a login node.
Our configuration uses the short name for the scheduler host (named scheduler).
After some analysis, it appears that because of the Azure DNS Private DNS zone there are now two PTR records for the scheduler's IP address, and they are not always returned in the same order, as shown below:
[root@scheduler server_logs]# nslookup 10.174.0.21
21.0.174.10.in-addr.arpa name = scheduler.internal.cloudapp.net.
21.0.174.10.in-addr.arpa name = scheduler.hpc.azure.
Authoritative answers can be found from:
[root@scheduler server_logs]# nslookup 10.174.0.21
21.0.174.10.in-addr.arpa name = scheduler.hpc.azure.
21.0.174.10.in-addr.arpa name = scheduler.internal.cloudapp.net.
Authoritative answers can be found from:
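To check how often the reverse name flips outside of nslookup, a small Python sketch like the one below can repeat the lookup through the system resolver and count which name comes back. It assumes the resolver reports whichever PTR record it receives first as the primary hostname, and that no local cache (nscd, sssd, etc.) pins the answer. The function names reverse_lookup_counts and names_are_stable are hypothetical helpers, not part of OpenPBS; the resolver parameter exists only so a fake resolver can be swapped in for testing.

```python
import socket
from collections import Counter
from typing import Callable, List, Tuple

# Same shape as socket.gethostbyaddr: (hostname, aliases, addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def reverse_lookup_counts(ip: str, attempts: int = 10,
                          resolver: Resolver = socket.gethostbyaddr) -> Counter:
    """Reverse-resolve `ip` several times and count each primary name returned.

    Hypothetical diagnostic helper: assumes the resolver is not cached, so
    each call can observe a different PTR record ordering.
    """
    counts: Counter = Counter()
    for _ in range(attempts):
        hostname, _aliases, _addresses = resolver(ip)
        counts[hostname] += 1
    return counts


def names_are_stable(counts: Counter) -> bool:
    """True when every lookup returned the same primary hostname."""
    return len(counts) == 1
```

On the scheduler VM or a login node, calling reverse_lookup_counts("10.174.0.21") and printing the result would show whether both scheduler.internal.cloudapp.net and scheduler.hpc.azure are being returned as the primary name over time.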
Looking at the server_logs, it appears that some connections are identified with the internal.cloudapp.net domain and others with the hpc.azure domain, and requests arriving under the mismatched name are rejected, as shown below.
12/28/2022 18:16:57;0040;Server@scheduler;Svr;scheduler.hpc.azure;Scheduler sent command 3
12/28/2022 18:16:57;0040;Server@scheduler;Svr;scheduler.hpc.azure;Scheduler sent command 0
12/28/2022 18:16:57;0100;Server@scheduler;Req;;Type 21 request received from Scheduler@scheduler.internal.cloudapp.net, sock=16
12/28/2022 18:16:57;0100;Server@scheduler;Req;;Type 81 request received from Scheduler@scheduler.internal.cloudapp.net, sock=16
12/28/2022 18:16:57;0100;Server@scheduler;Req;;Type 71 request received from Scheduler@scheduler.internal.cloudapp.net, sock=16
12/28/2022 18:16:57;0100;Server@scheduler;Req;;Type 58 request received from Scheduler@scheduler.internal.cloudapp.net, sock=16
12/28/2022 18:16:57;0080;Server@scheduler;Req;req_reject;Reject reply code=15064, aux=0, type=58, from Scheduler@scheduler.internal.cloudapp.net
12/28/2022 18:17:12;0040;Server@scheduler;Svr;scheduler.hpc.azure;Scheduler sent command 3
12/28/2022 18:17:12;0040;Server@scheduler;Svr;scheduler.hpc.azure;Scheduler sent command 0
12/28/2022 18:17:12;0100;Server@scheduler;Req;;Type 21 request received from Scheduler@scheduler.internal.cloudapp.net, sock=16
12/28/2022 18:17:12;0100;Server@scheduler;Req;;Type 81 request received from Scheduler@scheduler.internal.cloudapp.net, sock=16
12/28/2022 18:17:12;0100;Server@scheduler;Req;;Type 71 request received from Scheduler@scheduler.internal.cloudapp.net, sock=16
12/28/2022 18:17:12;0100;Server@scheduler;Req;;Type 58 request received from Scheduler@scheduler.internal.cloudapp.net, sock=16
12/28/2022 18:17:12;0080;Server@scheduler;Req;req_reject;Reject reply code=15064, aux=0, type=58, from Scheduler@scheduler.internal.cloudapp.net
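To quantify the pattern in the excerpt above over a whole log file, a Python sketch like the following can tally received and rejected requests per originating host name. The field layout (date;event code;origin;class;object;message) is inferred from the lines shown here rather than taken from a formal specification, and requests_by_host is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Pulls "<client>@<host>" after "from" in Req messages, e.g.
# "... request received from Scheduler@scheduler.internal.cloudapp.net, sock=16".
_FROM_RE = re.compile(r"\bfrom (\S+?)@([^,\s]+)")


def requests_by_host(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count received and rejected requests per originating host name."""
    received: Counter = Counter()
    rejected: Counter = Counter()
    for line in lines:
        # Assumed layout from the excerpt: date;code;origin;class;object;message
        fields = line.rstrip("\n").split(";", 5)
        if len(fields) < 6 or fields[3] != "Req":
            continue  # skip Svr lines and anything malformed
        match = _FROM_RE.search(fields[5])
        if match is None:
            continue
        host = match.group(2)
        if fields[4] == "req_reject":
            rejected[host] += 1
        else:
            received[host] += 1
    return received, rejected
```

Passing the lines of a server_logs file to requests_by_host would show at a glance whether rejections line up with one of the two reverse names.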
I’ve tried using scheduler.hpc.azure in pbs.conf, without success.
How can I make PBS always use the short name instead of the FQDN in this case, or force it to use the same FQDN for all communications?
Thank you
Xavier Pillons
Principal Technical Program Manager
Azure Specialized Workloads HPC/AI - Customer Solutions and Incubation
Microsoft Corporation