Setting up Infiniband on PBS Pro 14.1.0 with ANSYS Fluent

Hi,

I am trying to set up a cluster in Microsoft Azure with Infiniband.
Once the setup scripts have finished on the head node and the compute nodes, Infiniband works when it is tested directly on the nodes (e.g. a pingpong test between the hosts succeeds).
However, when I try to submit a job from an ANSYS workstation the following takes place:

  1. ANSYS starts the job (fluent -r17.2.0 3ddp -pinfiniband -mpi=intel -node -t30 -nodehomedir=/mnt/nfsshare -pbs -mport 10.224.57.21:10.224.57.19:55826:0).
  2. The command is transformed into a qsub command and the ncpus are reserved as expected.
  3. According to qstat the job is running. (Status R)

But then nothing happens. There’s no feedback to Fluent, and when I log in to the compute nodes no processes have been started.
The log files aren’t very helpful either, since they show no errors, timeouts, or anything similar.
Sometimes one of the compute nodes locks up: it shows 100% CPU usage in the MS Azure Portal and I’m unable to log in to that node.

My question is: are there any directives or variables needed to set up Infiniband to work with PBS? The 13.2 documentation states that it should be detected automatically, and otherwise says to change pbs_mom_ib to pbs_mom, but there is no pbs_mom_ib.

My installation is like this:

  • Head node with PBSPro-14.1.0 Server installed. (Azure machine size Basic A3)
  • 2 Compute nodes with PBSPro 14.1.0 Execution installed. (Azure machine size Standard A9 with RDMA)

I hope someone can help.

Kind regards,
Kees Kuijt

A few things to check…
Azure provides a cloud Infiniband driver; is it correctly configured?
To narrow things down, you should manually submit a script to PBS that runs a Fluent job using Infiniband for communication, and make sure it works as expected.
If that works, check the script that ANSYS passed to PBS; it can be found in PBS on the head node and/or the first compute node while the job is queued or running.
As far as I know there is no specific setting required in PBS to enable infiniband support.

Hope it helps.

吴光宇|怀曦智能科技

@kees
Our documentation department would like to know exactly where you found the IB mention in the 13.2 documentation.

@kees,

There’s nothing magical about Infiniband from PBS Pro’s point of view. We’re pretty much interconnect-agnostic - as long as we’re allowed to talk TCP/IP over it, we don’t really care what the interconnect is. Infiniband, Ethernet, or tin cans connected by a string (well, maybe not that) are all perfectly okay.

We’re (Altair, that is) a little puzzled by your statement that you found something called pbs_mom_ib in the 13.2 documentation, since:

  1. There is no 13.2 documentation, because there is no 13.2 release (the latest commercial release is 13.1.1), and
  2. There’s no special Infiniband MOM (and the string pbs_mom_ib occurs nowhere in our documentation).

Could you do a “qstat -f” on one of these jobs while it’s running so we can see what the resource request, attributes, and substate of the job might be? A tracejob might be handy, too. Getting a look at the actual job script might prove enlightening.

Hi Altair4,

Of course it should be 13.1. I think I mixed up 13.2 of PBS and 17.2 of ANSYS somehow. (Maybe it’s time for a Christmas holiday or something ;-)).

On page 55 of the installation guide for version 13.1, it is suggested to replace pbs_mom.ib with pbs_mom (par. 3.6.2.2).
It is indeed not pbs_mom_ib (as stated in my initial post).

Kind regards,
Kees

@wgy
I made sure the Azure drivers are running by testing them with the following pingpong command:

mpirun -ppn 1 -n 2 -hostfile /home/$USER/nodenames.txt -env I_MPI_FABRICS=dapl -env I_MPI_DAPL_PROVIDER=ofa-v2-ib0 -env I_MPI_DYNAMIC_CONNECTION=0 IMB-MPI1 pingpong

This works as it should and reports data transfer speeds between the 2 nodes (nodenames.txt contains the two machine names).
I totally agree that I should test the PBS scheduler directly from the head node, without the Fluent interface.
I’ll do that by running a dedicated test script (even without Fluent). I’m getting the feeling that it is an ANSYS problem, in that it is not passing the correct parameters to the PBS job. So I’ll figure that out as well.
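
A minimal sketch of what such a dedicated test could look like: the pingpong command from above wrapped in a small PBS job script, so the same Infiniband test runs under the scheduler without Fluent. The file name pingpong.pbs, the job name, and the select/place request (one MPI rank on each of two nodes, matching -ppn 1 -n 2) are assumptions; the mpirun options are copied from the manual test, with the host list taken from $PBS_NODEFILE instead of nodenames.txt. It also assumes mpirun and IMB-MPI1 are on the PATH inside the job.

```shell
#!/bin/sh
# Write a small PBS job script that runs the pingpong test under the scheduler.
# Assumptions: the file name, job name, and select/place values are illustrative.
cat > pingpong.pbs <<'EOF'
#!/bin/bash
#PBS -N ib_pingpong
#PBS -l select=2:ncpus=1:mpiprocs=1
#PBS -l place=scatter
#PBS -j oe

cd "$PBS_O_WORKDIR"

# Same options as the manual test, but with the host list supplied by PBS.
mpirun -ppn 1 -n 2 -hostfile "$PBS_NODEFILE" \
    -env I_MPI_FABRICS=dapl \
    -env I_MPI_DAPL_PROVIDER=ofa-v2-ib0 \
    -env I_MPI_DYNAMIC_CONNECTION=0 \
    IMB-MPI1 pingpong
EOF

echo "Wrote pingpong.pbs; submit it from the head node with: qsub pingpong.pbs"
```

If this job reports transfer speeds similar to the manual run, PBS itself is unlikely to be the cause and the parameters passed by Fluent become the main suspect.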

@sgombosi
Output of qstat -f:

Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
3.kees06pvxjbox   fluent           nlass.hpc         00:00:00 R workq
nlass.hpc@kees06pvxjbox:/> qstat -f 3
Job Id: 3.kees06pvxjbox
    Job_Name = fluent
    Job_Owner = nlass.hpc@10.224.57.39
    resources_used.cpupercent = 0
    resources_used.cput = 00:00:00
    resources_used.mem = 0kb
    resources_used.ncpus = 30
    resources_used.vmem = 0kb
    resources_used.walltime = 00:00:00
    job_state = R
    queue = workq
    server = kees06pvxjbox
    Checkpoint = u
    ctime = Mon Dec 19 12:56:03 2016
    Error_Path = 10.224.57.39:/mnt/nfsshare/fluent.e3
    exec_host = kees06pvx000002/0+kees06pvx000002/1+kees06pvx000002/2+kees06pvx
        000002/3+kees06pvx000002/4+kees06pvx000002/5+kees06pvx000002/6+kees06pv
        x000002/7+kees06pvx000002/8+kees06pvx000002/9+kees06pvx000002/10+kees06
        pvx000002/11+kees06pvx000002/12+kees06pvx000002/13+kees06pvx000002/14+k
        ees06pvx000002/15+kees06pvx000003/0+kees06pvx000003/1+kees06pvx000003/2
        +kees06pvx000003/3+kees06pvx000003/4+kees06pvx000003/5+kees06pvx000003/
        6+kees06pvx000003/7+kees06pvx000003/8+kees06pvx000003/9+kees06pvx000003
        /10+kees06pvx000003/11+kees06pvx000003/12+kees06pvx000003/13
    exec_vnode = (kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx
        000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(ke
        es06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus
        =1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx00000
        2:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06p
        vx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(
        kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncp
        us=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000
        003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees0
        6pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)
        +(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:n
        cpus=1)
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = a
    mtime = Mon Dec 19 12:56:04 2016
    Output_Path = 10.224.57.39:/mnt/nfsshare/fluent.o3
    Priority = 0
    qtime = Mon Dec 19 12:56:03 2016
    Rerunable = True
    Resource_List.ncpus = 30
    Resource_List.nodect = 30
    Resource_List.place = free
    Resource_List.select = 30
    stime = Mon Dec 19 12:56:04 2016
    session_id = 3454
    jobdir = /home/nlass.hpc
    substate = 42
    Variable_List = PBS_O_HOME=/home/nlass.hpc,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=nlass.hpc,
        PBS_O_PATH=/mnt/resource/ansys_fluent/v172/fluent/contrib/lnamd64:/mnt
        /resource/ansys_fluent/v172/fluent/bin:/mnt/resource/ansys_fluent/v172/
        fluent/bin:/mnt/resource/ansys_fluent/v172/fluent/bin:/opt/intel/impi/5
        .0.3.048/bin64:/home/nlass.hpc/bin:/usr/local/bin:/usr/bin:/bin:/usr/ga
        mes:/usr/lib/mit/bin:/opt/pbs/bin,PBS_O_MAIL=/var/mail/nlass.hpc,
        PBS_O_SHELL=/bin/bash,PBS_O_WORKDIR=/mnt/nfsshare,PBS_O_SYSTEM=Linux,
        FLUENT_INC=/mnt/resource/ansys_fluent/v172/fluent,
        DISPLAY=kees06pvxjbox:0.0,LM_PBS_GUI=1,
        LM_PBS_ARGS=-r17.2.0 3ddp -pinfiniband -mpi=intel -node -t30 -mport 10
        .224.57.18:10.224.57.39:49533:0 -g,PBS_O_QUEUE=workq,
        PBS_O_HOST=10.224.57.39
    comment = Job run at Mon Dec 19 at 12:56 on (kees06pvx000002:ncpus=1)+(kees
        06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1
        )+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:
        ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx...
    etime = Mon Dec 19 12:56:03 2016
    run_count = 1
    Submit_arguments = -l select=30 -j oe -v FLUENT_INC=/mnt/resource/ansys_flu
        ent/v172/fluent,DISPLAY=kees06pvxjbox:0.0,LM_PBS_GUI=1,
        LM_PBS_ARGS=-r17.2.0 3ddp -pinfiniband -mpi=intel -node -t30 -mport 10
        .224.57.18:10.224.57.39:49533:0 -g  /mnt/resource/ansys_fluent/v172/flu
        ent/fluent17.2.0/bin/fluent
    project = _pbs_project_default

The output of the tracejob is below (tracejob -n 10 3):

12/19/2016 12:56:03  S    enqueuing into workq, state 1 hop 1
12/19/2016 12:56:04  L    Considering job to run
12/19/2016 12:56:04  S    Job Queued at request of nlass.hpc@10.224.57.39, owner = nlass.hpc@10.224.57.39, job name = fluent, queue = workq
12/19/2016 12:56:04  S    Job Run at request of Scheduler@10.224.57.39 on exec_vnode
                          (kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000002:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)+(kees06pvx000003:ncpus=1)
12/19/2016 12:56:04  S    Job Modified at request of Scheduler@10.224.57.39
12/19/2016 12:56:04  L    Job run

I don’t know where to find the job script itself. Is it required by PBS? I think ANSYS issues a command to the scheduler which is transformed into a qsub command (point 2 in my initial post). I don’t think there’s a script involved.
The command for qsub is:

 qsub  -l select=30 -j oe -v "FLUENT_INC=/mnt/resource/ansys_fluent/v172/fluent,DISPLAY=kees06pvxjbox:0.0,LM_PBS_GUI=1,LM_PBS_ARGS=-r17.2.0 3ddp -pinfiniband -mpi=intel -node -t30 -mport 10.224.57.18:10.224.57.39:49533:0 -g " /mnt/resource/ansys_fluent/v172/fluent/fluent17.2.0/bin/fluent 
    3.kees06pvxjbox

(The port numbers may differ from those in the command above, since this is a different job.)

Kind regards,
Kees

Hello @kees,

The section of the Install Guide you are referring to is specific to AIX. Are you running AIX with Infiniband?

Thanks,

Mike

@mkaro
Hi Mike,

No, it’s all SUSE 12 Linux based.
The AIX part was the only part I could find about Infiniband.

Then, as @sgombosi pointed out, there shouldn’t be anything special about the configuration. PBS Pro will use IPv4 over Infiniband to communicate between nodes. Please ensure your Infiniband configuration is set up properly.

Beyond that, you can look at the server and mom logs to see if there are errors or warnings present. If you find any, please post them here.
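
As a rough sketch of that log check, the helper below prints today's server and MoM log lines that mention a given job id. It assumes the log files are named by date (YYYYMMDD) under server_logs and mom_logs in PBS_HOME, and that PBS_HOME is /var/spool/pbs unless set in the environment (the actual value is recorded in /etc/pbs.conf). The function name is made up for this example, and reading the logs may require root.

```shell
#!/bin/sh
# Print today's server and MoM log lines that mention a given job id.
# Assumption: PBS_HOME defaults to /var/spool/pbs if not set in the environment.
pbs_job_log_lines() {
    job_id=$1
    pbs_home=${PBS_HOME:-/var/spool/pbs}
    today=$(date +%Y%m%d)   # log files are named by date
    for dir in server_logs mom_logs; do
        log="$pbs_home/$dir/$today"
        # server_logs lives on the head node, mom_logs on the compute nodes,
        # so a missing file is simply skipped.
        if [ -r "$log" ]; then
            grep -H -- "$job_id" "$log" || true
        fi
    done
}

# Example: pbs_job_log_lines 3.kees06pvxjbox
```

Run it on the head node and on each compute node listed in exec_host, since each keeps its own logs.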

@kees

  1. Could you please run qsub -l select=30:ncpus=30:mpiprocs=30 -I (capital I as in Iceland), then post the contents of $PBS_NODEFILE?

  2. Compare the output of $PBS_NODEFILE against the /etc/hosts entries on the headnode/compute nodes
    Check whether the addresses in the $PBS_NODEFILE are resolvable.

    Note: sometimes we have to adjust the contents of $PBS_NODEFILE to suit the situation (e.g. trimming the hostnames so they carry no ib suffix) by creating another node file and using that file to run the job.

  3. export MPI_ROOT="/opt/intel"
    export PROTO="TCP"
    export MPI_REMSH="/opt/pbs/default/bin/pbs_tmrsh"
    export P4_RSHCOMMAND=$MPI_REMSH
    export PBS_RSHCOMMAND="/usr/bin/ssh"
    export FLUENT_SSH=$MPI_REMSH
    export MPI_CPU_AFFINITY="ll"
    export PATH=$PATH:$MPI_ROOT
    fluent 3ddp -alnamd64 -r14.0.0 -t32 -mpi=intel -pib -g -mpitest -cnf=$PBS_NODEFILE
    or
    fluent 3ddp -pdefault -cnf=${PBS_NODEFILE} -mpi=intel -g -t${np} -ssh -i ${INPUT}

These are some hints to help track down the issue.
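
The node file checks in steps 1 and 2 could be sketched roughly as below. The function names and the "-ib" suffix are hypothetical, chosen only for illustration; getent is assumed to be available, as it is on typical Linux nodes. Note that the suffix is handed to sed as a regular expression, so special characters in it would need escaping.

```shell
#!/bin/sh
# List the distinct hosts in a PBS node file, one per line.
nodefile_hosts() {
    sort -u "$1"
}

# Print the hosts from a node file that cannot be resolved on this machine.
unresolvable_hosts() {
    nodefile_hosts "$1" | while read -r host; do
        getent hosts "$host" >/dev/null 2>&1 || echo "$host"
    done
}

# Write a copy of the node file with a trailing suffix removed from each name.
# $1 = input node file, $2 = suffix to remove, $3 = output node file
strip_suffix() {
    sed "s/$2\$//" "$1" > "$3"
}

# Inside an interactive job (qsub -I), for example:
#   unresolvable_hosts "$PBS_NODEFILE"
#   strip_suffix "$PBS_NODEFILE" "-ib" "$HOME/nodefile.fixed"
```

The adjusted file can then be passed to Fluent with -cnf in place of $PBS_NODEFILE.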

" I’m getting the feeling that it is an ANSYS problem by not passing the correct parameters to the PBS job."

I think that’s the problem. In order to use IB on Azure you must set some very specific environment variables, as indicated in your ping pong test. I haven’t attempted Fluent on Azure but I did get STAR-CCM+ working by setting those environment variables.
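
For illustration, the variables from the pingpong test could be exported at the top of a job script or wrapper along these lines. This is only a sketch: the values are copied from the pingpong command earlier in the thread, the function name is made up, and whether Fluent's launcher passes the variables through to its MPI processes is an assumption that would need checking.

```shell
#!/bin/sh
# Export the Intel MPI settings used in the working pingpong test so that
# anything launched afterwards from the same shell or job script inherits them.
# Assumption: the solver's bundled Intel MPI honours these variables.
set_azure_ib_env() {
    export I_MPI_FABRICS=dapl
    export I_MPI_DAPL_PROVIDER=ofa-v2-ib0
    export I_MPI_DYNAMIC_CONNECTION=0
}

set_azure_ib_env
# Show what will be inherited by child processes.
env | grep '^I_MPI_'
```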

Hi Guys,

I contacted some people at Microsoft as well (since it was also related to Azure), and we tried several things to get it started. Below are the findings:

  1. We started by testing Infiniband on its own (to make sure IB at least functions). We found that SUSE Linux on Azure didn’t work properly: machines were locking up, etc. After switching to CentOS, it works as expected.
  2. We skipped PBS at first and tried to start Fluent with the correct parameters. This works as well.
  3. Using the Fluent Launcher (without PBS) causes MPI processes to end prematurely. Running Fluent via the launcher a second time causes the compute nodes to lock up.

@spschaller: I think you’re right. The Fluent launcher doesn’t pass the correct parameters and I’m focusing on that now.

But since we are going to use CentOS 7 now, the problem will be the installation of PBS on these nodes, since several dependencies cannot be resolved. (But that is another topic.)

So the current status:
IB is working without PBS, but PBS is most probably not the problem. ANSYS Fluent (Launcher) is.

When I have results I will post them here.

Thank you for all your (quick) responses.

Kees

Hi,

FYI
I did some testing over the last couple of days and it seems to be working now.
It was indeed the ANSYS Fluent launcher that did not pass the required parameter.

When I entered “mpi-auto-selected” in the Interconnect box it worked. I get a speed improvement of between 20 and 30%.
The Ethernet network remains quiet, as it is only used for transferring data to and from the head node.

Kind regards.
Kees