I’ve updated the design with a proposed sequence diagram for cutting down the number of initial hello exchange messages. Please review.
Thanks @nithinj, the diagram helps a lot. I do think the text should focus more on the new protocol and IS_ messages than on the old ones. Right now, the headings in your design are about the old protocol and IS_ messages; I’d suggest using the new IS_ messages as headings instead, and describing what each one does and which old IS_ messages it replaces.
Added more explanation. I’ve kept the old messages because they are not being removed from the protocol, only from the initial dialogue exchange. Please have a look.
Hi, sorry for the late reply. Someone brought the PR to my attention, and I have some design questions:
How will the current IS_HELLO_NO_INVENTORY be handled? Or is every Cray X series login node mom going to send the same compute node information to the server?
The design says “Mom will attempt to connect every 64 seconds in the worst case”
Is this value configurable? How is the interval determined?
For failover the design says “Mom will be treating two of the servers in its list with equal precedence and will be sending hello randomly to one of them. It can receive a reply back only from the active server.”
How is the randomness determined? What if the MoM keeps randomly sending the hello to the non-active server? Will she then never get connected to the active server?
With Shasta and Kubernetes, I’m not sure what the incoming request from the MoM will look like to the server. All MoMs might appear to come from a “gateway” rather than from their compute nodes. And on Shasta, each compute node will host an individual MoM, so it is important that the server can recognize each mom.
Thanks for the response, Lisa. Please find my answers below:
The server will elect only one inventory-reporting mom using the existing logic. Whether the server needs the inventory is indicated by an integer field in the IS_REPLYHELLO message.
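To make the IS_REPLYHELLO answer concrete, here is a minimal Python sketch of the server-side bookkeeping. It is only an illustration: the field name need_inventory, the ReplyHello and InventoryElector names, and the first-come election rule are all hypothetical and stand in for the existing server logic, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplyHello:
    # Hypothetical stand-in for IS_REPLYHELLO; the field name is assumed.
    need_inventory: int  # 1 = send inventory, 0 = skip it


class InventoryElector:
    """Hypothetical server-side state: at most one inventory-reporting mom."""

    def __init__(self) -> None:
        self.inventory_mom: Optional[str] = None

    def on_hello(self, mom: str) -> ReplyHello:
        # Assumed first-come rule, for illustration only: the first mom to
        # say hello while no reporter is elected becomes the reporter.
        if self.inventory_mom is None:
            self.inventory_mom = mom
        return ReplyHello(need_inventory=1 if mom == self.inventory_mom else 0)

    def on_mom_down(self, mom: str) -> None:
        # Release the election so the next hello can pick a new reporter.
        if mom == self.inventory_mom:
            self.inventory_mom = None
```

With this rule, a second login node mom saying hello gets need_inventory=0, so only one mom sends the (identical) compute node inventory.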
The mom will start with a short interval: initially it will attempt to reach the server every 1 sec. After a couple of attempts it will switch to double that interval, 2 sec, and it will keep increasing the interval exponentially until it reaches 64 seconds.
The pattern looks like: 1 sec twice, followed by 2 sec four times, 4 sec eight times, and so on…
The interval is determined algorithmically and is not configurable at the moment. However, HUPing or restarting a mom resets it to the initial state.
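The retry schedule described above can be sketched in Python as a generator, assuming the literal pattern (an interval of 2^k seconds used 2^(k+1) times) and a 64 sec cap after which every retry waits 64 sec. The name hello_intervals is hypothetical; HUPing or restarting the mom corresponds to creating a fresh generator.

```python
from itertools import islice
from typing import Iterator

MAX_INTERVAL = 64  # seconds; cap stated in the design


def hello_intervals() -> Iterator[int]:
    """Yield the wait (in seconds) before each hello retry.

    Interval 2**k is used 2**(k+1) times (1 s x 2, 2 s x 4, 4 s x 8, ...)
    until the 64 s cap is reached; after that every retry waits 64 s.
    """
    interval, repeats = 1, 2
    while interval < MAX_INTERVAL:
        for _ in range(repeats):
            yield interval
        interval *= 2
        repeats *= 2
    while True:
        yield MAX_INTERVAL


# First 8 waits: [1, 1, 2, 2, 2, 2, 4, 4]
print(list(islice(hello_intervals(), 8)))
```

Read literally, this pattern makes 126 attempts and spends 2 × (1 + 4 + 16 + 64 + 256 + 1024) = 2730 sec (about 45 minutes) before the interval reaches 64 sec, which may be worth stating explicitly in the design.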
We use the rand() function modulo the number of available servers. If rand() gives skewed results, a couple of attempts may be wasted, but this should be acceptable because the initial retries come in quick succession.
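A small Python sketch of the random server choice helps answer the "never connected" concern. It uses Python's random.randrange in place of C's rand() % nservers (randrange does not have modulo bias), and the server names and function names are hypothetical. Assuming independent, uniform picks, the chance of missing the single active server shrinks geometrically with the number of attempts.

```python
import random
from typing import Sequence


def pick_server(servers: Sequence[str], rng: random.Random) -> str:
    # Python analogue of rand() % nservers; randrange picks uniformly.
    return servers[rng.randrange(len(servers))]


def prob_all_misses(n_servers: int, attempts: int) -> float:
    """Chance that `attempts` independent uniform picks all miss the active server."""
    return ((n_servers - 1) / n_servers) ** attempts


def attempts_until_active(servers: Sequence[str], active: str,
                          rng: random.Random) -> int:
    # Count hellos sent until one lands on the active server
    # (only the active server replies).
    n = 1
    while pick_server(servers, rng) != active:
        n += 1
    return n
```

With two servers, prob_all_misses(2, 10) is 1/1024, so the mom is very unlikely to keep missing the active server for long, although with random picks it is never strictly guaranteed within a fixed number of attempts.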
Today, the server uses the mom’s IP address to reach out to the mom. The proposed mechanism does not make any further assumptions.
The exception is that, in the future with multi-server, the server will have to do a reverse-DNS lookup to identify the mom’s hostname from the incoming IP address. So it would be good if we can find this out.
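For the multi-server case, the reverse lookup could look roughly like this Python sketch using the standard socket.gethostbyaddr call. The function name mom_hostname and the fall-back-to-IP policy are assumptions for illustration; in a real server, peer_ip would come from something like conn.getpeername()[0] on the accepted connection.

```python
import socket


def mom_hostname(peer_ip: str) -> str:
    """Map a mom's incoming IP address to a hostname via a reverse lookup.

    gethostbyaddr consults the system resolver (PTR records and/or
    /etc/hosts, depending on configuration). On failure we fall back to
    the IP string so the mom can still be keyed by address, as today.
    """
    try:
        hostname, _aliases, _addrs = socket.gethostbyaddr(peer_ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return peer_ip
```

Note that if MoMs sit behind a gateway (the Shasta/Kubernetes concern above), peer_ip would be the gateway's address, so every mom would resolve to the same hostname; the lookup alone would not distinguish them.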
Thanks for the responses @nithinj. Sounds good to me.