First, it is actually worse than you think. I said hex, but I really meant alphanumeric. I believe there is considerable merit in not limiting ourselves to digits, but clearly this is arguable either way. The reason I don’t want to limit it to digits is the representation space. If you limit the number of “digits” identifying a server to 2 (using your proposal as an example), you can represent 10^2 or 100 servers with digits only, 16^2 or 256 with hex, 36^2 or 1296 if you use digits and one letter case, and 62^2 or 3844 if you use digits, uppercase and lowercase.
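To make the arithmetic concrete, here is a small sketch that computes the size of the ID space for a fixed-width field. The function and dictionary names are mine, chosen only for illustration; the alphabets are the four mentioned above.

```python
# Size of the ID space for a fixed-width field, by alphabet.
# Names here are illustrative only, not part of any proposal.
ALPHABET_SIZES = {
    "digits": 10,
    "hex": 16,
    "digits + one case": 36,
    "digits + both cases": 62,
}


def id_space(alphabet_size, width):
    """Number of distinct IDs of exactly `width` characters."""
    return alphabet_size ** width


def id_space_table(width):
    """Map each alphabet name to its ID space at the given width."""
    return {name: id_space(size, width) for name, size in ALPHABET_SIZES.items()}


for name, count in id_space_table(2).items():
    print(f"{name:>20}: {count}")
```

Changing the width passed to `id_space_table` shows how quickly the gap between the alphabets grows as the field gets longer.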
I think where we are differing is in our assessment of:
- the number of servers we want to plan for
- the perceived impact of moving away from just digits
- perhaps the time horizon we are designing for
Time horizon: I figure you get to make a change like this once every 20 years or so, so I am trying to choose something that will work for 20 years. Someone already mentioned Bill Gates and the 640KB of RAM scenario. How painful was that? And… maybe I am showing my age.
Number of servers: I envision a scenario where servers are dynamic. They come and go. If you take the scenario to its logical extreme, there could be a server for every job. That certainly means there would be more than 100. Maybe more than 100 running simultaneously. Maybe that will happen, maybe it won’t, but it certainly could (BTW, the Flux scheduler folks at LLNL have the same idea). I just want to make sure we don’t limit ourselves in that respect. If you are willing to designate a 64-bit number for the server number, then I am fine with it being numeric.
Oh, something else just occurred to me that probably was not obvious. I want this to be unique for all time. Every time a server starts, it gets a new ID, and that ID can never have been used before in the history of that facility. If you use one of the standard UUID algorithms, it should be globally unique, but I don’t think that is necessarily required.
I will acknowledge that how we generate that unique ID will be a discussion in and of itself. There are a LOT of alternatives all with pros and cons.
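Just to illustrate one of those alternatives (not a recommendation), here is a minimal sketch that uses a random UUID from the Python standard library and checks it against a record of every ID the facility has ever issued. The `used_ids` set is a stand-in I made up; a real system would need persistent storage for that history.

```python
import uuid


def new_server_id(used_ids):
    """Return an ID not present in `used_ids`, and record it there.

    `used_ids` is a hypothetical in-memory stand-in for the facility's
    persistent history of every server ID ever issued.
    """
    while True:
        candidate = uuid.uuid4().hex  # 32 hex characters from a random UUID
        if candidate not in used_ids:
            used_ids.add(candidate)
            return candidate


history = set()
first = new_server_id(history)
second = new_server_id(history)
print(first, second)
```

With random UUIDs the collision check will almost never trigger, but it makes the "never used before at this facility" requirement explicit rather than probabilistic.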
The impact of going to alphanumeric: Yes, some scripts might break. However, since there are no guarantees about IDs being consecutive, other users can cut in, etc., doing math on the ID really doesn’t make much sense, so it should be, and probably is, being treated as a string. Other than “looking weird” to humans, the biggest impact would probably be in databases. If they are storing this data in a database, they might have it stored as an int, or if it is a string, they might have to make the max string length longer. Asking them to do that once every 20 years to get a massive scalability boost seems reasonable to me, but certainly that point can be argued. Oh, and to cut down on the “looking weird” aspect, we could choose to do what git does and display only the first n characters, as long as that prefix is unique.
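The git-style abbreviation could look something like the sketch below: for each full ID, show the shortest prefix that no other known ID shares. The function name, the minimum length of 4, and the sample IDs are all assumptions of mine for illustration.

```python
def short_forms(ids, min_len=4):
    """Map each full ID to its shortest prefix (at least `min_len`
    characters) that is not also a prefix of any other ID in `ids`."""
    result = {}
    for full in ids:
        others = [other for other in ids if other != full]
        n = min(min_len, len(full))
        # Lengthen the prefix until no other ID starts with it.
        while n < len(full) and any(o.startswith(full[:n]) for o in others):
            n += 1
        result[full] = full[:n]
    return result


# Made-up IDs, purely for demonstration.
print(short_forms(["a1b2c3d4", "a1b2ffff", "9zzz0000"]))
```

Note that an abbreviation is only unique relative to the IDs known at display time, so anything stored or passed between tools should still use the full ID.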
Using the server name in the ID as a heuristic for contacting the right server was mentioned above. I feel a little bit dirty saying this, since I am breaking my own rule about not encoding data in an ID, but the alphanumeric portion (in my proposal) basically is a unique identifier for the server. When it comes up and generates that unique string, it could broadcast that to the other servers and you now have a lookup table that you could use for that heuristic. Basically a routing table.
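A rough sketch of that lookup table, under assumptions I want to flag clearly: the job ID layout `<sequence>.<server_id>` is hypothetical (the actual format is still under discussion), and the class name, method names, and addresses are invented for the example.

```python
class RoutingTable:
    """Maps server IDs to addresses, filled from startup broadcasts."""

    def __init__(self):
        self._address_by_server_id = {}

    def announce(self, server_id, address):
        """Record the broadcast from a server that just came up."""
        self._address_by_server_id[server_id] = address

    def retire(self, server_id):
        """Forget a server that has gone away."""
        self._address_by_server_id.pop(server_id, None)

    def lookup(self, job_id):
        """Return the address of the server named in `job_id`, or None.

        Assumes the hypothetical layout "<sequence>.<server_id>".
        """
        _, _, server_id = job_id.rpartition(".")
        return self._address_by_server_id.get(server_id)


table = RoutingTable()
table.announce("Ab3x", "host-a:15001")  # invented ID and address
print(table.lookup("1234.Ab3x"))
```

Since this is only a heuristic, a `None` result (unknown or retired server) would simply mean falling back to whatever the normal lookup path is.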
I scanned back through the thread and I think I responded to all the points that were brought up. I may or may not have convinced you, of course. If I missed something or something is not clear, please let me know.