Jobid namespace resolution for multi-server

Hi,

I wanted to start a discussion about the jobid namespace problem that comes along with multi-server and how we should solve it. We have broadly two directions to go in:

  1. Keep the strictly sequential nature of jobids that exists today in PBS, i.e., if you submit 100 jobs, you’ll get jobids from 0.server_name to 99.server_name, in increasing order.
  2. Sacrifice the strictly sequential nature of jobids, so if you submit 100 jobs, you are not guaranteed to get jobids from 0.server_name to 99.server_name; they can be anything.

I thought of a few approaches that we can choose for each option:
Approaches which sacrifice sequential jobids:
a) Add a prefix to the job ids of each server, something like: job ids starting with 0 belong to server 0, those starting with 1 belong to server 1, and so on. The con is obviously that we can only have 10 servers max, but maybe 10 is large enough? Another con is that jobids won’t be sequential. The pro is that this is really easy to implement and understand and will not add any performance degradation.
b) Job id ranges: here we will ask admins to configure job id ranges for each server. So, server 1 will get ids from 0 to 1 million, server 2 will get ids from 1 million to 2 million, and so on. The pro is that this doesn’t add any performance degradation and is also easy to implement, but it suffers from the problem of jobids not being sequential. Also, I’m not clear on what we will do when the ranges run out.
c) Servers generate ids with an offset: if there are 3 servers then server 1 generates ids 0, 3, 6 etc., server 2 generates ids 1, 4, 7 etc. and server 3 generates 2, 5, 8 etc. This has the same advantages as the first two approaches, plus the jobids generated by each server are closer to each other. It does have the disadvantage that the logic is highly dependent on the number of servers, so it might involve a complex remapping of ids to servers if the number of servers is modified.
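
To make approach c) concrete, here is a rough Python sketch of the offset scheme and of why changing the number of servers is painful. The names `strided_ids` and `owner_of` are made up for illustration and are not PBS functions, and servers are numbered from 0 here rather than from 1.

```python
from itertools import count, islice


def strided_ids(server_index: int, num_servers: int):
    """Return an endless iterator: server_index, server_index + num_servers, ..."""
    if not 0 <= server_index < num_servers:
        raise ValueError("server_index must be in [0, num_servers)")
    return count(start=server_index, step=num_servers)


def owner_of(job_number: int, num_servers: int) -> int:
    """Recover the generating server from the numeric part of the id alone."""
    return job_number % num_servers


# With 3 servers: server 0 -> 0, 3, 6; server 1 -> 1, 4, 7; server 2 -> 2, 5, 8
first_ids = {s: list(islice(strided_ids(s, 3), 3)) for s in range(3)}

# The mapping depends on the server count: id 7 belongs to server 1 when
# there are 3 servers, but the same arithmetic points at server 3 with 4.
owner_with_3 = owner_of(7, 3)
owner_with_4 = owner_of(7, 4)
```

The last two lines show the remapping problem: any id issued before the server count changed would need either a lookup table or a migration.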

Approaches which preserve sequential jobids:
a) Have a job id generating service: this one will be a single point of contact for all servers to get unique, sequential job ids. It will allow job ids to still be sequential. Cons: possible degradation in performance as it serializes job id creation, and it will also mean that clients (except scheduler) will have to contact this service to know which server a jobid maps to, again causing loss in performance, but maybe not that much. Also, we will have to think about when to clean up its cache of jobid:server as jobs leave the system.
b) Use a “load balancer” which will take “all” requests and route them to the different servers depending on which server they should go to. For qsub requests, this service can generate a unique jobid, and choose one of the servers to send it to. Instead of storing a huge map of jobids to server like we have to in bullet a), we can use a technique like consistent hashing to decide which server a jobid should go to so we might not need to maintain a cache if we do this. Con again is possible loss in performance because we’ve introduced a single entity through which all client calls go (scheduler should still be able to contact each server directly, so this would apply to front-end clients only)
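
For reference, the consistent hashing idea in b) could look roughly like the sketch below. This is only an illustration: the `HashRing` class, the server names, and the choice of 64 virtual nodes per server are all assumptions, not anything that exists in PBS.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping job ids to server names."""

    def __init__(self, servers, vnodes: int = 64):
        if not servers:
            raise ValueError("at least one server is required")
        # Each server is placed at `vnodes` points on the ring to even out load.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of SHA-256 give a stable 64-bit position on the ring.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def server_for(self, job_id: str) -> str:
        # First ring point clockwise from the job id's hash, wrapping around.
        idx = bisect.bisect_right(self._keys, self._hash(job_id)) % len(self._keys)
        return self._ring[idx][1]


ring3 = HashRing(["server1", "server2", "server3"])
ring4 = HashRing(["server1", "server2", "server3", "server4"])

# Adding server4 only moves ids onto server4; everything else stays put.
moved_elsewhere = [
    i for i in range(1000)
    if ring4.server_for(str(i)) not in (ring3.server_for(str(i)), "server4")
]
```

The point of the technique is the last check: when a server is added, the only job ids that change owner are the ones that move to the new server, so no jobid:server cache is needed to find a job.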

Can you guys think of other approaches?

I think we need to first decide if sequential id generation is important and then discuss which approach is better for a balance between performance and ease of use in case of change in number of servers. Please provide feedback!

Update:
Design in review: https://openpbs.atlassian.net/wiki/spaces/PD/pages/2320924673/jobid+namespace+redesign+for+multi-server

Another approach for serial IDs similar to what you propose in option (a) is to have PostgreSQL do the incrementing for you when the record is entered into the DB. Check out the serial and bigserial types. One possible way to do this would be to provide a locally unique job ID and the server ID (either name or hash). Clients like qstat would display the globally unique job ID, and qstat -f could show the local IDs as well. Each server only cares about its local IDs.

Thanks @mkaro for the inputs. We unfortunately cannot use the db to do this since the servers in multi-server won’t share a common db.

Three thoughts:

(1) Will PBS multi-server support running a single server? In this case, will the job IDs be serialized? If yes and yes, then that ticks the “backward compatibility” box, and, in theory, PBS can do anything (new) with job IDs when more than one server is configured.

(2) What about making the job ID generating service optional? If you need it, you get it, but incur a performance penalty; if you leave it off, PBS runs faster. Plus, an MVP version of multi-server probably doesn’t need the service – it could be added later if needed – especially if (1) is yes/yes above.

(3) Generally, embedding semantic information in names is not great design. So, if the target server can be kept separate from the ID, that’s ideal… unless it’s slow or complicated, then … less ideal :-).

Thx!

That’s correct. So I guess we don’t need to worry about keeping them sequential for the multi-server case.

I like that, ya we could add it later for sites which need the ids to be sequential and are willing to accept the possible performance slowdown that comes with it.

Ok, I’ve listed some approaches where the jobid doesn’t embed target server information, with no additional overhead, so we can rank those higher.

Thanks for your inputs!

Thoughts:

  • I like the idea where we get sequential ids with single server and do not get sequential ids with multi-server. Sites that actually need multiple servers will have to forego some comfort (that is the balance that good architecture needs to provide)

  • The only way to get sequential ids is to go to a central service, which has these problems

    • possibly slower (not just network round trip, but a single point of conflict resolution)
    • maintain another daemon code
    • Worry about robustness of service (failover etc)
  • We could use the current server index to generate unique jobids (won’t be sequential) and YET not consciously “embed” the knowledge of the owning server in the id. In other words, use the server index only to get a unique id, but not use that id for any other thing. So i like approach (a) (of the non-sequential list) - we can go beyond 10 servers in this as well if we use some bits (instead of digits)?

  • ranges are bad: Some sites already provided feedback that they won’t like job ids suddenly to come from half a trillion onwards

Ok, yes we could decide on a fixed number of bits which would represent the server id, that would get rid of the shortcoming of approach a). @billnitzberg what do you think about this?

That said, i don’t think we will need more than 10 server instances in the next year or so, so why don’t we start with the simplest approach of just keeping one digit for the servers - the format of the ID should be internal (and opaque), and we need to explicitly state that the ID should not be interpreted in any way. Thus, it will be possible to enhance it in future without breaking clients

Sounds good to me, I’ll wait a day or two to see if anyone has any other thoughts, otherwise will go with this.

I’m not sure having a 0 prefix is a good idea, it could cause problems when it’s converted to an int.

I’m not sure I understand how it might cause any issues, can you please explain more?

Well if you have a server assigned with prefix 0, and the id that comes out is 0123, if at any point the job id gets turned to an int and then printed again as a string, it’ll be 123.
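
A tiny Python illustration of that round trip (the helper names are made up for the example):

```python
def round_trip(seq: str) -> str:
    """Parse the numeric part of a job id as an int and print it again."""
    return str(int(seq))


def round_trip_padded(seq: str, width: int) -> str:
    """Same round trip, but re-padded to a known fixed width."""
    return f"{int(seq):0{width}d}"


lost = round_trip("0123")            # the leading 0 (the server prefix) is gone
kept = round_trip_padded("0123", 4)  # survives only if every caller knows the width
```

So a 0 prefix is only safe if every code path that converts the id knows the fixed width and pads it back.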

That should be alright unless we want to identify the owner instance of the job from jobid.

Instead of prefixing, we could reserve the least significant digit for the server instance which will avoid lengthy jobids.

well that is what we want to do if we go with approach a)

ok, i see your concern, ya it can be error prone, so we can limit it from 1-9 for now and come back to it when we want to scale beyond it. I’ll try to make code generic enough so that we can easily change the number of bits used to identify the server.

I think it was Bill Gates who said nobody would ever need more than 10x the 64k memory in today’s personal computers… so, we can make the Windows limit 640k max.

If you are going to put a limit on it… I’d suggest starting with 100 (or 90 if you want to skip the leading 0)… It might be nice to use more than 10 servers for really big sites … and for benchmarking. PBS could also use letters (but that might be too much disruption).

Thanks for the feedback. Ok, let’s go with 100 as the limit then. Since the leading 0 is coming in the way and will make the numbers a little unintuitive (server 1 will be ‘10’, 2 will be ‘11’ and so on), it might be better to use @nithinj’s idea of reserving the least significant digits instead of the most significant ones. So, we can reserve the last 2 digits of the numeric part of a jobid to represent the server id, going from 00 to 99. Let me know what you guys think.
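
As a rough sketch of what reserving the last 2 digits could look like (Python for brevity; `make_job_number`, `split_job_number` and the constants are hypothetical names, and the decode helper is meant for server-side internals only since the id is supposed to stay opaque to clients):

```python
SERVER_DIGITS = 2                    # assumed: two trailing digits for the server id
SERVER_SLOTS = 10 ** SERVER_DIGITS   # 100 possible server ids, 00 to 99


def make_job_number(local_seq: int, server_id: int) -> int:
    """Combine a per-server sequence number with the server id as a suffix."""
    if not 0 <= server_id < SERVER_SLOTS:
        raise ValueError("server_id must be between 0 and 99")
    if local_seq < 0:
        raise ValueError("local_seq must be non-negative")
    return local_seq * SERVER_SLOTS + server_id


def split_job_number(job_number: int) -> tuple[int, int]:
    """Return (local_seq, server_id) for a combined job number."""
    return divmod(job_number, SERVER_SLOTS)


job_number = make_job_number(12, 7)      # 12th job on server 07 -> 1207
decoded = split_job_number(job_number)   # (12, 7)

# Unlike a leading-zero prefix, the suffix survives an int round trip:
# server 0's very first job is plain 0, and server 7's is 7.
first_on_server_7 = split_job_number(int(str(make_job_number(0, 7))))
```

Because the server id sits in the low digits, converting the id to an int and back never drops it, which is the problem the prefix approach had.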

I was suggesting the ending digits (or bits) as well…

@billnitzberg what i was suggesting was not to cap anything in the design, so yes, we should not close the doors by limiting to 10 servers, but an initial implementation that only supports 10 would be good enough to test things out - that is likely to lead us to find many other bottlenecks anyway. Or we can spend a lot of time making this perfect and then figure out that we do not scale well beyond 3-4 servers anyway :-) So - my point was to go one (baby) step at a time, yet keep the design open to changes

Thanks for all the useful feedback guys, I’ve created a design document for the approach which we were converging on, please provide feedback:

Thanks!

I only have one comment (sorry for coming late). How do the 2 extra digits affect the max size of the job ids? We have an attribute which describes the largest job id we can have. How do these two things interact? Does this mean we get fewer max job ids unless we have all 100 servers?

It isn’t really a big deal. The admins just have to bump the max job id number.

Does this affect the formatting of qstat when job ids get large?

Bhroam
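
To put rough numbers on the max job id question above, here is a back-of-the-envelope sketch in Python. The cap of 9,999,999 is only an assumed example value, and the function names are made up; the real limit is whatever the site has configured.

```python
def sequences_per_server(max_job_id: int, server_digits: int = 2) -> int:
    """How many local sequence numbers each server gets under a given cap."""
    return (max_job_id + 1) // 10 ** server_digits


def usable_ids(max_job_id: int, active_servers: int, server_digits: int = 2) -> int:
    """Total ids actually reachable with only `active_servers` configured."""
    return active_servers * sequences_per_server(max_job_id, server_digits)


ASSUMED_MAX_JOB_ID = 9_999_999  # illustrative cap, not read from any PBS config

per_server = sequences_per_server(ASSUMED_MAX_JOB_ID)   # 100,000 per server
with_two_servers = usable_ids(ASSUMED_MAX_JOB_ID, 2)    # 200,000 in total
with_all_servers = usable_ids(ASSUMED_MAX_JOB_ID, 100)  # the full 10,000,000
```

So under that assumed cap, yes: a site with 2 servers would wrap after 200,000 ids unless the admin raises the max job id, since the other 98 suffixes go unused.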

If the job retains the same jobid as it moves within the multi-server cluster, IFL could not rely on the jobid to reach the right server. IMO the major benefit of this enhancement is more about keeping job ids unique across the multi-server cluster than reaching the right server. Please clarify this in the design.

If you can handle all the additional complexities within IFL, the client program can work with multi-server the same way it works with a single server as it uses IFL to talk to the server. The design page gives the feeling that client programs should be aware of the job id semantics to work with multi-server. Could you refine that?

Please add the page along with other pages in the multi-server tree: