PP-662, PP-663: UCR and External Interface document for Reservation enhancements

Hi All,

I have posted UCR and Proposed Design for the upcoming reservation enhancements. Request all to provide feedback on the same.

Thanks,
Prakash

Hi Prakash ,
Please find my comment/feedback

General:-

  1. It would good if we can map the use cases with the requirements and interfaces.

UCR:-

  1. Requirement given in section C , are we going to work on this as part of this RFE ?

  2. For requirement B.1.a

    1. is point i and ii both should be true
      or
      One of the point should be true either point i or ii

External Design:-
Interface #1

  1. Is Admin and non-admin user both has access to pbs_ralter command? or it similar as pbs_rsub?
  2. Will ctime and mtime will also get updated for reservation after using pbs_ralter ?

1.4.c.This command can be used to change an advance reservation or the next/current instance of a standing reservation.
3)Do you mean we can alter the time of advance reservation or recurrence_rule argument or both ?

1.4.d. After the change is requested, the change is either confirmed or denied.
4)In both the case if change is confirmed or denied , do we get any specific message on console?

1.4.g.i.2 This option can be used only when the reservation is not running or is empty
5)Is this either one of the case should true or both condition should be true
I mean is this
Reservation is not running AND Reservation is Empty.
or
Reservation is not running OR Reservation is empty.

Interface #6
4.Details - When pbs_ralter is used for altering its start time that has jobs in it and has already started running, this new error code is returned.
6)Will it displayed on the console or logged in pbs_daemon log eg.server_log

Interface #7
4.Details - When pbs_ralter is used for altering its start time that has jobs in it and has already started running, “Reservation not empty” is displayed on the console.
7)Can we make it “Reservation < resvID > not empty” instead of “Reservation not empty” ? Just a suggestion .

Interface #10 and Interface #11
8)If a job is running in reservation and reservation alter request come for end_time , will this new Accounting log reflect in tracejob < job id > output as well

Also please let me know if any clarification is needed for the above query.

Regards,
Lovely

Hi Lovely,

Yes, I would, once @smgoosen signs-off the UCR.

These are two events when changing the start time of the reservation will be allowed. I have changed the phrasing to avoid any confusion. I hope it will be clear this time.

A question for @smgoosen.

We can change only the times of the reservation as of now. recurrence rule is not there in the requirements.

Yes, if we use the interactive mode.

Its a OR.

No. But, would like to get @smgoosen’s view on this.

When the user uses pbs_ralter to change the start time, he/she has to provide the resvID, so I believe we need not display the resvID.

No.

Thanks,
Prakash

Nice work on the design doc, @prakashcv13. I just have some comments.

  • I think interface 4.e is a bit more verbose than it needs to be. I’d rather you just say “left as is” rather than “not deleted and left as is”. To me, if you make a request to alter something and the request is rejected, It would be left unchanged.
  • Remove interface 4.f. You’re making a future reference to something you haven’t described yet, and 4.g.iii says the same thing as you describe what the interactive option does.
  • Interface 4.g.i: I don’t understand why the reservation having any job in it would stop a alter request. Wait, when you change the start of the reservation, does the length of the reservation change? Let’s say I have a reservation of 1hr duration starting at 2200 and going till 2300. If I use pbs_ralter to change the start time to 2230, do I suddenly have a 30m reservation? If so, this isn’t clear. You should make that explicit. If it is true, you should provide a way to shift a reservation in time without shortening (or lengthening) it.
  • in interface 4.g.ii.3 you say if the change is allowed. In 4.g.iii.2 you use the word confirmed. You should probably be consistent. I like the word confirmed myself.
  • Merge interfaces 6 and 7. They are really the same. The 15XXX numbers specifically map to string values. It’s really one interface.
  • Merge interfaces 8 and 9. Say the reservation state is RESV_BEING_ALTERED and it is displayed in pbs_rstat as ‘AL’. AL itself isn’t really an interface by itself, it’s just a shortening of the real state.
  • I don’t find the ‘A’ accounting record useful as is. It’s saying there was a request to change something and not what that request was. Where will I see what has changed in the reservation? Is there a new confirmation record? I especially don’t think the number of interactive seconds is useful at all in the accounting log.
  • I don’t think the log messages in interfaces 12, 13, and 14 should be stable. To me, they useful enough to be stable. If they are stable, that means we need to test that they are there. If you require on them for automated testing, then go ahead and make them stable.
  • interfaces 12, 13, and 14: Don’t say "for ". Use the ID portion of the log field. This way they’ll show up in tracejob.
  • I’d merge interfaces 13 and 14 into one message that says “Reservation alter <confirmed/denied>”
  • In interface 14 you say for a noninteractive request. Is this true for the other two messages? Why does this only print for a noninteractive request?

done.

done

This is only for changing the start time and it does makes sense to me. If the reservation has started, there are chances that some of the jobs in it have started running.

If the user changes only the start time OR end time, the reservation duration is bound to change and to me it is intrinsic to this RFE, so I didnt make it explicit in the design, but the confusion may arise, so I have updated the design with this detail.

By changing both the start and end times.

changed “allowed” to “confirmed” in 4.g.ii.3.

done.

done.

Would you like to add information on what change was requested in this log, or are you suggesting that we do not need this accounting log at all.

Could you please elaborate on why do you think they are not useful enough.

“for” was there only in interface 14 (now 12). removed it.

Corrected.

@prakashcv13 thanks for making the changes to the document.

I agree that when a reservation has started, changing the start time has no meaning. I just don’t understand why you can’t change the start time if the reservation has jobs in it before the reservation starts.

I actually view this the other way. The reservation’s duration should remain constant unless the user goes out of their way to only change it. The user made a decision on how long they wanted the reservation when they originally made it. I look at the use of this command as shifting the reservation. I think it will be confusing to the user that they’ve changed the duration. You also have the issue of what happens when start time is changed past the end time? If you shift the reservation, this is not an issue. If you only change the start time, you’ll have to worry about this edge case.

I do find them useful and think they should be printed. I just think the vast majority of log messages should not be stable. Are they required for automated testing? If so, make them stable. If not, making them stable will require you to test for them. Making them stable comes with extra baggage if the message ever needs to be changed. It requires the message to be deprecated for at least one year. Do you think the messages are stable enough to go through that sort of process?

We can, the condition to allow the start time change is that the reservation should not be running or should be empty.

The main use case of this RFE is to change the end time of the reservation to allow a job to continue running past the reservation’s original run time. If you see, in this use case there is no possibility of the duration to remain same.

These are simple notifications about the reservation, I believe once the Design is signed-off they wont change, hence they are stable. Please do let me know if you think that these messages might change.

I’m missing something. It’s probably a question of terminology. Are you saying a reservation’s start time can change if:
A) it hasn’t started regardless of number of jobs in the reservation
B) it is empty, regardless if it has started or not
C) it hasn’t started and it is empty

That makes it sound that you can change the start time of a reservation if the reservation has started and is empty. What does it mean if you change the start time to the future? Does the reservation go back into the confirmed state?

I see now. There are basically three options. The first is to shift. The second is to force the user to provide both a start and an end (or start/duration). The thrid is like you have it, to change the duration. If the main use case is to extend running reservations, then the first doesn’t work. You can’t really shift a running reservation. The second doesn’t work because you can’t change the start time of a running reservation. The third way is really the only solution. You’ll need to check the case where someone tries to alter the start time after the end time. This will require an error message. Please add that to the document.

Just because you think they’ll never change isn’t a good enough reason to make an interface stable. You need to have a real reason to make an interface stable. Stable interfaces are supported. Stable interfaces are documented. Stable interfaces are tested. There is a lot of work in making an interface stable. It is possible that an unstable interface can not change. It’s better to provide the flexibility that we can change it in the future vs not having that ability.

Bhroam

Right.

A reconfirm request would be sent to the scheduler, if it gets confirmed, we change the state to ‘CO’.

This check is already there in start_end_dur_wall() and the existing error message “Bad time specification(s)” will be displayed. Please let me know if you would like me to add this error message as an interface to the document?

Based on your comments, it seems to me that I am not aware of what makes an interface stable. Could you please point me to a document or a specification where I can understand when to make an interface stable or otherwise. Once it is clear to me, I would be in a position to get back to you on this.

Thanks,
Prakash

One interesting side effect of this is that you could shrink a reservation with a job to the point that the job could never finish in time. I guess if the admin wants to shoot themselves in the foot that way, we should let them.

If it is an existing message, I don’t think you need to add it to the document.[quote=“prakashcv13, post:9, topic:535”]
Based on your comments, it seems to me that I am not aware of what makes an interface stable. Could you please point me to a document or a specification where I can understand when to make an interface stable or otherwise. Once it is clear to me, I would be in a position to get back to you on this.
[/quote]
The document you are asking for does not exist. We’ve always gone on the assumption that if an engineer wants to make an interface stable, they have a reason to do so. An exampe is that you think that a consumer of the interface (e.g., the admins or support or QA) will be relying on the interface to do their job. Stable interfaces requires testing. Do you want to write extra automated tests to make sure your new interfaces work? Stable interfaces require support. The less interfaces we need to support will make life easier on us.

It comes down the question: why do you want to make this interface stable? If you have a reason, then make it stable. If you just think the interface is unlikely to not change, then make it unstable.

Bhroam

Thank you for this explanation. I believe that the admin can rely on these three new log messages in order to track down what happened to the reservation alter request, so I would like them to be stable and will write automation tests accordingly at the time of implementation. Please let me know if you do not agree.

Thanks,
Prakash

OK, if you feel the admin requires these messages, then make them stable. I personally leave most of my log messages unstable. I see your point here though. A lot of engineers make all messages stable without much thought. I was just trying to make sure you were sure you wanted these messages stable.

Thanks,
Bhroam

Ok :). Should I consider that as a sign-off from you for the External Interface Design?

Thanks,
Prakash

Oh yes, sorry about that. I sign off.

Looking nice. Two suggestions:

  • In addition to the CLI, if there are corresponding changes in the PBS API to support this new capability, please add those to the design as well. (I feel everything possible via CLI should also be possible via the PBS API.)

  • Regarding accounting records: we definitely want the accounting record to capture the changes made to the reservation. (I’m neutral whether we also want to capture a request, when that request is denied – I actually don’t think capturing a denied request is required, but it’s also OK to include it). In any case, please include all the details of what is put into the accounting field, both when a request is made and when the request is confirmed, and it would be really helpful to also include an example.

Thanks!

Hi @billnitzberg,

I have added information on the new PBS API that will be introduced along with providing examples on the new accounting logs. I felt that a new accounting log for denial will also be needed, so added that as well.

Please let me know if you find them useful and further comments/suggestions.

Thanks,
Prakash

Hi @smgoosen,

I have updated the UCR as per the offline comments. Request you to review and provide your comments.

Thanks,
Prakash

PS: Offline comments from @smgoosen -

Hello Prakash,

One change I’d suggest:

If extending the reservation would overlap with another reservation the request should be denied.
b. In order to extend the current reservation it may be necessary for the admin to manually push back the start time of a following reservation.

The new response to this problem from current customers is to try to find different nodes if the current nodes cannot satisfy a request (e.g. another reservation is in the way) rather than just fail. It is acceptable if this type of request/action be possible only before the reservation starts. Once it starts then the current behavior (denial and as per b. requiring manual intervention) is acceptable.

Also note that requirements 2a. and 4. Conflict with each other. Item 4. Is the (new) desired behavior.

Thanks,
Sam Goosen

Hi Prakash,

It doesn’t seem clear when item 5. should be applied and when not. For example, it is clear that for 3a. the scheduler should look for a new set of vnodes if the reservation cannot be extended on the current set of vnodes. It is not clear whether 5. would apply to 1b. (it should). An example of when we wouldn’t look for new vnodes would be 2a. I think rather than having a separate item 5. we should just be explicit about each case and whether new vnodes are an option.

Thanks,
Sam Goosen

Hi Sam,

Thank you! Yes, I agree that we should be explicit about when would PBS look for new nodes. I have added 2.b.i and removed 5. Hope this will be more clearer now.

Thanks,
Prakash

The Use Case and Requirements look good! I don’t believe there is any need to pull in anything from section C at this time.