Creating reservation out of a job

Hi All,

I have posted a design for creating a reservation out of a job. This is part of the user reliability workflow project.

I request the community to go through the design and provide feedback.

Thanks,
Prakash

A couple questions for clarification…

  • What will the new reservation be named? If the job ID was 123.foo, will the reservation be named R123.foo?
  • How will this work for jobs that have been peered to another complex?
  • Will a new reservation queue be created? Will the running job be moved from whatever queue it had been assigned to the new reservation queue?
  • What happens if a user requests a reservation be created for a suspended or checkpointed job?
  • What will be the response if someone tries to create a reservation from an array job?

The last sentence of the overview is the most important, and should probably be moved to the front of the paragraph.

Thanks,

Mike

Hey Mike,
Thank you for looking at the design and providing feedback -

The new reservation will be named R<next_available_id>. If the job id is 123 and is run immediately when submitted, the reservation might get named R124.
I have added this detail to the design.
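
To make the naming rule concrete, here is a minimal sketch. It assumes jobs and reservations draw from one shared server-side counter, which is what the R124 example implies. `IdAllocator` and its methods are hypothetical names for illustration, not PBS code.

```python
from itertools import count


class IdAllocator:
    """Toy stand-in for a counter shared by jobs and reservations (assumption)."""

    def __init__(self, start: int) -> None:
        self._ids = count(start)

    def next_job_id(self) -> str:
        # A job takes the next free sequence number.
        return str(next(self._ids))

    def next_reservation_name(self) -> str:
        # A reservation takes the next free number too, prefixed with "R".
        return f"R{next(self._ids)}"


allocator = IdAllocator(start=123)
job_id = allocator.next_job_id()                  # the job is submitted first
resv_name = allocator.next_reservation_name()     # the reservation is created from it
print(job_id, resv_name)
```

If other jobs or reservations are submitted between the two calls, the reservation gets a later number, which is why the name "might" be R124 rather than always being R124.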

Yes, now updated in the design.

I missed this: we do not allow the user to create a reservation out of an array job, but the design did not specify the error that will be displayed.
I have added Interface 5 for the error message.

In all 3 scenarios - the reservation will be created in the pulling complex.

Done

I would like to restrict creating a reservation out of a job to jobs in state R/42 only. I have updated the design with this detail.

Thanks for the updates @prakashcv13. A couple more items…

I think you should change “pbs_rsub: Reservation cannot be created from a reservation job” to “pbs_rsub: Reservation may not be created from a job already within a reservation”. I would caution your use of “can” and “may”. If something is impossible, it “cannot” be done. If a restriction has been imposed it “may not” be done.

What type of attribute is create_resv_from? Is it numeric, string, etc.?

thank you so much @mkaro for the review -

Noted, and updated the design. Both the error messages (related to array job and reservation job) are updated.

boolean - updated the design with this detail.

Looks good, thank you.

Hey @prakashcv13
I only have two comments. First, what are the permissions on create_resv_from? If it is available to normal users, it might be an end run around the reservation ACLs. In that case, you should check whether the user is in the reservation ACLs and reject the job if not.
My other comment is that you might change create_resv_from to create_resv_from_job. The existing name seems to end abruptly.

Bhroam

Hi @prakashcv13

I have a few comments -

  • What happens if a job is submitted into a reservation queue with the “create_resv_from” attribute set to 1? I assume the server will reject it, but you may want to add that to the design.
  • Will the walltime of the reservation be the walltime remaining on the running job, or the full walltime the job was submitted with? I assume it will be the walltime remaining on the running job, but it would help to state that.
  • Can jobs be altered to set the “create_resv_from” attribute while they are running?
  • I am with @bhroam on renaming the attribute from “create_resv_from” to “create_resv_from_job”.
  • What error is thrown when the job and the reservation’s user do not match?
  • The design document currently mentions “Interface 4” twice. Please change that.

Thanks
Arun

Hey @arungrover, @bhroam -

Thank you for the review and feedback.

Right, updated the design with this information.

Done.

done.

Existing functionality: if one user submits a job to another user’s reservation, it is reported as an Unauthorized request.
The other way round (new functionality): creating a reservation out of another user’s job will display the same message. I have updated the design with this information.
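
The restrictions discussed so far in this thread can be collected into one pre-check. This is only a sketch: the `Job` fields, the function name, and the order of the checks are assumptions. The array-job and job-state messages are placeholder wording, since the thread does not quote them. The other two messages are taken from the posts above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    job_id: str
    owner: str
    state: str                    # "R" means running
    is_array: bool = False
    in_reservation: bool = False  # job was submitted into a reservation queue


def rejection_reason(job: Job, requester: str) -> Optional[str]:
    """Return an error message, or None if the request looks acceptable."""
    if job.owner != requester:
        return "Unauthorized request"
    if job.is_array:
        # Placeholder wording; the design's Interface 5 text is not quoted here.
        return "Reservation may not be created from an array job"
    if job.in_reservation:
        return "Reservation may not be created from a job already within a reservation"
    if job.state != "R":
        # Placeholder wording for the "running jobs only" restriction.
        return "Reservation may only be created from a running job"
    return None


running = Job("123.foo", owner="alice", state="R")
queued = Job("124.foo", owner="alice", state="Q")
print(rejection_reason(running, "alice"))  # None: request is acceptable
print(rejection_reason(running, "bob"))    # Unauthorized request
print(rejection_reason(queued, "alice"))
```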

Now, this is something I did not think of. @scc, I am with @arungrover on this. Could you please let us know what you think would be the right thing to do?

Yes, the end time of the created reservation should match the end time of the job it is created out of.
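
As a worked example of that rule, here is a small sketch. It assumes the reservation starts at the moment of conversion and that the job has a hard walltime. The function name and the sample times are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Tuple


def reservation_window(job_start: datetime, walltime: timedelta,
                       now: datetime) -> Tuple[datetime, datetime]:
    """Return (start, end) for a reservation created from a running job.

    The end matches the job's end time, so the reservation's duration is
    the walltime remaining on the job, not the full walltime.
    """
    job_end = job_start + walltime
    if now >= job_end:
        raise ValueError("job has already used up its walltime")
    return now, job_end


job_start = datetime(2020, 1, 1, 12, 0, 0)
now = datetime(2020, 1, 1, 12, 20, 0)      # conversion requested 20 minutes in
resv_start, resv_end = reservation_window(job_start, timedelta(hours=1), now)
remaining = resv_end - resv_start
print(resv_end, remaining)
```

With a one-hour walltime and a conversion 20 minutes into the run, the reservation ends at 13:00 and lasts 40 minutes.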

Thank you @scc, @arungrover, and @bhroam. I have updated the design as per your comments and request another round of review.

Hey Prakash,
I just have a couple minor comments

  1. In the example in interface 6, what is ‘qr’ ? Is it an alias of pbs_rstat?
  2. Maybe rephrase the error message in interface 6 to say ‘job in a reservation’ instead of ‘reservation job’. When I think about the term reservation job, I think about this feature.

Bhroam

Thanks @prakashcv13 for addressing the comments. I have a few questions about the implementation though and it would be nice if you could add some of the internal design details to the document as well.

  • I saw the example you listed in the “hook demo.txt” file, and the job out of which the reservation was made had no walltime. I think we should not allow reservations to be created out of such jobs, because it is hard to decide the duration of the reservation.
  • There could be jobs submitted with a soft walltime. In those cases, the reservation should use only the hard walltime of the job to decide its duration.
  • Currently we do not allow running jobs to move between queues. How are you planning to move the running job into the reservation queue in this case? Would it dequeue the job from its original queue and then enqueue it into the reservation queue? If so, how will that affect limits? Wouldn’t it open a way for users to game the system by moving their running jobs into reservations so that run limits no longer apply to them?
  • You might want to change Interface 7. It shows a qsub error for the pbs_rsub command.

Thanks,
Arun

Hey @bhroam,

yes :), changed it to pbs_rstat.

done.

Thanks,
Prakash

Hey @arungrover,

Not difficult, the reservation will be for 5 years.

done.

good point to be mentioned in the design. done.

Yes, dequeuing and enqueuing is what I was thinking of. Users will not be able to get around the limits if they are applied at the server/complex level.
But if the limits are applied at the queue level, they would be able to.
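
The difference between the two cases can be illustrated with a toy model. This is not how the PBS server tracks limits. `ToyComplex`, its methods, and the limit values are all hypothetical, and the model assumes the reservation queue carries no run limit of its own.

```python
from collections import Counter
from typing import Dict


class ToyComplex:
    """Counts running jobs per queue and server-wide (illustration only)."""

    def __init__(self, queue_run_limit: int, server_run_limit: int) -> None:
        self.queue_run_limit = queue_run_limit
        self.server_run_limit = server_run_limit
        self.running: Dict[str, str] = {}   # job id -> queue it is counted in

    def run(self, job_id: str, queue: str) -> bool:
        per_queue = Counter(self.running.values())
        if per_queue[queue] >= self.queue_run_limit:
            return False                    # queue-level limit reached
        if len(self.running) >= self.server_run_limit:
            return False                    # server-level limit reached
        self.running[job_id] = queue
        return True

    def move_to_reservation_queue(self, job_id: str, resv_queue: str) -> None:
        # Dequeue from the original queue, enqueue into the reservation queue.
        self.running[job_id] = resv_queue


cx = ToyComplex(queue_run_limit=1, server_run_limit=2)
first = cx.run("1", "workq")                   # allowed
blocked_by_queue = cx.run("2", "workq")        # refused: workq already has one
cx.move_to_reservation_queue("1", "R3")
after_move = cx.run("2", "workq")              # allowed: workq count is back to 0
cx.move_to_reservation_queue("2", "R5")
blocked_by_server = cx.run("4", "workq")       # refused: two jobs run server-wide
print(first, blocked_by_queue, after_move, blocked_by_server)
```

In this model the queue-level limit stops applying once the running job leaves workq, while the server-level limit still holds because the job keeps counting toward the server total.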

Thanks

I don’t think using the default scheduler duration is the right answer here. I think you should deny converting a job into a reservation unless it has a walltime. If we want to come back to this in the future, we’ll have more flexibility in how we implement it.

This will also screw with defaults. Any resource that was gained from a default will be lost when it is dequeued from the original queue. The resources_max on the reservation queue might replace it though. This should be tested.

Bhroam

Hey @prakashcv13
I just had a thought. Does the rsub hook get called when you convert a job into a reservation? I think it should. We are creating a new reservation.

Bhroam

I believe we actually should create the reservation from a job with no walltime request and have it be for a 5 year duration. The intent is that this will play nicely with the other feature you are working on: Deleting idle reservations.

Hi @bhroam, @arungrover, @scc,

Apologies for the delay in replying to this.

yes, we do.

I thought about it. Below is my take on it -

  1. For limits: the user can get past them even now, by moving the job to a reservation.
  2. For default resources: this is a documented behaviour. We should document that the default resources might get lost. Or would you suggest applying the default resources of the queue the job was submitted to when creating the reservation queue?

Thanks,
Prakash

The reservation will be created with the resources the job requested. A resources_max will be set on the reservation queue. Jobs pick this up as a default if there isn’t a default set. My concern might not be warranted. Could you test this? Submit a job into a queue with resources_default.foo = 5. Convert that job into a reservation. Check to see if it still has a request of foo.

Bhroam