The RunJob API will accept a destination server identifier in the extend parameter

The reply is not sent before the job is sent to the mom; it is sent after the mom replies back. This allows the job to be rejected by the execjob_begin hook.

How are you going to handle this, given that the server sending the job to the mom is not the one that received the runjob batch request?

Bhroam

The source server, which received the runjob request, will forward the request to the destination server and proceed with other tasks. The destination server will reply either before or after sending the job to the mom, depending on the nature of the request. Upon receiving that response, the source server will send the final reply to the client.
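A minimal, self-contained sketch of the flow described above. All names here (`Server`, `run_job`, `accept_job`) are illustrative stand-ins, not actual OpenPBS internals:

```python
# Hypothetical simulation of the forwarding flow: the source server forwards
# the job, and replies to the client only after the destination responds.

class Server:
    def __init__(self, name):
        self.name = name
        self.jobs = {}  # jobid -> state

    def run_job(self, jobid, dest, client_replies):
        """Source side: forward the job; the final client reply is queued
        only once the destination (and its mom) have accepted or rejected."""
        accepted = dest.accept_job(jobid)
        if accepted:
            self.jobs.pop(jobid, None)  # dequeue from source on success
        client_replies.append((jobid, accepted))

    def accept_job(self, jobid):
        """Destination side: send to its mom; an execjob_begin hook could
        reject here. This sketch always accepts."""
        self.jobs[jobid] = "R"
        return True

src, dst = Server("s1"), Server("s2")
src.jobs["1.s1"] = "Q"
replies = []
src.run_job("1.s1", dst, replies)
```

The key point the sketch captures is ordering: the client reply is appended only after the destination's answer comes back, so hook rejection on the destination can still be surfaced to the original client.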

@nithinj what happens if the destination server rejects the move+run request? Does the job remain on the source server? Can there be a race condition where the scheduler stats the servers and both servers report the same job?

Or one where a different client stats the servers and sees two copies of a job?

The source server will mark the job in the 'T' (transit) state at the beginning of the operation. The job will be dequeued when the source gets a successful response from the destination. So there can be a window where both the source and destination servers report the same job, but the source server will be reporting it in the 'T' state.
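The transit handling described above can be sketched as follows. The state letters mirror PBS job states ('Q' queued, 'T' transit, 'R' running), but the function itself is hypothetical:

```python
# Illustrative sketch of move+run state transitions on the source server.

def move_and_run(src_jobs, dst_jobs, jobid, dest_accepts):
    """Mark the job in transit, then either hand it off or roll back."""
    src_jobs[jobid] = "T"        # mark transit before sending
    if dest_accepts:
        dst_jobs[jobid] = "R"    # destination now owns the job
        del src_jobs[jobid]      # dequeue from source on success
    else:
        src_jobs[jobid] = "Q"    # rejected: job stays queued on source
    # Between marking 'T' and dequeueing, both servers can report the job,
    # but the source copy is visibly in state 'T'.

# Successful hand-off
src_a, dst_a = {"1.s1": "Q"}, {}
move_and_run(src_a, dst_a, "1.s1", True)

# Destination rejects: job remains on the source
src_b, dst_b = {"2.s1": "Q"}, {}
move_and_run(src_b, dst_b, "2.s1", False)
```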

The scheduler will not see this, as it makes sure that all servers are in a consistent state before querying the universe.

Ok, thanks for clarifying. I think we might want to avoid this by delaying the response to the clients until the move and run is complete. It's a rare event, so it shouldn't affect the clients that much. Many front-end tools might flip out if they see two copies of the same job, so I think it would be better to avoid it if possible.

Delaying responses will give inconsistent data to the client. When one server is ready to serve the request, the other servers may not be, so different server instances would reply at different points in time, leading to inconsistent data. We could still achieve this with a pbs_server_ready-like protocol, similar to the one used between the scheduler and server to check that all servers are ready to serve requests.

  • pbs_server_ready
    This API can be published for clients to use to get a consistent response. It will block the client until all servers are ready to respond.
    The issue with this approach is that the client gets a reply only after all the inter-server operations are finished. These requests would also pile up in the server, making it unresponsive, since it may also have to respond to scheduler requests once the inter-server operations finish.

  • Let IFL remove the duplicate if there is one
    The client might have to go through all job ids to find the one in the 'T' state. The server should indicate that a move+run is in progress using some bits, to avoid doing this all the time. I'm hoping the response would still be faster than waiting. But with this, we are inventing another way to handle the same issue.

  • Let the servers send the job ids, with the state marked appropriately as Transit
    Even if the client receives two identical job ids, it can distinguish between them based on the state.

I am inclined to continue with option #3 until we hear more on this. We can publish pbs_server_ready and integrate it with more clients if that is required. Let me know what you think.
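Option #3 from the client's perspective could look something like the sketch below: when two servers report the same job id, keep the copy that is not in transit. The helper name `dedup_statjob` is hypothetical, not an actual IFL call:

```python
# Hypothetical client-side deduplication of stat-job replies aggregated
# from multiple servers. A 'T' (transit) copy is dropped in favor of the
# authoritative copy from the other server.

def dedup_statjob(replies):
    """replies: list of (jobid, state) tuples from all servers."""
    best = {}
    for jobid, state in replies:
        # Prefer any non-transit copy over a 'T' copy of the same job.
        if jobid not in best or best[jobid] == "T":
            best[jobid] = state
    return sorted(best.items())

# Job 1.s1 appears twice during a move+run window: once in 'T' on the
# source and once in 'R' on the destination.
deduped = dedup_statjob([("1.s1", "T"), ("1.s1", "R"), ("2.s1", "Q")])
```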

With the PR "Multi-Server: MN jobs crossing local server boundaries by nithinj · Pull Request #2253 · openpbs/openpbs · GitHub", the server will keep a minimal cache of nodes from other servers. I am going to remove this design page, as the server knows where to send the job with the help of this cache and doesn't have to rely on the on_server field. Let me know if you have any concerns.

Don't delete it, just mark it "obsolete" or something similar. That way, if by any chance we need to go back to the old design or refer to it, it'll still be there.