When a job is running on a node, how can I move it to another node?
I hope someone can answer this.
I would be deeply grateful.
A running job (one in the R state) on a compute node cannot be moved to another node.
If the node running the job gets disconnected from the PBS Server, the server automatically requeues the job so that it can run on another node, after waiting 310 seconds (the default). This delay is controlled by the server attribute node_fail_requeue.
Reference: RG-290, PBS Professional 2022.1 Reference Guide
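On the administrator side, the attribute can be inspected and changed with the standard PBS commands. This is a sketch that assumes a PBS Professional installation. The value 600 is an arbitrary example, not a recommendation. If the grep prints nothing, the attribute may simply not have been set explicitly, in which case the default should apply.

```shell
# Show the current value of the server attribute (qstat -B queries the server).
qstat -Bf | grep node_fail_requeue

# Change the requeue delay to 600 seconds (example value; requires PBS Manager privilege).
qmgr -c "set server node_fail_requeue = 600"
```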
But how does the PBS Server know whether a job is rerunnable or not?
How does it decide?
So whether a job can be rerun is determined entirely by the job submitter (the user)?
Yes. By default, all jobs are rerunnable (except interactive jobs).
If the user does not want a job to be rerunnable, they can opt out by passing -r n to qsub when submitting the job.
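The same opt-out can also be written into the job script as a #PBS directive. Below is a minimal sketch. The job name, resource request, and walltime are hypothetical placeholders. Bash treats the #PBS lines as comments, so the script also runs by hand.

```shell
#!/bin/bash
# Hypothetical PBS Professional job script that opts out of automatic rerun.
# qsub reads the #PBS directives below at submission time; bash ignores them as comments.
#PBS -N no_rerun_example
#PBS -r n
#PBS -l select=1:ncpus=1
#PBS -l walltime=00:05:00

# PBS_JOBID is set when the script runs under PBS; fall back to a marker when run by hand.
msg="job ${PBS_JOBID:-not-under-pbs} running on $(uname -n)"
echo "$msg"
```

After submitting it with qsub, the job should show Rerunable = False in the output of qstat -f followed by the job ID. Note that PBS spells the attribute name with a single n. For a job that is still queued, qalter -r y or qalter -r n should be able to change the setting.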