PP-864: Support suspend/resume on Cray X* series

Hi,

There is a small design proposal posted in Confluence that talks about the external interface changes in PBS when it starts supporting suspend/resume on Cray X series systems (PP-864).

Please have a look at the document and provide comments/feedback.

Thanks,
Arun

@arungrover, the qsig error message should show the error code “15219”, shouldn’t it? You might want it to be consistent with the qsub error message format, for example:
qsub: cannot connect to server bogus.pbspro.com (errno=15008)
When logging a false ‘EMPTY’ response do we really need to log it at PBSEVENT_DEBUG2 instead of a higher debug level?

Thanks for reviewing the document @vccardenas.
The error message is consistent with other qsig error messages. For example, if you try to resume a running job, qsig prints:
qsig: Request invalid for state of job
That is all the command shows, even though the underlying error number is 15016.

qsub, on the other hand, handles error messages differently. In some cases it prints the error number, as in your example, and in others it doesn’t, like this:

qsub -lselect=2:ncpus=2:place=scatter -- /bin/sleep 1000
qsub: Resource invalid in “select” specification: place

The EMPTY log message gets logged because of a bug that Cray has in dealing with switching reservations. I felt that logging it at DEBUG2 would make PBS log it by default whenever we hit this bug, which would be good for tracking purposes. If you still think it should be logged at a higher debug level, I can make the change.

I just noticed that one thing that is missing in the document is that on Cray we need to set restrict_res_to_release_on_suspend (RRTROS) to ncpus by default. I’ll add that to the document.
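
For reference, here is a minimal sketch of how that default could be applied by hand through qmgr. It assumes restrict_res_to_release_on_suspend is set as a server attribute and that the command is run with PBS manager privileges; the design document remains the authority on when PBS sets it automatically.

```shell
# Sketch only: release just ncpus when a job is suspended.
qmgr -c "set server restrict_res_to_release_on_suspend = ncpus"

# Check the value that is now in effect.
qmgr -c "list server restrict_res_to_release_on_suspend"
```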

Thanks!

I agree with @arungrover about the “EMPTY” log message being logged at DEBUG2. We want to log the “EMPTY” log message so that it is visible when a problem is occurring. Making the debug log level higher will hinder that ability.

@vccardenas and @lisa-altair I have modified the document to mention that restrict_res_to_release_on_suspend is set to ncpus in a default setup.
Please have a look at the document again.

DEBUG2 seems right for logging a false ‘EMPTY’ response. I have reviewed the additional info on the setting of RRTROS. Overall looks good to me.

Seems like a pretty clear design @arungrover.
Here are a few comments:

  • the 4th bullet in the overview needs an asterisk * after the X, so it says “Cray X* series”
  • the fifth bullet in the overview, I think what is currently written is a bit too restrictive. How about this instead? “On a Cray X* series, the suspended low priority job and the high priority job must fit into the Cray compute node’s memory”
  • for interface #3, I think the interface change is really that you are setting restrict_res_to_release_on_suspend to ncpus…so the details can move to the interface, and the interface text can move into the details section. :slight_smile:
    I think you should state when it is being set (e.g. during install of PBS, start up of PBS, etc.?).
  • Maybe also state this is due to the restriction for Cray compute nodes (see above).
  • I also think you should state that admins may choose to free other resources (including custom ones) and then point them to the other page that talks about restrict_res_to_release_on_suspend.

Thanks for the review comments.

I’ve addressed your comments and updated the document. Please have a look again.

Thanks for making the changes.
One more small comment: in the 3rd bullet of interface 3, can you make it clear that this statement only applies to Cray X* series systems? Maybe by adding something like the following: “…is not advisable for Cray X* series compute nodes”.
Everything else looks good.

@lisa-altair I’ve modified it now.
@vccardenas you might want to have a look again.

Thanks!

Thanks! Looks good to me!

Looks good to me too.

@lisa-altair and @vccardenas thanks for your signoff. I have modified the document again :slight_smile:
This time I added information stating that a job with exclusive placement on a node cannot be suspended.

Please have a look again and let me know what you think.

@arungrover, on Cray the compute vnodes have sharing = force_exclhost so any job that uses that compute vnode will have exclusive placement, and therefore that job can’t be suspended?

I think a job will not be suspended only if it requests exclusive placement on a compute node using the “-lplace” resource. I don’t think having sharing set to force_exclhost would have any effect.
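
For illustration, an explicit exclusive placement request might look like the following sketch. The select values are arbitrary and chosen only for the example.

```shell
# Sketch: explicitly request exclusive placement through the place resource.
qsub -l select=1:ncpus=2 -l place=excl -- /bin/sleep 1000
```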

@vccardenas and @lisa-altair
I have also added a line mentioning a limitation that Cray imposes on the number of co-resident jobs allowed on a compute node. Please have another look at the new doc.

Since there was some confusion about “exclusive access”, and because PBS has a few different things that use the word exclusive, perhaps you should reference “(i.e. -lplace=excl)” somewhere in interface #4?
Everything else looks good to me.

@lisa-altair thanks for the review. I’ve added the example in interface 4.

Looks good to me @arungrover!

@arungrover, thanks for the explanation - the updates and overall look good to me.

Thanks for the chat, the EDD looks OK to me!