I am adding two new accounting log records that will let us account for a job's resource usage correctly across its suspend/resume events.
Thanks for taking this up! A few comments:
- I think it’s important to print out resource_released_list information for z record if available
- I think it might also be useful to know who suspended/resumed the job, the scheduler or a user, so I think we should add a “requestor” field as well, similar to the U record.
- I’m not sure how useful it is to record the following: ‘accounting_id’, ‘user’, ‘group’, ‘job_name’, ‘queue’, ‘ctime’, ‘qtime’, ‘etime’, ‘start’. This information is either static, or doesn’t seem important to me in the context of suspending/resuming a job. What do you think?
- ‘exec_host’ & ‘exec_vnode’, this actually is quite relevant for suspend/resume, but this information can be HUGE. So, I just want others to chime in once on whether it’s important enough to record. I kind of feel like we don’t need both, maybe recording only one of them might be enough?
- ‘suspend time’/‘resume time’ - I don’t think we need to print this out explicitly, the timestamp of the record already gives us this information.
- ‘Resource_List resources’ - For the z record, I think it might be useful to print time based resources like resources_used.walltime, cput etc., but the rest of them (like ncpus, mem etc.) will be static information which the admins can get from the E record as well. We also print out all Resource_List in the Q record now, so I’m not sure how useful it is to print them here as well. Also, this information will probably not change from the z record to the r record, so I don’t think we need to print it out in the r record.
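To illustrate the point about timestamps: a minimal sketch of how a tool could derive the suspended time from the record timestamps alone. It assumes the usual `date time;record type;job id;message` accounting line layout and the proposed `z` (suspend) and `r` (resume) record types. The sample lines, job id, and requestor values are made up for illustration, and `parse_record` and `suspended_seconds` are hypothetical helper names.

```python
from datetime import datetime


def parse_record(line):
    """Split one accounting line into (timestamp, type, job id, message)."""
    stamp, rtype, job_id, message = line.rstrip("\n").split(";", 3)
    when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    return when, rtype, job_id, message


def suspended_seconds(lines, job_id):
    """Sum the time between each 'z' record and the next 'r' record of a job."""
    total = 0.0
    suspended_at = None
    for line in lines:
        when, rtype, jid, _ = parse_record(line)
        if jid != job_id:
            continue
        if rtype == "z":
            suspended_at = when
        elif rtype == "r" and suspended_at is not None:
            total += (when - suspended_at).total_seconds()
            suspended_at = None
    return total


# Made-up sample records, not output from a real server.
sample = [
    "04/01/2019 10:00:00;S;1.svr;user=alice",
    "04/01/2019 10:05:00;z;1.svr;requestor=Scheduler@svr",
    "04/01/2019 10:07:30;r;1.svr;requestor=Scheduler@svr",
]
print(suspended_seconds(sample, "1.svr"))  # 150.0
```

Since the interval falls out of the two timestamps, an explicit ‘suspend time’/‘resume time’ field would only duplicate it.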
My first suggestion is to merge both these records into one. Make a job state change record. This will catch suspend and resume and all other state changes to the system which can be useful. It could be useful to know when a job went on hold and by whom. The record can be small, most of the things you are suggesting can be found elsewhere. We don’t need to print them again. Make it similar to the ‘a’ record which just prints what changed and that is it. At most include the requestor.
If you don’t want to do that, I think you are including way too much information in these records. Most if not all of it can be gained by looking at the Q and S records. I could see resources_used might be useful to see how far the job has progressed before it was suspended (no need for resume).
How will this interact with the admin-suspend feature?
Hi @agrawalravi90 ,
Yes I agree with you. I believe this information is available if we log the record in the post signal processing. I hope that is fine.
How about we add resources_released instead of resource_released_list? It has the node-wise information about the resources released and might be useful.
Made the changes.
Agree with you.
I have updated the EDD with both for now. I believe we may need exec_vnode more than exec_host. The node-wise resources allocated might be useful. Let me know your thoughts.
Agreed on Resource_List. As for resources_used: for a running job, the only resources that can change are walltime and cput. Logging only the time-based resources would be fine, but I feel it is better to log all of resources_used. There might be cases where it is useful to track everything the job had used at the point it was suspended. What do you say?
Please review the updated EDD.
A state change record sounds useful. However, we account for many different things when a job’s state changes from one to another, so I am keeping the records separate for now.
Made the changes to the EDD.
For admin-suspend/resume, there will be accounting records similar to the ones for a regular suspend/resume. I am not sure whether it would be useful to also record the type (suspend vs. admin-suspend). Let me know your thoughts.
Please review the updated EDD.
Please review the updated Design document for suspend and resume accounting log record and provide your feedback.
Thanks @sujatapatnaik52, looks good to me so far. Could you provide complete example log records showing the suspend/resume of a job that requested at least one resource listed in restrict_res_to_release_on_suspend and at least one resource not listed in it?
The initial log record example had restrict_res_to_release_on_suspend set to ncpus, I have updated the EDD with same.
This one’s a bit confusing. If the resource is not listed in restrict_res_to_release_on_suspend, if I am not wrong, it might not be available in resources_released. I have added one example log record where restrict_res_to_release_on_suspend is unset. Let me know if I misunderstood something here.
Please use the example 1 setup, where restrict_res_to_release_on_suspend is set to ncpus, and submit a new job that requests both ncpus (released) and mem (not released). Then show the proposed new log messages for that job.
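The case being asked for can be sketched as a simple filter: a rough illustration, not the server's actual logic. It assumes that when restrict_res_to_release_on_suspend is set, only the listed resources are released, and that when it is unset everything the job requested is released (that second part is an assumption here). `released_on_suspend` and the resource amounts are hypothetical.

```python
def released_on_suspend(requested, restrict_list=None):
    """Return the subset of requested resources released on suspend.

    restrict_list models restrict_res_to_release_on_suspend; None means unset.
    Assumption: an unset list releases every requested resource.
    """
    if restrict_list is None:
        return dict(requested)
    return {name: amount for name, amount in requested.items()
            if name in restrict_list}


# Hypothetical job requesting one listed and one unlisted resource.
requested = {"ncpus": 4, "mem": "8gb"}
released = released_on_suspend(requested, ["ncpus"])
kept = {name: amount for name, amount in requested.items()
        if name not in released}

print(released)  # {'ncpus': 4}
print(kept)      # {'mem': '8gb'}
```

In this setup the suspend record would be expected to show ncpus as released while mem stays with the job.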
Thanks @scc for clarifying. Added a log record. Please review the changes.
Looks great, thanks.
Hi @sujatapatnaik52, a few comments:
- “resources_released(if available)” : 2 comments here:
- I think we should print resource_released_list instead of resources_released, resources_released can be huge, just like exec_vnode. If we just want to tell users what resources were released then resource_released_list is better.
- I think we should always print this information, not just when RRTROS is set. When RRTROS is not set, I think resource_released_list can be the value of schedselect, so we can print that instead. This will be good for consistency, and I think it would always be useful to know what was released.
- “exec_host”:
Again, these can be huge. We don’t print this for any other record. So, do we really need to print either of them here?
Actually I might be wrong, I think it will be the same as ATTR_l instead of schedselect.
On second thoughts, since we don’t set resource_released_list or other RRTROS attributes on the job unless RRTROS is set, maybe it’s best not to print them out in accounting logs if it’s unset. Sorry for being confusing.
@agrawalravi90 resources_assigned is already huge, right? It is printed in the R and E records, and parsing software such as analytics tools can deal with it. It carries additional node-level information. Is this not something that Analytics might need? In other words, resource_released_list has the consolidated (per-resource) information about what was released, but it is not granular to each node. That could be fine if analytical software will not need it. What say?
Either of these two attributes gives us information about the nodes that were assigned to the job when it was suspended/resumed, which is usually static unless it gets changed in between. I would say we can keep just one of them, either exec_vnode or exec_host, in the records. What do you say?
Hey Subhasis, resources_assigned is not as big as resources_released, resources_assigned is per resource, not per resource per node. But I just checked and it seems like we do print the exec_vnode and exec_host in E record of accounting logs, so I was wrong, there’s precedent for it, in which case, yes, it will be useful to print the resources released per vnode as it will help other tools calculate utilization more accurately.
So, we do print both of these in the E record, I was wrong. But since we do print it in the E record, it will be redundant to also print it in the resume record because it will be the same (jobs are resumed on the same vnodes). So, I don’t think we need to print it here. What do you think?
Yes, I agree with you. I have updated the EDD. The resume record will have only the requestor, considering that most of the other information is constant across the other records. Please have a look.
Thanks @sujatapatnaik52, I don’t think we need “exec_vnode” information in the z record either, rest looks good to me.
Made changes to the EDD. Thank you @agrawalravi90 for the review.
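For readers following the thread, here is a rough sketch of the field sets as they stand after this round of review: a suspend (z) record with the requestor, resources_used, and resources_released only when RRTROS is set, and a resume (r) record with just the requestor. This is one reading of the discussion, not the EDD itself. The helper names, the `key=value` layout, and all sample values (including the shape of the resources_released value) are assumptions for illustration.

```python
def format_message(fields):
    """Join fields into space-separated key=value message text."""
    return " ".join(f"{key}={value}" for key, value in fields.items())


def suspend_message(requestor, resources_used, resources_released=None):
    """Build hypothetical message text for a suspend (z) record."""
    fields = {"requestor": requestor}
    for name, value in resources_used.items():
        fields[f"resources_used.{name}"] = value
    # Only present when RRTROS is set on the server.
    if resources_released is not None:
        fields["resources_released"] = resources_released
    return format_message(fields)


def resume_message(requestor):
    """Build hypothetical message text for a resume (r) record."""
    return format_message({"requestor": requestor})


# Made-up values, not taken from a real log.
z_text = suspend_message(
    "Scheduler@svr",
    {"walltime": "00:05:00", "cput": "00:04:10"},
    "(vn1:ncpus=4)",
)
r_text = resume_message("Scheduler@svr")
print(z_text)
print(r_text)
```

Neither message carries exec_host or exec_vnode, since that information is already available in the E record.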