Hi,
I have created this EDD for implementing PP-479
Please provide your comments / feedback on it here
Thank you
Shrini
Hi Shrini,
Thanks for writing this document.
I have some first-level comments...
Thanks,
Arun
I guess this is more internal than external, but why are you getting rid of the tracking table and turning all of the queued subjobs into real jobs? When a subjob is run, it is turned into a real job. Why don't you just save the real running subjobs when the server is shut down, so they can be recovered when the server comes back up?
If you go the route of turning all queued subjobs into real jobs, make sure to not change the pbs_statjob() interface for subjobs. This will break the scheduler.
Bhroam
Thank you Arun for your comments. My replies are below.
No, qalter-ing a subjob is not enabled by this change.
Now the status of a finished subjob is obtained from status_job() instead of status_subjob(), and the state of a finished subjob in the job object or the DB is 'X', not 'F'. The affected code snippet is pasted below:
---------------------------- src/server/stat_job.c ----------------------------
index 261740f..107d18a 100644
@@ -377,12 +377,11 @@ status_subjob(job *pjob, struct batch_request *preq, svrattrl *pal, int subj, pb
  if ((pjob->ji_qs.ji_svrflags & JOB_SVFLG_ArrayJob) == 0)
      return PBSE_IVALREQ;
- /* if subjob is running, use real job structure */
- if (get_subjob_state(pjob, subj) == JOB_STATE_RUNNING) {
-     psubjob = find_job(mk_subjob_id(pjob, subj));
-     if (psubjob)
-         status_job(psubjob, preq, pal, pstathd, bad);
+ /* if subjob job obj exists, use real job structure */
+ psubjob = find_job(mk_subjob_id(pjob, subj));
+ if (psubjob) {
+     status_job(psubjob, preq, pal, pstathd, bad);
      return 0;
  }
Well, practically almost everything in a subjob is now a "unique copy", since the subjob's pjob is pushed to the job DB table, including its own set of job attributes, which are pushed to the job_attr DB table. However, all the subjobs of an array will still share some common information entries.
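To make the "unique copy" point concrete, here is a minimal sketch (not the actual patch) of persisting an instantiated subjob through the same path a regular job would take. db_save_job() is an assumed placeholder for the real persistence routine; find_job() and mk_subjob_id() are the helpers already used in stat_job.c above.

/*
 * Illustrative sketch only (not the actual patch): once a subjob has its
 * own job object, persisting it can follow the same path as a regular
 * job.  db_save_job() is an assumed persistence hook, not a real PBS Pro
 * function; find_job() and mk_subjob_id() are the helpers already used
 * in stat_job.c above.
 */
#include <stddef.h>

typedef struct job job;

extern job  *find_job(char *jobid);
extern char *mk_subjob_id(job *parent, int subj);
extern int   db_save_job(job *pjob);   /* assumed: writes the job and job_attr rows */

/* Save one instantiated subjob exactly as a regular job would be saved. */
static int
save_subjob_as_regular_job(job *parent, int subj)
{
    job *psubjob = find_job(mk_subjob_id(parent, subj));

    if (psubjob == NULL)
        return -1;          /* no real job object for this subjob yet */

    /* the subjob carries its own attribute set, so one save per subjob */
    return db_save_job(psubjob);
}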
Thank you Bhroam for your comments
I agree this is more internal, but I was asked to put up an EDD.
I'm not getting rid of the tracking table from the in-memory cache; however, I am getting rid of the subjob tracking table from the database.
@subhasisb, could you please back me up here on removing the tracking table from the database?
Nope, queued subjobs are not turned into real jobs in this design.
That is exactly what is being done.
As I said, queued subjobs are not turned into real jobs. There is no change to pbs_statjob() and no change to the scheduler either.
I'm sorry for appending a 6th interface to the EDD now.
I have working experimental code in my dev branch.
If you are interested you can have a look.
Please note that it is NOT ready for review.
It's the top commit, named "save progress", in my dev branch: https://github.com/Shrini-h/pbspro/tree/PP-479
Hi - the change to allow subjobs to survive server restarts is great - thx!
It sounds like there are additional interface changes also being proposed, e.g., introducing a different finished state for subjobs (X versus F), changing how subjob status is accessed, possibly others. I feel these other changes should be avoided unless there is a compelling use case for them. And, even then, perhaps those interface changes can be split off into a separate ticket/check-in?
Is there a way to fix only this one issue (allow subjobs to survive server restarts) without making all these other interface changes too?
Thank you @billnitzberg for your feedback.
As I mentioned in that interface, I was trying to disclose one of the side effects of my experimental implementation as an interface, just to "test the waters".
However, I worked on your suggestion and was able to address it by finding a way to keep the current behavior for the finished subjob state. So I have deleted interface 5 in the latest EDD update.
I hope that was your only concern and that it has been addressed.
I have a couple of minor suggestions
Bhroam
@bhroam, my feeling is that the title of the enhancement is probably slightly limiting. I think the original idea was to make subjobs be treated like regular (non-array) jobs. In that vein, the point was not only to have subjobs survive server restarts, but also to start showing behavior similar to regular jobs, and thus we felt that removing the additional restrictions placed on subjobs makes sense.
Maybe we should broaden the topic of the enhancement to really say what we wanted: "make subjobs first-class citizens, aka have all the rights of a regular job". Of course, if we want to limit changes, we can always implement part of the enhancement, but even discussing the entire enhancement could be beneficial. What do you think?
About the tracking table: this was only required to be stored because we did not store the subjobs in the database. Now that we store subjobs like regular jobs, we can always re-create the data we need. This, I feel, is better than having two different representations of the same data, which can potentially cause inconsistencies if we are not careful. At startup we would recreate the needed count information from the regular job table (instead of the tracking table), and thus we will not have to do two saves either; the regular job save would simply be enough. Thoughts?
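As a rough, self-contained illustration of "recreate the data we need" (this is not PBS Pro code): the per-array counts could be rebuilt at startup by walking the regular job records loaded from the DB. The record layout, the 'X' finished state, and the job-id format follow the examples used elsewhere in this thread; everything else is an assumption made for the sketch.

/*
 * Self-contained illustration (not PBS Pro code): rebuild the in-memory
 * tracking counts for an array from the regular job records loaded at
 * startup.  The record layout below is an assumption for the example.
 */
#include <stdio.h>
#include <string.h>

struct job_record {
    const char *jobid;    /* e.g. "123.svr[5]" */
    char        state;    /* 'Q', 'R', 'X', ... */
};

struct array_counts {
    int queued, running, finished;
};

/* Tally subjobs of the given array parent (e.g. "123.svr") by state. */
static void
rebuild_array_counts(const char *parent, const struct job_record *recs,
                     int nrecs, struct array_counts *ac)
{
    size_t plen = strlen(parent);

    memset(ac, 0, sizeof(*ac));
    for (int i = 0; i < nrecs; i++) {
        /* a subjob id is "<parent>[<index>]" */
        if (strncmp(recs[i].jobid, parent, plen) != 0 ||
            recs[i].jobid[plen] != '[')
            continue;
        switch (recs[i].state) {
        case 'Q': ac->queued++;   break;
        case 'R': ac->running++;  break;
        case 'X': ac->finished++; break;
        }
    }
}

int
main(void)
{
    struct job_record recs[] = {
        {"123.svr[5]", 'R'},
        {"123.svr[6]", 'Q'},
        {"123.svr[9]", 'X'},
    };
    struct array_counts ac;

    rebuild_array_counts("123.svr", recs, 3, &ac);
    printf("queued=%d running=%d finished=%d\n",
           ac.queued, ac.running, ac.finished);
    return 0;
}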
It might be worth coordinating PP-773 and PP-479, as both affect subjobs of array jobs.
This makes sense. Maybe changing the title of the RFE would be good.
So let me see if I get this straight. While the server is running, a job array will consist of a tracking table which contains all the queued subjobs, and one job object per running subjob. When we save the job data into the database, we take the tracking table, "create" a job object for each subjob, and save that into the database? A 10000-way job array will create 10000 job entries in the database?
While I don't know the performance impact of this, I'm not sure I see the point. If the queued subjobs are going to be in a tracking table in memory, why don't we save that tracking table to the database? The tracking table only contains a small amount of data for each queued subjob, so why are we copying the vast majority of the data from the parent just to save it? We'll have a lot of duplicate data.
If the point of this RFE is to make a subjob a first-class citizen, I'd think we'd want to make the queued subjobs real jobs in memory. That way you can make any modifications to a particular subjob (e.g., change its mail points) if you wanted to.
I'm not the database expert, so maybe it isn't a big deal to have all that duplicated job data in the database. If it isn't, my worries are unfounded.
Bhroam
No, that is not the plan. Subjobs will be stored (saved) into the DB just like regular jobs, at the point in time of their creation in the server, and from then on for the usual state changes etc. So the number of saves is exactly the same as before, i.e., the number of jobs that can pass through PBS (run throughput). The other reason to save every subjob like a regular job is that, to be able to continue after a server restart, we need to store the exact job attributes, like the exec_xxx attributes.
However, the tracking table does not need to be saved. It can be held just in memory to serve only one purpose, i.e., to quickly respond to stats that ask for array counts; we would obviously not want to count the number of running, waiting, etc. subjobs of a large array every time a stat is issued. So in essence it is just a cache, which will be created at server startup (from the DB, which will have the subjobs in the regular job table), and the in-memory tracking table counts will be kept updated.
Does this sound reasonable?
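Just to illustrate the "cache" idea, here is a minimal sketch of how such an in-memory tracking table could be kept current: on every subjob state transition, decrement the counter for the old state and increment the one for the new state, so a stat asking for array counts never has to walk the subjobs. The structure and function names are illustrative, not the real PBS Pro data structures.

/*
 * Illustrative sketch only: keeping the in-memory tracking table counts
 * in step with subjob state changes so array stats can be answered from
 * the cache.  These names are not the real PBS Pro structures.
 */
#include <stddef.h>

struct array_tracking {
    int queued;
    int running;
    int finished;
};

/* Map a subjob state letter to the matching counter, NULL if untracked. */
static int *
counter_for_state(struct array_tracking *t, char state)
{
    switch (state) {
    case 'Q': return &t->queued;
    case 'R': return &t->running;
    case 'X': return &t->finished;
    default:  return NULL;
    }
}

/* Called on every subjob state transition; O(1), no walk of the array. */
static void
track_subjob_state_change(struct array_tracking *t, char from, char to)
{
    int *c;

    if ((c = counter_for_state(t, from)) != NULL)
        (*c)--;
    if ((c = counter_for_state(t, to)) != NULL)
        (*c)++;
}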
Yes, we should coordinate. Thanks for pointing that out.
Done.
Since I have now edited the EDD title to reflect the whole picture, I believe you won't have this concern anymore. Moreover, if a new usable feature can be gained by removing three lines of code, then we should definitely go for it.
I checked the code, and run_version is not part of the tracking table (neither in the DB nor in the cache).
I believe once a subjob hits the R state it will have its job object and its job DB entry. It will continue to have these even if it goes back to Q, i.e., the subjob won't get reabsorbed into the job array.
Now the title is changed as below:
"External interface design for making subjobs on par with regular jobs and subjobs surviving server restarts."
I completely agree that running subjobs need to be saved as normal jobs. I am still confused about queued subjobs. In memory they exist in a tracking table. This means that only some attributes of a queued subjob can be modified. If this is the case, why are you writing more data than that out to disk? I understand that if you write the whole job structure out to disk you can recreate the tracking table from it. You'll just go through an exercise of writing out extraneous data, and then when you read it back in, you'll throw that data away. To me, that sounds wasteful. Either create all the subjobs as real jobs in memory, or just write the tracking table out to disk. Doing it halfway seems wrong.
As a note, the new title makes it sound like I can modify an attribute on a queued subjob. Can I? Can I do a qalter -a 2200 123[34]?
I also hesitate to make changes like this now because of what we're talking about for PP-506. We're changing the way job arrays exist in PBS for that RFE. Should we be making changes like this now?
Maybe we should revert the title back to what it was and truly just do that. Just make running subjobs survive a server restart. It's a small, bite-sized piece of work.
Bhroam
I love the idea of making "subjobs on par with regular jobs", but I am concerned that it is a big effort and introduces a lot of risk this late in the release cycle. Why not restrict this fix to the original enhancement (only addressing subjobs surviving server restarts), then make the additional (riskier) behavior changes in the next release?
Some things to consider if subjobs == regular jobs:
I'm sure there are a lot of other things to consider too...
Not halfway, Bhroam. I believe we do not need to save either the queued subjobs or the tracking table in the database. We save the array itself in the DB anyway, and as subjobs are instantiated (run and finished, or requeued), they are saved like regular jobs. So what you have in the DB is something like this:
123.svr[1-10] -> Array parent itself
123.svr[5] -> Running
123.svr[6] -> (instantiated in other ways, like ran and now requeued)
123.svr[9] -> Finished
The above information is enough to recreate the tracking table in memory and populate the counts of running, queued, finished, etc. subjobs.
Technically, yes, we should be able to qalter a queued subjob. Even an uninstantiated one, which can be "instantiated" in the DB when there is a need. If we do not want to allow this, we can restrict it easily as well.
I do see the need to discuss the intended behavior, which is why we wanted to discuss it with the community. We wanted to discuss the entire larger view, and then implement only the part whose behavior we are clear on.
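To make the "instantiate when there is a need" idea above a little more concrete, here is a rough sketch of how a qalter on an uninstantiated subjob could work. This is purely illustrative: only find_job() and mk_subjob_id() come from the existing code; instantiate_subjob_from_parent() and apply_alter() are hypothetical placeholders for whatever the real implementation would provide.

/*
 * Hypothetical sketch, not a proposal for the actual patch: altering a
 * queued subjob by instantiating it on demand.  Only find_job() and
 * mk_subjob_id() exist in the current code; the other helpers are
 * placeholders.
 */
#include <stddef.h>

typedef struct job job;
struct batch_request;

extern job  *find_job(char *jobid);
extern char *mk_subjob_id(job *parent, int subj);
extern job  *instantiate_subjob_from_parent(job *parent, int subj); /* assumed */
extern int   apply_alter(job *pjob, struct batch_request *preq);    /* assumed */

static int
alter_subjob(job *parent, int subj, struct batch_request *preq)
{
    job *psubjob = find_job(mk_subjob_id(parent, subj));

    /* uninstantiated subjob: create its real job object first */
    if (psubjob == NULL)
        psubjob = instantiate_subjob_from_parent(parent, subj);
    if (psubjob == NULL)
        return -1;

    /* from here on the subjob is altered exactly like a regular job */
    return apply_alter(psubjob, preq);
}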