I recently, inadvertently, changed something about the output of the printjob command. printjob prints the job’s internal state and substate, and I changed the code to store the job’s state as a letter instead of a number, so printjob now prints a letter instead of a number for the internal state.
I didn’t create a design doc for this change because the output of the printjob command is undocumented. We don’t document or announce changes to other undocumented interfaces such as daemon logs (excluding accounting logs), so I feel that it doesn’t make sense to document a change to printjob’s output either. We also clearly state in our guides that users should use qstat for generic purposes, not printjob.
I know that there are opinions against my viewpoint on this, so I wanted to start this thread to discuss this further and get other opinions.
Thanks for starting this thread @agrawalravi90. It is true that printjob is not documented and is more of a troubleshooting tool, yet it is widely used and is almost the only way to look at job files on the mom. I think old interfaces that are this widely used should be presumed “supported”, and we should publish the intended changes as part of a design document for the community to review. If you agree, can we put up a short design doc and link to this forum discussion? We would know from the feedback whether a lot of sites are going to be affected…or not.
I also disagree that we can change interfaces just because they’re undocumented. Many of the older interfaces in PBSPro have not been fully specified, but that does not mean we should change them without good reason, and even if we change them we should warn customers.
printjob is indeed the only interface that lets customers run scripts to figure out what MoM thinks of jobs without bothering the server – it is for that specific purpose that the author of the tool wrote the code to parse JB files (in addition to the flavour that supports specifying a job ID, which does go to the server). Not bothering the server is very important on large clusters, because otherwise scripts can spam (and have spammed) the server so hard that it quasi-hangs.
If we had an alternative (and better documented!) method of finding job state/substate and attributes on MoM then of course printjob would be less important.
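To illustrate what is at stake for such scripts: a site script that has already extracted the state value from printjob output could tolerate both the numeric and the letter form by normalizing it. This is only a minimal sketch. The function name `normalize_state` is hypothetical, and the number-to-letter order in `STATE_LETTERS` is an assumption that should be checked against the PBS version in use. It deliberately does not parse printjob output itself, since that format is exactly what is unspecified.

```python
# Assumed mapping: index = numeric job state, character = state letter.
# This order is an assumption for illustration, not a documented interface.
STATE_LETTERS = "TQHWREXBMF"


def normalize_state(token: str) -> str:
    """Return the state letter for a state given as a number or a letter."""
    token = token.strip()
    if token.isdecimal():
        index = int(token)
        if index < len(STATE_LETTERS):
            return STATE_LETTERS[index]
        raise ValueError(f"unknown numeric job state: {token}")
    if len(token) == 1 and token.upper() in STATE_LETTERS:
        return token.upper()
    raise ValueError(f"unrecognized job state: {token!r}")
```

A script written this way keeps working whether the state arrives as a number or as a letter, but every site would have to know to write it that way, which is the argument for announcing the change.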
I don’t mind that at all, but how will it work? We can’t just publish what I changed without publishing what was there before, so won’t this end up requiring that we document and support printjob’s output completely?
Also, how should developers determine whether an undocumented interface needs documentation? Even daemon logs are ancient interfaces, and sites use them for troubleshooting as well, but we don’t document any changes to them, right?
Isn’t that the only difference between a documented and an undocumented interface? If we start following the ‘deprecate, document and can only remove after 2 releases’ cycle for undocumented interfaces as well then basically ALL interfaces are supported/documented?
We have more leeway in changing undocumented interfaces (which does not mean we should do it without a strong rationale), but yes, if we change them then we DO have to document the change if they are external interfaces (and like it or not, since the structure of the JB file is, with good reason, not an external interface, “printjob” is the only interface to access it). In an ideal world all external interfaces would be fully specified, but we are not living in an ideal world.
To develop a rationale and make a decision about whether to change the interface, though, you do need to document the change and to start a discussion, which is not what you were arguing here – you were arguing that no such discussion is needed.
I don’t think a design document is such a problem. You do not even have to specify the complete interface, just the old behaviour and how it would change (and why). You did just that when fixing the bug, really, except that there you specified the old buggy behaviour rather than the old correct one.
That is indeed what I’m trying to clarify. We ship a bunch of tools and APIs with PBS as “unsupported”. This includes the entire PTL library and associated tools. We cannot possibly discuss every interface change made to it, or every change made to scheduling or server logs; that would slow development down horribly. I know that I’m being stubborn about the printjob change, and it would be very easy for me to do what you are suggesting. I just don’t want to set a dangerous precedent.
Every command that is NOT shipped in the “unsupported” directory but in either bin or sbin is supported. printjob is in /opt/pbs/bin, so it is most definitely supported. It has a man page, and it is described in the documentation (including the reference documentation), although its interface is not fully described there.
There is even an example in the Admin Guide that relies on printjob if you want to find the session ID of the tasks of a job to kill them (presumably on MoM, since it’d be hard to kill telepathically from another host). It is also mentioned in the Admin Guide as the way to “look at job information on the execution host” (13.12.4 “Job Information on Execution Host”).
So while the exact output of printjob is not a documented interface, to say that printjob is unsupported is simply completely untenable.
The only way to make it fully unsupported would be to deprecate it, and it hasn’t been deprecated (and for good reason, since, as I have mentioned ad nauseam, and as Subhasis has too, we have no alternative if we want to know what MoM thinks of the job through the JB file).
The PTL is not part of the shipped product at all. So you are completely right about that, and it is also completely irrelevant to this discussion.
But you shouldn’t be asking me about all of this, of course. I’m no authority on this, even though I know some of these things by virtue of having been a customer for decades.
Ok, I see your point Alexis. TBH, it feels like we should just document printjob’s output since it sounds like it’s a command that sites rely on.
Anyway, since there’s very little interest from the community on this, the easiest thing for me to do is not to change the existing interface at all, so I’ll just do that and revert the letters back to numbers. Thanks for being patient with me.