Escape control characters in job environment variables and account name

Hi All,

I posted a design for escaping control characters in a job’s environment variable and account name.

Link:

https://pbspro.atlassian.net/wiki/spaces/PD/pages/945127456/Escape+control+characters+in+job+environment+variables+and+account+name

Please let me know of any comments or suggestions.

Thanks,

The json output already escapes carriage returns and line feeds with \r and \n, respectively. Will it change to ^M and ^J?

[vstumpf@shecil pbspro]$ qsub -v def -- /bin/sleep 60
16.shecil
[vstumpf@shecil pbspro]$ qstat -f -Fjson
{
    "timestamp":1545430380,
    "pbs_version":"19.0.0",
    "pbs_server":"shecil",
    "Jobs":{
        "16.shecil":{
            "Variable_List":{
                "def":"ab\rcd",
            }
        }
    }
}

The conversion is done by qsub and the escaped string is stored with the job.

What if some of the control characters are being used by the job in its environment? Encoding them may affect the job execution.
If the problem is how we display them using qstat, how about handling it in client side(qstat)?

@vstumpf: Good point on the Json side. Perhaps, we shouldn’t touch carriage returns and line feeds and leave them as is. The main problem being addressed by the design is really the terminal changes that are occurring when doing qstat due to special control characters in attribute values.

@nithinj: I can see your point. Perhaps, we shouldn’t be saving the encoded values as that could affect job execution. Let me look into just making the changes in qstat only, since that’s the primary case we’re addressing. Plus, when I tried submitting the job:

export LESS_TERMCAP_md=(tput bold; tput setaf 1) qsub -V -A h(tput bold; tput setaf 1)d – /bin/sleep 100000

and had a queuejob hook that does:
import pbs
pbs.logmsg(pbs.LOG_DEBUG, “Account_Name=%s” % (pbs.event().job.Account_Name))
pbs.logmsg(pbs.LOG_DEBUG, “Variable_List=%s” % (pbs.event().job.Variable_List))

On the server_logs, it already displayed the escaped characters as is:

1/04/201914:24:40;0006;Server@corretja;Hook;Server@corretja;Account_Name=h^[[1m^[[31md
01/04/2019 14:24:40;0006;Server@corretja;Hook;Server@corretja;Variable_List=…LESS_TERMCAP_md=^[[1m^[[31m,…

It’s only in qstat output that terminal changed occurred to have green text and bold.

1 Like

Json also escapes backspace, form feed and a horizontal tab. (https://github.com/PBSPro/pbspro/blob/master/src/lib/Libcmds/pbs_json.c). It may also be needed to be treated in the same way. Also kindly update the page to reflect this.

dsv output on Windows uses caret as the default escape character. As all of the encoded control characters starts with a caret, it may be ambiguous to users.

This Linux/Unix only solution is to filter the stdout of the following commands, printing out the actual escape sequence characters like the ones done by ‘cat -t -v -e’ and not cause terminal translations: qstat, pbsnodes, qmgr, pbs_rstat, printjob,
I’ll update the EDD soon.

So the way I’m seeing the design now as when printing to stdout, translate (not store/encode) to ^ notation for characters < 32 ascii and not one of \n, \t, \f, \b, \r in Linux/Unix qstat, pbsnodes, qmgr, pbs_rstat, and printjob.

1 Like

I posted the updated design Version v.6

I see a problem. As described, the output values are not invertable. That is, from the qstat output, I cannot be sure what the actual value is. E.g., If I see “a\Ab”, was the actual value “a\01b” (using C notation), or was it “a\\Ab”? Furthermore, allowing the printable control characters through unmodified will mess up the output (e.g. inserting new lines).

Given that we have lost invertability, how about simply replacing all control characters in fields being output with “_”? This also has the advantage of keeping the string length the same.

If people want the original values (e.g., for processing by other programs), they can use the json format output.

Personally, I favor also rejecting or squashing all non-printable characters on input (except in environment variables). They are almost always a mistake. And, when they aren’t a mistake, they are probably evil.

The fix/design was concentrated on those special non-printing characters that
can alter the terminal display like changing colors or highlighting. Thus the
focus was to show the non-printable characters (ASCI 1-32) in an alternate
form except the special 5 control characters mentioned in the design

In terms of being invertable, I didn’t feel it’s that interesting
have to revert back to showing the output with the
translated characters invisible again. The choice of a caret notation,
(i.e. ^) was done to align with what ‘cat -v’ does, instead
of inventing something new.

The control characters (some printable) were left as is because of some constraints.
The (\n) and (\t) were left to show as is, as translating them
would affect many existing tests already in support of PBS. Besides,
the -Fjson option in qstat/pbs_rstat/pbsnodes is already doing the translation,
showing them as \n and \t respectively.

The (\f), (\r), and (\b) continue
to be invisible, since -Fjson qstat/pbs_rstat/pbsnodes option also displays
them translated to \f, \r, and \b respectively.

I feel it’s more obvious if something is shown as:
Account_Name = h^[[1m^[[31md
instead of
Account_Name = h__d

as that even becomes harder to distinguish if it’s the special _ or the real _

We’re also preventing the json format (-Fjson) to display the non-printing
special characters as that also affect terminal output. If you need the raw output, one can create a C code
that calls pbs_statjob(), pbs_statnode()/pbs_statvnode(), pbs_statresv() and the output from there is preserved.

I don’t understand part of the proposal. Why aren’t you translating newline, carriage return, tab, etc when they are part of a field value? These corrupt the output as much as the color changing sequences do. Do you have a use case where displaying a newline in a field value makes sense?

Also, should RUB ‘\177’ also be translated as is done by ls and cat -v? (displayed as \?)

Now I’m beginning to rethink the strategy. cat -v will actually show translated form feed, carriage return, and backspace in the ^<translated character) notation. I can make it the same and still allow -Fjson option to do its own translation to \f,\r, and \b respectively. But cat -v still prints out the newline and tab as is, so I’m keeping it that way with qstat/pbs_rstat/pbsnodes/qmgr/printjob/tracejob. And similarly, -Fjson would show newline and tab as \n and \t.
Sure, having a newline and tab in output is also not good, but it’s not the problem that I’ve been given to address. We could file a different ticket to handle newlines and tabs in output differently, and making sure dependent PBS tests are not affected. I know newlines and tab could appear inside exported shell functions and passed to PBS in qsub -V.

No, as the design only focuses on the first 32 non-printing characters and not the extended ASCII characters.

Updated design doc (v.7) to allow feed (FF), carriage return (CR), and backspace (BS) to also be shown translated.