pbs_snapshot --obfuscate is supposed to anonymize sensitive information, but today it doesn’t anonymize everything, so I have been working on fixing that. However, I’m not sure how to anonymize certain files that pbs_snapshot currently captures:
.JB files under mom_priv/jobs/: These are binary files, but running grep on them for sensitive job attribute values reports that the files do contain those values. Should we bother with them? Should we leave them as they are, or drop them from the snapshot altogether when --obfuscate is provided?
system files:
output of ps aux
output of ps -leaf
etc.
These can also contain hostnames, job names, usernames, etc. Do we go to the effort of anonymizing them, or is it okay to simply not capture them when --obfuscate is provided?
@scc, it would be great if you could share some insights.
Hi @agrawalravi90, good questions. Here are my thoughts:
For .JB files, when --obfuscate is used, maybe we could run printjob on the files, obfuscate its output, and not collect the .JB files themselves? I feel this would still be helpful for troubleshooting. These files are typically examined to answer questions like “what substate was the job in on the host vs. what the server thought?” or “was the session ID actually being tracked by the MoM?”, so I think it is worth doing.
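A minimal Python sketch of the printjob idea. It assumes that printjob accepts a .JB file path and prints job attributes as `name = value` lines; both are assumptions to verify against the real output. The SENSITIVE_ATTRS selection, the OBFn token format, and the function names are hypothetical. Fields such as substate are left untouched, which is what keeps the output useful for the troubleshooting questions above.

```python
import re
import subprocess

# Hypothetical selection of job attributes treated as sensitive.
SENSITIVE_ATTRS = {"Job_Owner", "Job_Name", "euser", "egroup", "exec_host",
                   "Output_Path", "Error_Path", "Variable_List"}

# Assumed line shape: optional indent, attribute name, "=", value.
ATTR_LINE = re.compile(r"^([ \t]*)([A-Za-z_][\w.]*)([ \t]*=[ \t]*)(.*)$",
                       re.MULTILINE)


def obfuscate_attributes(text, token_map=None, sensitive=SENSITIVE_ATTRS):
    """Replace the values of sensitive `name = value` lines with tokens."""
    token_map = {} if token_map is None else token_map

    def repl(match):
        indent, name, sep, value = match.groups()
        if name not in sensitive:
            return match.group(0)  # e.g. substate stays readable
        # The same value always gets the same token (shared via token_map).
        if value not in token_map:
            token_map[value] = f"OBF{len(token_map) + 1}"
        return f"{indent}{name}{sep}{token_map[value]}"

    return ATTR_LINE.sub(repl, text)


def printjob_obfuscated(jb_path):
    # Assumption: the PBS printjob command is on PATH and takes a .JB path.
    out = subprocess.run(["printjob", jb_path], capture_output=True,
                         text=True, check=True).stdout
    return obfuscate_attributes(out)
```

Passing one token_map to every call keeps tokens consistent across all job files in the snapshot.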
For system outputs like ps, it is probably safest to simply not collect them when obfuscating, since we don’t know everything that COULD be sensitive (process names, arguments to the processes, etc.). But I think we only collect ps -leaf output, correct? If so, we’d only really have to obfuscate the UID and CMD fields. For CMD, what if we obfuscated the whole field as a single replacement token, so that “XXYY” could map to the entirety of something like “/usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only”? Would that still be a lot of effort?
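A rough Python sketch of the single-token approach for ps -leaf. It assumes the procps-style header (F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD) with CMD as the last column, so each row is split into at most as many fields as the header has and the command line keeps its spaces. The USERn/CMDn token names and function names are hypothetical. Identical users or commands map to the same token, so you can still tell that two processes ran the same command.

```python
def _token(value, mapping, prefix):
    # Reuse the token for a value already seen, so identical users or
    # commands stay recognizable as identical after obfuscation.
    if value not in mapping:
        mapping[value] = f"{prefix}{len(mapping) + 1}"
    return mapping[value]


def obfuscate_ps_leaf(output, user_map=None, cmd_map=None):
    """Replace the UID and CMD columns of `ps -leaf` output with tokens.

    Assumes the first line is the column header and CMD is the last column.
    """
    user_map = {} if user_map is None else user_map
    cmd_map = {} if cmd_map is None else cmd_map
    lines = output.splitlines()
    if not lines:
        return output
    header = lines[0].split()
    ncols = len(header)
    uid_col, cmd_col = header.index("UID"), header.index("CMD")
    if cmd_col != ncols - 1:
        raise ValueError("expected CMD to be the last column")
    result = [lines[0]]
    for line in lines[1:]:
        # maxsplit keeps the whole command line, spaces included, in one field.
        fields = line.split(None, ncols - 1)
        if len(fields) != ncols:
            continue  # unparseable row: drop it rather than risk leaking it
        fields[uid_col] = _token(fields[uid_col], user_map, "USER")
        fields[cmd_col] = _token(fields[cmd_col], cmd_map, "CMD")
        result.append(" ".join(fields))
    return "\n".join(result)
```

Rows are re-joined with single spaces, so the original column alignment is lost; that seemed an acceptable trade for keeping the parsing simple.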
The --obfuscate code was completely redesigned: we still capture all system files with --obfuscate, but we now use regexes to remove sensitive information from them. Please refer to https://pbspro.atlassian.net/wiki/spaces/PD/pages/1233125400/Redesigning+pbs+snapshot+--obfuscate for details of the algorithm. Please let me know whether you think the new code suffices, or whether you’d still like to restrict the capture of the files you listed above.
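To illustrate the general regex-based idea (this is not the algorithm described on the wiki page, only a minimal sketch), a single compiled alternation of known sensitive strings can scrub any captured text file. It assumes the set of sensitive values (usernames, hostnames, job names) has already been collected; the function name and the REDACTEDn token format are hypothetical.

```python
import re


def build_scrubber(sensitive_values, token_map=None):
    """Return a function that replaces every known sensitive string with a token.

    Hypothetical helper: `sensitive_values` is assumed to be collected elsewhere.
    """
    token_map = {} if token_map is None else token_map
    # Drop empty strings (they would match everywhere) and sort longest first,
    # so a full hostname wins over a shorter value that is its prefix.
    ordered = sorted({v for v in sensitive_values if v}, key=len, reverse=True)
    if not ordered:
        return lambda text: text
    pattern = re.compile("|".join(re.escape(v) for v in ordered))

    def replace(match):
        value = match.group(0)
        # Same original value -> same token in every file scrubbed with this
        # scrubber, because the mapping lives in token_map.
        if value not in token_map:
            token_map[value] = f"REDACTED{len(token_map) + 1}"
        return token_map[value]

    def scrub(text):
        return pattern.sub(replace, text)

    return scrub
```

Sorting longest first means node01.example.com is replaced as a whole instead of leaving .example.com behind after node01 matches. There are no word boundaries, so a value such as alice is also replaced inside malice; for anonymization, over-redacting is usually the safer error.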
That’s right, I had forgotten; thanks. I posted today because I found a partially composed reply from April, and you have now reminded me why I never posted it. Sorry about the noise.