pbs_snapshot --obfuscate: how to deal with certain files?

Hi,

pbs_snapshot --obfuscate is supposed to anonymize sensitive information, but today it doesn’t anonymize everything, so I am working on fixing that. However, I’m not sure how to anonymize certain files that pbs_snapshot captures today:

mom_priv/jobs/.JB files: These are binary files, but running grep on them for sensitive job attributes reports that they contain those attributes. Should we bother obfuscating them? Should we leave them as they are, or delete them altogether when --obfuscate is provided?

system files:

  • output of ps aux
  • output of ps -leaf
    etc.
    These can also contain hostnames, job names, usernames, etc. Do we go through the effort of anonymizing them? Is it okay if we just don’t capture these when --obfuscate is provided?

@scc, it would be great if you could provide some insights.

Thanks,
Ravi

Hi @agrawalravi90, good questions. Here are my thoughts:

For JB files, maybe when --obfuscate is in use we can run printjob on the files and obfuscate the output, rather than collecting the .JB files themselves? I feel this would still be helpful for troubleshooting. Typically these files are looked at to answer a question like “what substate was the job in on the host vs. what the server thought?” or “was the session ID actually being tracked by the mom?”, so I think it is worth doing.
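
A minimal sketch of that idea in Python, assuming printjob prints attributes as `name = value` lines. The attribute list, the `OBF0001`-style token format, and the function names are hypothetical and not taken from pbs_snapshot itself:

```python
import re
import subprocess

# Hypothetical set of attributes to treat as sensitive; adjust as needed.
SENSITIVE_ATTRS = ("Job_Name", "Job_Owner", "euser", "egroup")

# Matches "<attr> = <value>" lines for the attributes listed above.
_ATTR_LINE = re.compile(
    r"^([ \t]*(?:%s)[ \t]*=[ \t]*)(.+)$"
    % "|".join(re.escape(a) for a in SENSITIVE_ATTRS),
    re.MULTILINE,
)


def obfuscate_printjob_text(text, mapping):
    """Replace sensitive attribute values with stable tokens.

    `mapping` remembers value -> token, so the same value always gets
    the same token across all files in one snapshot.
    """
    def repl(match):
        value = match.group(2).strip()
        token = mapping.setdefault(value, "OBF%04d" % (len(mapping) + 1))
        return match.group(1) + token

    return _ATTR_LINE.sub(repl, text)


def printjob_obfuscated(jb_path, mapping):
    """Run printjob on a .JB file and return only the obfuscated text."""
    result = subprocess.run(
        ["printjob", jb_path], capture_output=True, text=True, check=True
    )
    return obfuscate_printjob_text(result.stdout, mapping)
```

Non-sensitive lines such as the substate pass through unchanged, so the output stays usable for the troubleshooting questions above while the binary .JB file itself is never copied into the snapshot.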

For system outputs like ps it is probably safest to simply not collect them when obfuscating, since we don’t know everything that COULD be sensitive (process names, args to the processes, etc.). But I think we only collect ps -leaf output, correct? So I think we’d only really have to obfuscate the UID and CMD fields, and for CMD what if we just obfuscate the whole thing as one replacement token, so “XXYY” could actually map to the entirety of something like “/usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only”, for example? Still a lot of effort?
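
To gauge the effort, here is a rough Python sketch of that approach. It assumes the first line of the ps -leaf output is a header containing UID and CMD columns, with CMD last; the function names and token format are made up for illustration:

```python
def _token(value, mapping):
    """Return a stable replacement token for `value`."""
    return mapping.setdefault(value, "OBF%04d" % (len(mapping) + 1))


def obfuscate_ps_output(text, mapping):
    """Obfuscate the UID field and the entire CMD field of ps output."""
    lines = text.splitlines()
    if not lines:
        return text
    header = lines[0].split()
    uid_idx = header.index("UID")
    cmd_idx = header.index("CMD")  # assumed to be the last column
    out = [lines[0]]
    for line in lines[1:]:
        # Split only up to CMD so the command and its args stay together.
        fields = line.split(None, cmd_idx)
        if len(fields) <= cmd_idx:
            # Unexpected layout: hide the whole line rather than risk a leak.
            out.append(_token(line, mapping) if line.strip() else line)
            continue
        fields[uid_idx] = _token(fields[uid_idx], mapping)
        fields[cmd_idx] = _token(fields[cmd_idx], mapping)
        out.append(" ".join(fields))
    return "\n".join(out)
```

Because the whole CMD string maps to a single token, identical command lines remain recognizable as identical across processes without revealing process names or arguments. Column alignment is not preserved in this sketch.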

Thanks for the inputs, @scc. For system outputs, we capture the following:

ps aux
ps -leaf
cat /etc/hosts
cat /etc/nsswitch.conf
lsof
vmstat
df -h
dmesg

Some of them could be pretty big, so is it really worth the effort to obfuscate all of them?

Hi @agrawalravi90, very sorry about the delayed response.

Let’s just not collect anything in the system directory when --obfuscate is in use except for the following:

os_info
vmstat.out
process_info
pbs_hostn_v.out
pbs_probe_v.out

Hey @scc, we actually already fixed this as part of https://github.com/PBSPro/pbspro/pull/1096

The --obfuscate code was completely redesigned: we still capture all system files with --obfuscate, and simply use regexes to remove sensitive information. Please refer to https://pbspro.atlassian.net/wiki/spaces/PD/pages/1233125400/Redesigning+pbs+snapshot+--obfuscate for details of the algorithm. Please let me know if you think the new code suffices, or if you’d still like to restrict capture to only the files that you listed above.
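
For readers who want a feel for regex-based scrubbing in general, here is a small illustrative Python sketch. It is not the algorithm from the design page linked above; it assumes the sensitive values (usernames, hostnames, and so on) are already known, and `build_obfuscator` and the token format are hypothetical names:

```python
import re


def build_obfuscator(sensitive_values):
    """Return (obfuscate, mapping) for a set of known sensitive strings."""
    values = sorted({v for v in sensitive_values if v})
    mapping = {v: "OBF%04d" % i for i, v in enumerate(values, 1)}
    if not mapping:
        return (lambda text: text), mapping
    # Longest first, so "node10" is tried before "node1".
    alternation = "|".join(
        re.escape(v) for v in sorted(values, key=len, reverse=True)
    )
    # Lookarounds avoid replacing a value embedded inside a longer word.
    pattern = re.compile(r"(?<!\w)(?:%s)(?!\w)" % alternation)

    def obfuscate(text):
        return pattern.sub(lambda m: mapping[m.group(0)], text)

    return obfuscate, mapping


# Example: the same function can be applied to any captured text file.
obfuscate, mapping = build_obfuscator({"alice", "node1", "node10"})
print(obfuscate("alice 4242 python run.py on node10"))
```

One compiled pattern can then be run over every captured file, including large ones such as the lsof or dmesg output, which keeps the replacement tokens consistent across the whole snapshot.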

That’s right, I had forgotten, thanks. My response today was initiated when I found my partially composed response from April, and now you have reminded me why I did not actually post it previously. Sorry about the noise.
