This is a list of the core PBS Pro information that pbs_diag collects in its default information-gathering mode but that is missing from this EDD, which proposes to replace it. There IS some redundancy in what pbs_diag captures, but having the same information available in different formats has proven useful when investigating problems:
pbs_probe -v
qmgr -c "print server"
qmgr -c "print node @default"
qmgr -c "p h @default"
qmgr -c "list pbshook"
pbs_hostn -v $(hostname)
pbs_rstat
qstat -t
qstat -tf
qstat -x
qstat -xf
qstat -ns
pbsnodes -a
pbsnodes -avSj
pbsnodes -aSj
pbsnodes -avS
pbsnodes -aS
pbsnodes -aFdsv
pbsnodes -avFdsv
capture $PBS_HOME/pbs_environment
capture the local pbs_comm logs for the same days for which server and sched logs are collected
dump all of the mom’s vnode defs into a file
capture the entire $PBS_HOME/datastore/pg_log/ directory
capture these files from the local mom_priv directory: config, prologue, epilogue, mom.lock
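Collecting each daemon's logs "for the same days" comes down to selecting the same date-named files in every log directory. A minimal Python sketch, assuming PBS Pro's usual one-file-per-day naming (YYYYMMDD) under $PBS_HOME/<daemon>_logs; `LOG_DIRS`, `log_days` and `select_logs` are hypothetical names for illustration, not part of pbs_diag or pbs_snapshot:

```python
from __future__ import annotations

import os
from datetime import date, timedelta

# Hypothetical set of PBS_HOME log subdirectories to collect from.
LOG_DIRS = ("server_logs", "sched_logs", "comm_logs", "mom_logs")


def log_days(end: date, num_days: int) -> list[str]:
    """Return YYYYMMDD names for num_days days ending on (and including) end."""
    return [(end - timedelta(days=i)).strftime("%Y%m%d") for i in range(num_days)]


def select_logs(pbs_home: str, end: date, num_days: int) -> dict[str, list[str]]:
    """Map each log directory to its existing daily log files in the window.

    Assumes one log file per day named YYYYMMDD, so every daemon's logs
    are selected for exactly the same set of days.
    """
    days = log_days(end, num_days)
    selected = {}
    for sub in LOG_DIRS:
        log_dir = os.path.join(pbs_home, sub)
        selected[sub] = [
            os.path.join(log_dir, day)
            for day in days
            if os.path.isfile(os.path.join(log_dir, day))
        ]
    return selected
```

Computing the day list once and reusing it for every directory is what keeps the server, sched, comm and mom logs aligned; directories absent on a given host simply yield an empty list.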
pbs_diag logs the stderr of all of the commands it runs in case the person examining the diag needs to know what (if anything) was written there.
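Keeping stderr alongside stdout for every command can be done per command with separate output files. A minimal Python sketch, assuming one `<name>.out`/`<name>.err` pair per command; `run_and_capture` and the file naming are hypothetical, not how pbs_diag actually stores its output:

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def run_and_capture(cmd: list[str], out_dir: Path, name: str) -> int:
    """Run cmd without a shell; save stdout to <name>.out, stderr to <name>.err.

    stderr is always written, even when empty, so whoever examines the
    snapshot can confirm that nothing was printed there.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError as exc:
        # Command not installed on this host: record why, mimic the shell's 127.
        (out_dir / f"{name}.out").write_text("")
        (out_dir / f"{name}.err").write_text(f"{exc}\n")
        return 127
    (out_dir / f"{name}.out").write_text(result.stdout)
    (out_dir / f"{name}.err").write_text(result.stderr)
    return result.returncode
```

For example, `run_and_capture(["qmgr", "-c", "print server"], Path("snapshot/server"), "qmgr_print_server")`. Passing an argument list avoids shell quoting problems; pipelines such as `ps -ef | grep pbs` would need either a shell or filtering in Python.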
Current pbs_diag does NOT capture the following core product information, but these items have been on my list to add, and I’d like to see them in pbs_snapshot (the qstat options are only now being merged into the product):
capture the local pbs_mom logs for the same days for which server, sched, and comm logs are collected
qmgr -c "p r"
qstat -fx -F dsv
qstat -f -F dsv
These are Linux-specific operations that pbs_diag currently performs in configuration-gathering mode (Windows alternatives should be added if available; pbs_diag is Linux only):
cat /etc/release
ps -ef | grep pbs | grep -v grep
uname -a
tar.gz the output
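Packaging the collected output as a tar.gz needs only the standard library. A minimal Python sketch; `make_targz` is a hypothetical helper name, and the archive is assumed to be written outside the directory being packed so it does not try to include itself:

```python
import tarfile
from pathlib import Path


def make_targz(src_dir: Path, archive: Path) -> Path:
    """Pack src_dir into a gzip-compressed tarball rooted at src_dir's name."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths relative (e.g. snapshot/...), not absolute.
        tar.add(src_dir, arcname=src_dir.name)
    return archive
```

Rooting the archive at the directory's own name means it unpacks into a single top-level directory instead of scattering files into the current directory.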
These would be new Linux-specific operations that would be useful to have (Windows alternatives should be added if available; pbs_diag is Linux only):
lsof | grep pbs
ps aux | grep pbs | grep -v grep
capture /etc/hosts
capture /etc/nsswitch.conf
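Capturing individual files such as /etc/hosts, /etc/nsswitch.conf, or the mom_priv config, prologue and epilogue is simple, but a diagnostic tool should not fail when one of them is absent. A minimal Python sketch; `copy_files` and `SYSTEM_FILES` are hypothetical names, and it assumes the copied files have distinct basenames (two files with the same name would overwrite each other in the destination):

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Example list taken from the operations above; extend as needed.
SYSTEM_FILES = ("/etc/hosts", "/etc/nsswitch.conf")


def copy_files(paths, dest_dir: Path) -> list[str]:
    """Copy each existing file into dest_dir (keeping metadata); return misses.

    Missing files are reported rather than treated as errors, since not
    every host has every file (e.g. a mom_priv prologue may not exist).
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for p in map(Path, paths):
        if p.is_file():
            shutil.copy2(p, dest_dir / p.name)
        else:
            missing.append(str(p))
    return missing
```

Returning the list of missing files lets the caller record them in the snapshot, which tells the reader the file was looked for rather than forgotten.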
I am ignoring the extra pbs_diag feature that lets one specify particular jobs to gather information on. All it really does is provide tracejob output for the job, qstat -f output for that job alone, and pbs_dtj output for the job, and make sure that the log files covering the job's lifetime are copied into the diag. I think it is safe not to implement this in a replacement utility: tracejob adds nothing beyond the logs, qstat -f output is provided elsewhere, pbs_dtj is not a supported tool and should not be used from pbs_snapshot, and the log copies can be covered with the -L option.
@sgombosi, your previous note says:
"server_priv output doesn’t appear to capture everything (no subdirectories, e.g. hooks, tmp, topology, etc.). "
but the EDD says:
server_priv sub-directory: a copy of the ‘server_priv’ directory inside PBS_HOME
So we can expect the pbs_snapshot that gets added to the product to do a complete capture of server_priv.
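A complete capture of server_priv, including subdirectories such as hooks, tmp and topology, amounts to a recursive copy. A minimal Python sketch, assuming Python 3.8+ for `dirs_exist_ok`; `copy_tree` is a hypothetical helper, not pbs_snapshot's implementation:

```python
import shutil
from pathlib import Path


def copy_tree(src: Path, dest: Path) -> list:
    """Recursively copy src, including all subdirectories, into dest.

    Returns the per-file errors instead of aborting, so one unreadable
    file does not prevent the rest of the directory from being captured.
    """
    try:
        # symlinks=True copies symbolic links as links instead of following them.
        shutil.copytree(src, dest, symlinks=True, dirs_exist_ok=True)
    except shutil.Error as exc:
        # copytree continues past per-file failures and raises them together;
        # args[0] is a list of (src, dst, reason) tuples.
        return exc.args[0]
    return []
```

Because `shutil.copytree` keeps going after individual failures and reports them all at the end, the snapshot still contains everything that could be read.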