At our company we’ve built a log-analysis platform on top of Elasticsearch: the server_logs and accounting_logs are ingested into Elasticsearch, and we have dashboards and tools that query the data there.
There’s a limitation in how much data the logs contain. For example, when a job is submitted and stays queued, you don’t get access to its Output_Path and Error_Path; to get that data you would need to call qstat every time a new job is submitted, which I’d like to avoid because it would put stress on the scheduler.
Instead, I would like to query the PostgreSQL database once per job submission and extract all of the job’s fields from it.
What are your thoughts? Is that a scalable solution? Would that also put pressure on the scheduler?
After the data is extracted from PostgreSQL and ingested into Elasticsearch, our dashboards and tools would use only the Elasticsearch index and the PostgreSQL DB would be left alone — it’s only one query per job submission.
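To make the idea concrete, here is a minimal sketch of what the per-submission step could look like: read the job’s attributes from the datastore and flatten them into a single document for Elasticsearch. Note the table and column names here (`pbs.job_attr`, `ji_jobid`, `attr_name`, `attr_value`) are assumptions — the real datastore schema varies by scheduler version, so check yours before relying on this. The actual DB and Elasticsearch connections are stubbed out in the `__main__` block.

```python
# Hedged sketch: turn one job's attribute rows from the scheduler's
# PostgreSQL datastore into a single Elasticsearch document.
# Schema names below are hypothetical -- verify against your datastore.

# Hypothetical query; adjust table/column names to your schema.
JOB_ATTRS_SQL = """
    SELECT attr_name, attr_value
    FROM pbs.job_attr
    WHERE ji_jobid = %s
"""

def job_attrs_to_doc(job_id, rows):
    """Flatten (attr_name, attr_value) rows into one ES document."""
    doc = {"job_id": job_id}
    for name, value in rows:
        doc[name] = value
    return doc

if __name__ == "__main__":
    # In practice you'd use psycopg2 and the official elasticsearch
    # client; connection details below are placeholders.
    import psycopg2
    from elasticsearch import Elasticsearch

    job_id = "123.pbsserver"  # would come from your submission hook/event
    with psycopg2.connect(dbname="pbs_datastore") as conn:
        with conn.cursor() as cur:
            cur.execute(JOB_ATTRS_SQL, (job_id,))
            doc = job_attrs_to_doc(job_id, cur.fetchall())
    Elasticsearch("http://localhost:9200").index(index="jobs", id=job_id, document=doc)
```

One query per submission against a read-only replica (or with a low-priority role) should be far gentler than polling qstat, but it’s worth confirming the datastore isn’t on the same I/O path the scheduler depends on.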
I would like to hear your thoughts on this. I don’t want to bog down the scheduler, but I do want to improve our data-analytics platform.