Elasticsearch 'count' query becomes a Nagios-alertable value

One of our lines of inquiry for the ELK stack was monitoring and alerting. The best approach would be to have Logstash send directly to Nagios; however, this proved more difficult than anticipated, so the next best route was to use a simple Elasticsearch query via curl to get an alertable value.

I found one other person doing this, but didn't have the heart to ask for their script! Decided to spend some time figuring it out for myself and share the results. If we truly can't get the Logstash output working, we'll come back to this with a full Nagios service script. This gives you everything but the Nagios service:
curl -XGET 'http://kibana.domain.com:9200/_count' -d '{"query": {"filtered":{"query":{"match":{"EventID": "4624"}},"filter":{"range": {"@timestamp": {"gt": "now-1m"}} }} }}' \
  2>/dev/null | egrep -o '"count":[0-9]+' | cut -d":" -f2
70
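If you want to watch several EventIDs, it may help to generate the request body rather than hand-edit the JSON each time. Below is a minimal sketch: `build_count_query` is a hypothetical helper name, and it emits the same "filtered" query as the curl command, with the EventID and look-back window as arguments. Note that "filtered" is older Elasticsearch syntax (deprecated in 2.0, removed in 5.0); on newer versions you would need a `bool` query with a `filter` clause instead, and 6.0+ also expects a `Content-Type: application/json` header on the curl call.

```shell
#!/bin/sh
# Hypothetical helper: print the _count request body for a given EventID
# and look-back window, using the same "filtered" query as the post.
build_count_query() {
  event_id=$1   # Windows EventID to count, e.g. 4624
  window=$2     # Elasticsearch date-math span, e.g. 1m, 5m, 1h
  printf '{"query":{"filtered":{"query":{"match":{"EventID":"%s"}},"filter":{"range":{"@timestamp":{"gt":"now-%s"}}}}}}\n' \
    "$event_id" "$window"
}

build_count_query 4624 1m

# Against a real cluster (ES_URL is a placeholder for your own host):
#   curl -s -XGET "$ES_URL/_count" -d "$(build_count_query 4624 1m)"
```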

This takes the result of a 'count' search for Windows EventID 4624 (a successful logon, logged as an Audit Success event) over the last minute. The idea is to search for specific error EventIDs, whose count should always be 0, and raise an alarm if anything untoward turns up.
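The missing piece is the Nagios side. As a rough sketch of that last step, the hypothetical function below turns a count into a Nagios plugin result using the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name, output text and thresholds are illustrative assumptions; for an EventID that should always be 0, a warning threshold of 1 alerts on the first occurrence.

```shell
#!/bin/sh
# Hypothetical helper: map an event count onto Nagios plugin exit codes.
# Nagios reads the exit code: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
# Usage: check_count COUNT WARN CRIT   (alerts when COUNT >= threshold)
check_count() {
  count=$1 warn=$2 crit=$3
  # An empty or non-numeric count means the query/parse step failed.
  case $count in
    ''|*[!0-9]*) echo "UNKNOWN - could not read count"; return 3 ;;
  esac
  if [ "$count" -ge "$crit" ]; then
    echo "CRITICAL - $count matching events|count=$count"; return 2
  elif [ "$count" -ge "$warn" ]; then
    echo "WARNING - $count matching events|count=$count"; return 1
  fi
  echo "OK - $count matching events|count=$count"
  return 0
}

check_count 0 1 5   # prints "OK - 0 matching events|count=0"
```

In a real service check, COUNT would come from the curl pipeline, e.g. `check_count "$(curl ... | egrep ... | cut ...)" 1 5`, and the script would `exit` with the function's return code.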

The actual output from the curl is this:
{"count":70,"_shards":{"total":45,"successful":45,"failed":0}}
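Since the egrep/cut step only needs that JSON body, you can try it without a cluster by echoing the sample response into it. A minimal sketch, with `extract_count` as a hypothetical function name; `grep -Eo` does the same job as `egrep -o` (newer GNU grep releases warn that `egrep` is obsolescent).

```shell
#!/bin/sh
# Hypothetical helper: read an Elasticsearch _count response on stdin
# and print just the number that follows "count":
extract_count() {
  grep -Eo '"count":[0-9]+' | cut -d':' -f2
}

# Sample response from the post; prints 70.
echo '{"count":70,"_shards":{"total":45,"successful":45,"failed":0}}' | extract_count
```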

Hope this helps someone!  Took me a while to get the query syntax right.  Co-worker supplied the egrep regex.
