Posts

Showing posts from October, 2014

TFS & Jenkins & Chef, oh my: Part 1 - Intro to the POC

I'll use (this|these) blog entr(y|ies) as a brain dump for a new proof-of-concept project I'm working on. Long story short, we investigated build/deploy automation and, after extensive due diligence, concluded that configuration management was the way to go.

We have some hurdles to overcome:

- Every build/deploy 'process' today is manual, frankly just due to a bad original implementation that was never corrected
- We have a LOT of VS2010 setup projects, which are no longer a thing in VS2013 (since 2012, actually, and the MS plugin for '13 that supports them is apparently buggy)
- We don't use software versioning (yes, it's boggling - we use the build version)
- We have both Windows services and web apps/services
- We use BizTalk
- 'Proofs of concept' tend to be adopted as production tools, so I have to build it right the first time

The initial gist (hm...should this be on github?) of this is straightforward. Jenkins will build from TFS2010, but instead of bad MSIs, we get raw file dumps. These will g…

Logstash to Nagios - IIS logging

More fun with Logstash, Nagios, and IIS logging. The website logs from all of our environments are shipped to Logstash via NXlog, where events get tagged and routed to outputs.

The config below gives us a passive monitor that goes CRIT on 4xx/5xx status codes and OK on 2xx/3xx codes. Not sure how this'll really work out, but during a serious outage the monitor should get flooded with enough 4xx/5xx responses to trigger a notification. We'll have to test that. This is working like a charm at the moment, though!

20-filters_tagging.conf
<snipped>
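if [SourceName] == "IIS" {
        # Only handle servers on the 192.168.x.x range; dots are escaped so they match literally
        if [s-ip] =~ /^192\.168\.(\d{1,3})\.(\d{1,3})/ {
                grok {
                        # Succeeds for any 2xx-5xx status; on success the tags and field below are added
                        match => ["sc-status", "[2-5]\d\d"]
                        add_tag => ["nagios_check_iislog","UAT","IIS"]
                        add_field => ["nagios_service", "UATPSV-IIS_Traffic"]
                        # Suppress the default _grokparsefailure tag on non-matching events
                        tag_on_failure => []
                }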
  …
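For anyone who wants the CRIT/OK mapping spelled out, here is a minimal Python sketch of the same idea, separate from the Logstash config. The function name `nagios_state_for_status` is made up for illustration, and returning UNKNOWN for anything outside 2xx-5xx is an assumption, not something taken from the config above.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_state_for_status(sc_status: str) -> int:
    """Map an IIS sc-status value (e.g. "404") to a Nagios state code."""
    status = sc_status.strip()
    # Anything that is not exactly three ASCII digits is not an HTTP status.
    if len(status) != 3 or not (status.isascii() and status.isdigit()):
        return UNKNOWN
    if status[0] in "23":
        return OK        # 2xx/3xx: traffic looks healthy
    if status[0] in "45":
        return CRITICAL  # 4xx/5xx: client or server errors
    return UNKNOWN       # assumption: 1xx and other codes are not classified


if __name__ == "__main__":
    for code in ("200", "302", "404", "500"):
        print(code, nagios_state_for_status(code))
```

Because the check is passive, the state simply follows the most recent event, which is why a flood of errors during an outage is what would actually hold the service in CRIT.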

Logstash to Nagios - alerting based on Windows Event ID

This took way longer than it should have to get going...so here's a config and brain dump...

Why?
You want to have a central place to analyze Windows Event/IIS/local application logs, alert off specific events, alert off specific situations.  You don't have the budget for a boxed solution.  You want pretty graphs.  You don't particularly care about individual server states.  (see rationale below - although you certainly have all the tools here to care, I haven't provided that configuration)

How?
ELK stack, OMD, NXlog agent, and Rsyslog.  The premise here is as follows:

- Event generated on server into EventLog
- NXlog ships to Logstash input
- Logstash filter adds fields and tags to specified events
- Logstash output sends to a passive Nagios service via the Nagios NSCA output
- The passive service on Nagios (Check_MK c/o OMD) does its thing w. alerting
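To make the last hop of that chain a bit more concrete: a passive service check submitted through the `send_nsca` client is a single tab-delimited line (host, service description, return code, plugin output). The Python sketch below only builds that line. The function name, host name, service name, and message are hypothetical, and this is an illustration of the format rather than what the Logstash output does internally.

```python
def format_nsca_service_check(host: str, service: str,
                              return_code: int, output: str) -> str:
    """Build one passive service check line in send_nsca's input format."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1 (WARNING), "
                         "2 (CRITICAL) or 3 (UNKNOWN)")
    # Tabs and newlines are field/record delimiters, so collapse every run
    # of whitespace in the plugin output to a single space.
    clean_output = " ".join(output.split())
    return f"{host}\t{service}\t{return_code}\t{clean_output}\n"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    line = format_nsca_service_check(
        "web01", "EventLog_Alert", 2, "EventID 4625 seen on web01")
    print(repr(line))
```

The host and service names have to match a passive service that Nagios already knows about, otherwise the result is discarded on the Nagios side.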
OMD
Open Monitoring Distribution, but the real point here is Check_MK (IIRC Icinga uses this...).  It makes Nagios easy to use and main…

Elasticsearch 'count' query becomes a Nagios-alertable value

One of our lines of inquiry for using the ELK stack was monitoring/alerting. The best way is to have Logstash send directly to Nagios. That proved more difficult than anticipated, so the next best route was to run a simple Elasticsearch query via curl and get an alertable value out of it.

I found one other person doing this, but didn't have the heart to ask for their script!  Decided to spend some time figuring it out for myself and share the results.  If we truly can't get Logstash output working, then we'll come back to this with a full Nagios service script.  This gives you everything but the Nagios service...
curl -XGET 'http://kibana.domain.com:9200/_count' -d '{"query": {"filtered":{"query":{"match":{"EventID": "4624"}},"filter":{"range": {"@timestamp": {"gt": "now-1m"}} }} }}' 2>/dev/null | egrep -o '"count&q…
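As a rough idea of how the missing Nagios half could look, here is a Python sketch that takes the count and turns it into a plugin exit code and status line. The function names and the `warn`/`crit` thresholds are assumptions for illustration, not part of the original script. The query body mirrors the `filtered` query from the curl command, which newer Elasticsearch versions no longer accept.

```python
import json

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def build_count_query(event_id: str, window: str = "now-1m") -> str:
    """Return the JSON body from the curl command, with the ID and window as parameters."""
    body = {
        "query": {
            "filtered": {
                "query": {"match": {"EventID": event_id}},
                "filter": {"range": {"@timestamp": {"gt": window}}},
            }
        }
    }
    return json.dumps(body)


def parse_count(response_text: str) -> int:
    """Pull the integer "count" field out of a _count response."""
    return int(json.loads(response_text)["count"])


def check_count(count: int, warn: int, crit: int) -> tuple:
    """Compare the count with the thresholds; return (exit code, status line)."""
    perfdata = f"count={count}"
    if count >= crit:
        return CRITICAL, f"CRITICAL - {count} matching events | {perfdata}"
    if count >= warn:
        return WARNING, f"WARNING - {count} matching events | {perfdata}"
    return OK, f"OK - {count} matching events | {perfdata}"


if __name__ == "__main__":
    # Canned response so the sketch runs without an Elasticsearch server.
    sample = '{"count": 7, "_shards": {"total": 5, "successful": 5, "failed": 0}}'
    code, message = check_count(parse_count(sample), warn=5, crit=10)
    print(message)
    raise SystemExit(code)
```

Using `json` instead of `egrep` on the raw response keeps the parsing independent of how Elasticsearch happens to order or space the fields.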