App review: InfraDog (new take on SMB system monitoring)

A friend of mine has been very busy lately (the last few years, in fact), and I've now seen just what he's accomplished. I should note right off the bat that I am not being paid for this; I'm just helping get the word out about an app that could be a game-changer for many of you SMB folk.

Since we live in a 'TL;DR' (too long; didn't read) kind of world, I will cut to the chase, then paste in my 7-day InfraDog journal. If you are in SMB, do not have server monitoring in place, and want cheap insurance against common IT blunders that cost you business credibility and money, just do the 30-day free trial. Do it. I don't have actual figures on post-trial pricing, but I understand it is extremely affordable (read: you can't afford not to) to keep going once the 30 days are up.

Setup is almost comically easy if you're familiar with traditional monitoring solutions (SCOM, Nagios). Step 1: download and install the management client onto a single server/PC of your choice. Step 2: add the servers/PCs you want to monitor. Step 3: download the app onto your iOS device (iPhone preferred), connect, and start poking around.

Out of the box you get CPU/RAM/disk monitoring templates. There are other event log templates available, but those three are the key indicators in a 'there could be a problem here' kind of monitoring system. I would also hazard a guess that more templates will become available as the app develops (it was only released a week ago as of writing). These templates are then applied to your group (or groups) of servers; only Windows is supported at release, with VMware and Linux on the way.

This is just my gut feeling, but in the SMB arena (which they tell me they are exclusively targeting) this will a) completely change how companies view monitoring (the vast majority do nothing), and b) make IT professionals' and consultants' lives easier and less stressful. Once you get familiar with the app layout, it starts to dawn on you just how handy this app could be.

Now, I'll be 100% clear here... this will not replace Nagios or SCOM, and it's not meant to. It gives you a front-line defence against common IT problems, and lets you find out about issues before they turn into real trouble. Further, the potential this app has really makes you think; if you have seen their YouTube commercial, it conveys the basic premise they're going for. Consider this: reading event logs is now easy... and fast.

The app is not perfect as of release, but it's pretty darn polished and smooth for only having been on the market for a week.  I'm impressed.  Good job, guys!!

Edit:  Forgot to put a linky in: http://www.infradog.com

Journal follows... be warned, it rambles.

Device: iPad1, wifi-only

Day1

  • Got registered, downloaded, installed and had ~50 servers (3 subnets) added within 15 minutes.
  • Applied the 3 default CPU/RAM/Disk monitors.
  • Poked around, really enjoying the ping/portquery option and how it looks - doesn’t just give ‘ok’ but lets you see results of the command.
  • I started using the app on my iPad (gen1), unfortunate that there is only the ‘2x’ button (designed as an iPhone app) - I think you could get a lot of value out of the extra screen real estate for reports, viewing all info at once.  A tag team of iPad in the office, iPhone out of the office might be interesting.


Day2

  • Got a LOT of email alerts (60) overnight, pretty much all of them memory-related (RAM is set to alarm by default at 85% usage), going up and down. So it's clear this is not an 'out of the box and you're good to go' kind of thing, but in reality CPU/RAM is usually not the problem; it's almost always disk that causes serious issues.
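To get a feel for how a server idling around the threshold can produce that many mails, here's a minimal Python sketch. It assumes (I haven't confirmed this is how InfraDog works) that one email goes out every time a monitor changes state, both when usage goes over 85% and when it drops back under. The function name and the readings are made up for illustration.

```python
from typing import Iterable, List, Tuple


# Hypothetical helper: assumes an alert email is sent on every state change,
# i.e. when crossing into the alarm state AND when dropping back out of it.
def state_change_alerts(samples: Iterable[float], threshold: float = 85.0) -> List[Tuple[int, str]]:
    """Return (sample_index, event) pairs, one per threshold crossing."""
    alerts: List[Tuple[int, str]] = []
    in_alarm = False
    for i, used_pct in enumerate(samples):
        now_in_alarm = used_pct >= threshold  # assumed: alarm at >= 85% used
        if now_in_alarm != in_alarm:
            alerts.append((i, "raised" if now_in_alarm else "cleared"))
            in_alarm = now_in_alarm
    return alerts


# Made-up overnight RAM readings (% used) for one server hovering around 85%.
readings = [80, 86, 84, 87, 83, 86, 82]
for index, event in state_change_alerts(readings):
    print(f"sample {index}: RAM alert {event}")
```

On that made-up data, seven readings produce six emails, which shows how a handful of servers sitting near the default threshold could fill an inbox overnight.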


Day3

  • Got a disk notification for a server that’s been troublesome in the past - discovered the job meant to clean the backup files had failed to run.  Probably time to just fix the ‘sins of the fathers’ and move the backups off the DB drive.
  • Found some of the event log filtering a bit buggy: filtering out informational events in a particular server's System event log removed all items, even though there were a few recent warnings. Hit refresh, and it displayed all events. Checked the filter settings; informational was still set to 'off'. Odd. Oh well, some bugs are to be expected. Update: not a bug... the filter has a default time period of 24 hours. Whups.
  • Not a huge number of monitor templates yet, but that will probably change going forward.
  • My impression of the app thus far is that it’s a completely different way of viewing your server inventory...I see this going places.


Day4

  • Spent some time looking around at what's available (going from memory here). The summary report can be emailed straight from the app, but individual server items must be emailed through the device's own email account... unfortunate, as my test device did not have any email configured, but in the real world not a big deal.
  • The re-scan would not pick up the newly resized disk right away; the disk started showing up about 15 minutes after the scan.
  • Found the app would crash (I've forgotten exactly what triggered it), but it's not a big deal as it started up again very quickly.
  • Creating a user-defined server group wasn't too difficult; very easy to do, in fact, once I'd figured out the process.
  • Scrolling through event logs is very nicely implemented; the key point is 'fast'. The Security log is not available, but that's not a deal-breaker, as you'd most likely be at a PC by this point anyway. It's too easy to start thinking this app can do it all. Maybe in future...


Day5

  • Adding in some method of monitoring application response time would be handy, but hardly falls under ‘simple stuff’.
  • The ‘infradog’ email folder I created was getting quite full by this point (300 emails over the weekend), so I adjusted the monitor templates from the defaults.  This was another instance of ‘would be nice if...’ as the options for the template monitors are fairly limited.
  • The general report is handy if you need to view all system specs in one place without compiling a list yourself (and assuming you don’t have any other sort of reporting/monitoring infrastructure).


Day6

  • Checked my alert emails, and I'm still getting them based on the old RAM template values... investigated: I had not 'applied' the template to the host group. Did so, but since I had changed the parameters of the disk check, I now have two disk checks, one with each setting (% and GB). Poked around (literally, har) for a way to remove the old '%' disk template... it seems that since I'm now using the 'new' template, there is no way to remove the old one without removing both. The confusing bit is that I had only edited the one disk template. After removing the GB disk monitor, I re-created the % disk monitor, re-applied it, removed it, then re-created the GB disk monitor and applied that. Seems OK now, and the disk monitor was the only one with this bug.
  • The GB disk monitor alert is confusing to read: 'Free space < 10GB, Value=24GB, Total=29GB' (it should just read 'Only 5GB free!').
  • The pull down to refresh is cool.
  • This seems like a great place for some graphing, though it would mean huge databases for InfraDog to support.
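For what it's worth, the friendlier disk wording I'm after is just a subtraction. Here's a minimal Python sketch, assuming 'Value' in the alert is used space (29GB total minus 24GB used leaves 5GB free). The function and its 10GB default are hypothetical, not anything InfraDog exposes.

```python
from typing import Optional


def disk_alert_message(used_gb: float, total_gb: float, min_free_gb: float = 10.0) -> Optional[str]:
    """Return a plain-English alert if free space is below min_free_gb, else None.

    Hypothetical helper: assumes 'Value' in the InfraDog alert is *used* space.
    """
    free_gb = total_gb - used_gb
    if free_gb < min_free_gb:
        # The :g format drops trailing zeros, so 5.0 prints as "5".
        return f"Only {free_gb:g}GB free! ({used_gb:g}GB of {total_gb:g}GB used)"
    return None


# The numbers from the confusing alert: 24GB used out of 29GB total.
print(disk_alert_message(24, 29))  # Only 5GB free! (24GB of 29GB used)
```

Leading with the free-space figure puts the number you actually act on first, while still keeping the raw used/total values for context.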


Day7

  • Alert frequency is now very manageable, the template tweaks did the job.

