Showing posts from September, 2017

Application-generated metrics - Part 1: What is the difference between Metrics, Events, and Logs?

We're working on making 'application-generated metrics' accessible through some sort of metrics framework. The subtle goal is to make metrics-driven decisions a viable option, and to gently push back on "sales-driven development". Given that I spent an entire day trying to figure out the answer to the question 'how are metrics/events/logs different', it's probably worth taking the time to write it down.

DISCLAIMER: This is not a scientific paper, so do your own research to refute or support the following...

Backstory

What are 'application-generated metrics'? An old concept that I'm probably wording poorly. Essentially, as your code does stuff, it should tell you about it. Keeping track of important flows like registration and payments should be boosted by having dashboards/monitoring that track 'registration failure' or 'payment failure'. This was my or

Update on canary - how has our release impact changed?

We had two primary targets for our canary process - our monolith has a main web app and a main api - deployments to those two using normal deploy processes cause 'release impact'. Namely, all our systems go wonky and the scope of 'what gets disrupted' is sometimes even kinda unknown (or rather, has never been tabulated). For the backstory, check out the other two posts on this topic:

TL;DR

- Customer impact with canary is now 0 (barring a broken deploy)
- We can now release at will instead of inside low-throughput release windows
- We have fast feedback via metrics that will automatically revert a failed deployment
- The actual deploy process is longer, but that's ok - because we don't have a time restriction

Some definitions...

Impact window (length of release impact) - how long are user

Detailed review of our canary + IIS + AWS + Octopus process

Another 'clear my head' post. Given the view count, that's all these ever are anyways. :) This is an expansion of the previous post, just a bit more detail, pictures, and a gist. I would preface this with a note that we are in the Probably More Common Than The Internet Would Like To Admit camp of helping Legacy systems grow and blossom into less-Legacy systems. So if I am not using the latest hotness, you may excuse my backwardness. Also excuse blogger's formatting, which is also backwardness.

Basic Traffic Flow

Nginx is there as a legacy thing, and we ran into some interesting issues trying to put an ALB upstream target in Nginx (fancy DNS issues due to ELB/ALB autoscaling). The instances are Windows boxes running IIS - always on port 80 - with the previous version left in place, even after a successful deploy, just in case (the amusing thing is that I had a 'port jumping' solution all written up, only to discover that our app can't dea