So what would you say you do here? Release management?

I was recently asked to give a presentation on 'release' to the entire company.  Putting it together gave me plenty of time to ruminate on the question 'what value are you delivering?'.

Note:  I'm writing this as a mental exercise; working on clarifying my purpose.  Much of this is naive (and rambling) philosophy.

While part of the operations department, I have been unofficially tasked with 'release management' for our dev/qa teams.  What 'release management' constitutes, I would posit, depends on your perspective of 'people, process, tools'.  Personally I fall under the 'people, process, tools - in that order' camp, so a good chunk of my focus has been on helping teams 'level up'.  Delivering value to customers (vs. products/features) and product ownership are also high on my priority list.

Fundamentals

  • Help teams grow by addressing/highlighting risk gaps (e.g. missing non-functional requirements)
  • Help teams understand by facilitating health checks & process mapping (empathy, flow, feedback loops)
  • Help customers through release impact analysis
  • Enable teams through automation, build/deploy/test processes, and adding/creating tooling

As usual, the specifics of what goes on your list, and how you go about it, are contextual - for example, the level of senior management buy-in, team morale/ability, or business focus will all dictate different approaches.  (Read up on Cynefin, interesting stuff!)  However, I suspect that my context is not far off from what most people experience.

Addressing risk gaps

Value: Preventing loss of customer satisfaction from easily addressed issues. (both internal and external customers!)
Cost: Depends on the items being addressed, and should be part of the risk estimation (high effort is only worth it for high risk).  Use common sense.

This piece is the 'hard and fast, must comply' portion.  Strategy is to target low-hanging/high-risk fruit first - things like basic monitoring, some form of Agile process, good development practices, appropriate test automation, release impact, etc.  You are either red (non-compliant, no plan in place), yellow (non-compliant, but plan in progress), or green (good enough for now).

I have put together a 'team scorecard' as a visual tool, but the reality is that nobody can get everything done - choosing battles is important here, as is making a good case for your choice.  Usually just pointing out that the basics aren't there is enough to generate discussion and action - especially if senior management is fully on-board.
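
The red/yellow/green rule above can be sketched in a few lines of Python. This is only an illustration of the rule, not the actual scorecard: the function names, the `(compliant, plan_in_progress)` shape, and the sample team are all assumptions made for the example.

```python
from enum import Enum


class Status(Enum):
    RED = "red"        # non-compliant, no plan in place
    YELLOW = "yellow"  # non-compliant, but plan in progress
    GREEN = "green"    # good enough for now


def status_for(compliant: bool, plan_in_progress: bool) -> Status:
    """Map one scorecard item to a colour using the rule described above."""
    if compliant:
        return Status.GREEN
    return Status.YELLOW if plan_in_progress else Status.RED


def scorecard(items: dict[str, tuple[bool, bool]]) -> dict[str, Status]:
    """items maps an item name to (compliant, plan_in_progress)."""
    return {name: status_for(c, p) for name, (c, p) in items.items()}


# Hypothetical team; the item names are borrowed from the examples in the text.
team = {
    "basic monitoring": (True, False),
    "test automation": (False, True),
    "release impact": (False, False),
}

for item, status in scorecard(team).items():
    print(f"{item:20} {status.value}")
```

Keeping the rule this small is deliberate: the value is in the conversation the colours start, not in the scoring itself.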

Health checks

Value: Improves team communication, makes subjective issues visible
Cost: 3-4 meetings per year @ 2hrs each

Google 'spotify health check' and you'll quickly figure out what this is.  Facilitate a safe discussion where the 'touchy feely' side of the team can be evaluated.  The only goal of this is to get the team talking to each other about important (and sometimes controversial) topics.

This generates a simple colour map of how the organization's teams are doing, and you can easily spot trends or specific teams that need some lovin'.  The data generated is "I feel that..." information, and thus 'safe' from being used to evaluate compensation and such.  If nothing else, everyone walks away from these meetings having learned something, and having spent time communicating with team members.
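
For what it's worth, turning the votes into that colour map is trivial. A rough sketch, assuming each person votes green/yellow/red per topic; the tie-breaking rule (ties go to the worse colour) and the team/topic names are my own assumptions, not part of any official health check format.

```python
from collections import Counter

# Ordered from best to worst; the order is used to break ties.
COLOURS = ("green", "yellow", "red")


def team_colour(votes: list[str]) -> str:
    """Return the most common vote; ties resolve to the worse colour."""
    if not votes:
        raise ValueError("no votes to summarize")
    counts = Counter(votes)
    # Compare by vote count first, then by severity (red is worst).
    return max(COLOURS, key=lambda c: (counts[c], COLOURS.index(c)))


def colour_map(votes_by_team: dict[str, dict[str, list[str]]]) -> dict[str, dict[str, str]]:
    """Summarize {team: {topic: [votes]}} into {team: {topic: colour}}."""
    return {
        team: {topic: team_colour(votes) for topic, votes in topics.items()}
        for team, topics in votes_by_team.items()
    }


# Hypothetical teams and topics, for illustration only.
votes = {
    "team-a": {"fun": ["green", "green", "yellow"], "release": ["red", "yellow", "red"]},
    "team-b": {"fun": ["yellow", "green"], "release": ["green", "green", "green"]},
}

for team, topics in colour_map(votes).items():
    print(team, topics)
```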

Process mapping

Value: Share/document tribal knowledge, makes poor process visible
Cost: 2-3 meetings per year @ 1hr each

For the team leads, it can be a good exercise to review the process by which a ticket turns into consumed code.  For me, it's invaluable as a learning aid!  You also get to learn which areas they already know are pain points, and which areas they had not yet considered looking at.

A more macro perspective here would be to look at the people/high level processes in place and apply some systems thinking.  Haven't done this yet, though.

Release impact analysis

Value: Makes cost of 'non-ownership' visible, attaches value to customer experience
Cost: Depends on tooling in place, but can be very easy to assess

This came about because of many conversations around 'I don't understand why we cannot deploy at will' or 'The site is only down for a few minutes, that's not a big deal'.  Thankfully we have New Relic, because that data makes a pretty clear case.  It comes down to a simple 'our users see X, Y, and Z during our release window', followed by 'are we okay with that?'.  Once you present the data to folk outside of engineering, the foot comes down pretty hard.
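
To give an idea of how little is needed to make the case, here is a sketch that sums up what users saw during a release window. It assumes you have already exported per-minute request and error counts from your monitoring tool; the `MinuteSample` shape and the numbers are made up for the example and are not New Relic's API or our real data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MinuteSample:
    minute: datetime  # start of the one-minute bucket
    requests: int     # requests served in that minute
    errors: int       # requests that failed in that minute


def release_impact(samples: list[MinuteSample], start: datetime, end: datetime) -> dict:
    """Total requests, errors, and error rate for start <= minute < end."""
    in_window = [s for s in samples if start <= s.minute < end]
    requests = sum(s.requests for s in in_window)
    errors = sum(s.errors for s in in_window)
    rate = errors / requests if requests else 0.0
    return {"requests": requests, "errors": errors, "error_rate": rate}


# Made-up numbers for a two-minute release window.
samples = [
    MinuteSample(datetime(2015, 1, 1, 10, 0), 100, 5),
    MinuteSample(datetime(2015, 1, 1, 10, 1), 100, 15),
    MinuteSample(datetime(2015, 1, 1, 10, 2), 100, 0),
]
impact = release_impact(samples, datetime(2015, 1, 1, 10, 0), datetime(2015, 1, 1, 10, 2))
print(impact)
```

A sentence like 'N requests failed while we deployed' tends to land better with non-engineers than a graph does.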

The conversations you have because of the impact analysis will quickly lead to deploy exclusions, e.g. 'please don't deploy between this and that time, unless it's an emergency'.

When this window becomes untenable for any of the teams (whether they want it larger or smaller), action will have to be taken to reduce the release impact.
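
A deploy exclusion like that is easy to encode as a pre-deploy check. A minimal sketch, assuming a single daily window in local time; the function name, the parameters, and the 09:00-17:00 window are hypothetical.

```python
from datetime import time


def deploy_allowed(now: time, window_start: time, window_end: time,
                   emergency: bool = False) -> bool:
    """Return False while inside the exclusion window, unless it's an emergency."""
    if emergency:
        return True
    if window_start <= window_end:
        in_window = window_start <= now < window_end
    else:
        # The window wraps past midnight, e.g. 22:00 to 02:00.
        in_window = now >= window_start or now < window_end
    return not in_window


# Hypothetical exclusion window: no deploys between 09:00 and 17:00.
start, end = time(9, 0), time(17, 0)
print(deploy_allowed(time(10, 30), start, end))
print(deploy_allowed(time(10, 30), start, end, emergency=True))
print(deploy_allowed(time(18, 0), start, end))
```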

Enabling teams

Value: Possible difference between long-term success and failure?
Cost: ?

This piece requires a bit of a paradigm shift, both in how you view the operations team, and in how you perceive product ownership.  I won't go into the depths of this, lots of material out there - suffice to say this is 'whole business transformation'.

Product ownership is wholly on the product/dev/qa team, who are supported by operations, and enabled by the business team.  It is not a free ride - ownership implies freedom, but also responsibility.  The catch is that this requires everyone to be on board.

The operations team, I suspect, should be an enabler and adviser - a partner to the dev/qa teams that brings non-functional requirements expertise alongside tooling and automation.

And thus...

I feel a bit better having written this down, but a key next step will be laying down (on paper) my own personal vision for where I want to go with all this, i.e. being able to say that I believe the reasons behind my actions are right.

Got this advice the other day:  "Keep pushing.  This is something we believe in that has all these implications - so we need to focus on making each of the implications/steps demonstrate its tie-in to the big picture."

