When to write automation
I've long fallen prey to 'oh, I'll only just do this once - no point automating it', and it's really been just the last few years that things have changed.
Two examples, taught to me by Mr. Mike Weeks, one of the best automation engineers I've ever had the pleasure of working with.
Who is on-call?
We were a small team, and every week the task came up to swap the on-call person in our Slack 'help-me' group (the goalie/support request point person was the only member in that group). It only took a few moments, but required you to check the Pagerduty schedule and then click around to get to the group and swap members.
I was initially not advocating we deal with that toil as there were legitimate slow-burn fires on the go, but eventually we decided to spend the time. It took a few days of effort, but once in place the automation revealed something I hadn't seen at the time - the toil had a cost. Sure, it wasn't a lot of work, but it was adding to mental friction, to context switching - the toil made the work harder, more frustrating, and more stressful. A year down the road, I would never want to go back, and other teams adopted the tool and are also saving toil.
What is the job?
A lot of our work revolves around Terraform modules for AWS infrastructure, and Github PRs are the only way changes happen. We also have a lot of services consuming our modules, and so a version change in an S3 module means potentially several dozen PRs that we have to create. When one is dealing with this kind of work regularly, creating duplicate PRs is a sadness. Toil.
Mike wrote a CLI tool to handle this and I got my eyes opened using it the other day. You create your PR body, define the module being updated (there's a Github API call to look for the ref), the module version you want to use, and any new module inputs to add. Then it's a simple uv run . -r <repo> command and it handles all the terraform plan validation, branch and PR creation - the lot. Saved me a full day of doing the PRs manually.
If you look at how our local system is designed - Terraform modules for service team consumption - it's obvious that some form of dependency management is gonna be part of our work, and very regularly! So the time spent automating this is well worth it, and pays for itself very quickly.
Conclusion
This is the mental model/flow that helps me now discern when to automate. Some questions to consider:
- Do I understand the work? Is it something that is a normal part of our system? Is it special-cause-ey or common-cause-ey? Are you tracking all of your work items? Are you investing time in understanding what the work is?
- Is the work toilsome? Does it take a long time? Is it very prone to manual error? Does it happen with great frequency?
- Am I reacting with 'do not automate' because I lack skills or muscle memory in automating this particular type of thing? (are you fearing learning?)
- Am I reacting 'do not automate' because we have "high priority" things that need to happen right now? (spoiler: there are always "high priority" things)
- How much toil cost does this task have? Does it grieve you to do this work?
Chances are, if you are embedded in the work, you will struggle to see the ROI on automation in the face. Seek to understand the system of work you live in. Have a bias to automate when informed by understanding.
Lesson: Be more like Mike - learn to see and automate toil.
Comments
Post a Comment