What are Postmortems?
First, let's define what postmortems are. Postmortems are meetings held after an unexpected outcome in a project or task. The purpose of a postmortem is to reflect on what happened, understand what went wrong, and learn from the experience to prevent similar mistakes in the future.
At Cleo, we understand the importance of learning from our mistakes and continuously improving our processes. That's why we run postmortems after unexpected outcomes in our work.
What’s worthy of a Postmortem?
We encourage teams to hold postmortems whenever they feel it's necessary, but don't go overboard! While it's a good idea to have postmortems whenever something goes wrong, you need to consider some key factors first. Factors to evaluate include the severity, complexity, and frequency of the incident, as well as the potential for learning and improvement. If there's a chance to identify and address issues that can prevent similar incidents in the future, a postmortem is worth the time and effort.
There are also specific situations that are highly likely to warrant postmortems, such as major system failures, security breaches, and bugs that cause harm to users or the business. At Cleo, we've had our fair share of incidents, so here are some good examples of postmortem-worthy events from this year:
- A bug that prevented our users from connecting bank accounts, causing a significant drop in Plus subscriptions.
- An app crash when users were choosing repayment dates for their cash advances, which affected the conversion rates for this feature.
- 290k timed-out backend requests due to an environment variable being wiped.
However, there are situations where a postmortem may not be necessary. This is especially true for minor incidents that do not cause significant consequences or for regular maintenance tasks. Moreover, if the incident was caused by external factors beyond the team's control, such as a Heroku outage, a postmortem may not provide any meaningful insights or opportunities for improvement.
So, the team has gone through all the previous considerations and is still unsure about whether to have a postmortem? Here's some good news: just bringing up the possibility of a postmortem is reason enough to proceed with one. Most likely, the answer will be yes!
How we rock a Postmortem at Cleo
Alright, so we managed to bring production down with a change. It's all sorted now but... what’s next? If you are the engineer who responded to the issue, you probably wanna show off one of our key values, "Make it Happen", and take ownership by starting a productive discussion with your team about whether we need to organize a postmortem (let's be real, it was a big disruption in production, so we probably should).
Now the process of running a postmortem at Cleo is relatively lightweight, but it's essential to ensure that the meeting is productive and useful to get the most out of it. The process can be split into three stages: Before, During, and After.
Before: Invitations and Context
We value inclusivity in our postmortems, so we encourage everyone who worked on or has an interest in the project to attend the meeting. We believe that by bringing in people from diverse backgrounds and perspectives, we can achieve amazing outcomes and make a greater impact, so publicize your postmortem meeting! Although you can use any communication channel, our public squad channel on Slack is our preferred way to do so.
To ensure that everyone attending the meeting has as much context as possible, we use a special template to document the postmortem. Some sections of the template must be completed beforehand and shared with the attendees. It is crucial to include as much detail as possible in this document, since it will serve as our source of truth for the meeting.
During: Analysis and Reflection
Using the postmortem template as a guide, we have two main goals:
- Document everything as thoroughly as possible.
- Reflect on what happened and learn from it. As the saying goes, "Those who don't learn from their mistakes are bound to repeat them", and who wants that?
As we work through this document, our focus is on expanding upon what has already been provided. This includes understanding:
- The root cause. We suggest the 5 Whys analysis technique, but we acknowledge that this may not be suitable for every situation.
- The impact of the incident.
- Any potential need for subsequent cleanup.
- And of course, no postmortem is complete without asking "What could we do differently in the future to prevent this from happening?". Our ultimate goal is to learn from this incident and figure out how to avoid a similar occurrence.
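For readers who have not used the 5 Whys technique before, here is a minimal Python sketch of the idea: start from the observed problem and keep asking "why?" until you reach something you can act on. This is purely illustrative and not part of our template. The `FiveWhys` class, its method names, and the incident in the example are all made up for this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FiveWhys:
    """A chain of "why?" answers starting from an observed problem (hypothetical helper)."""

    problem: str
    answers: list[str] = field(default_factory=list)

    def ask_why(self, answer: str) -> FiveWhys:
        """Record the answer to "why did the previous statement happen?"."""
        self.answers.append(answer)
        return self

    @property
    def root_cause(self) -> str | None:
        """The deepest answer recorded so far, or None if no "why?" has been asked."""
        return self.answers[-1] if self.answers else None

    def render(self) -> str:
        """Format the chain as plain text, ready to paste into a document."""
        lines = [f"Problem: {self.problem}"]
        for depth, answer in enumerate(self.answers, start=1):
            lines.append(f"Why #{depth}: {answer}")
        return "\n".join(lines)


# Made-up incident, for illustration only.
analysis = (
    FiveWhys("Users saw errors on the settings page")
    .ask_why("The settings endpoint returned server errors")
    .ask_why("A new database column was read before the migration had run")
    .ask_why("The deploy checklist does not mention running migrations first")
)
print(analysis.render())
```

The chain does not have to be exactly five steps long. Stop when the latest answer points at something the team can change, such as the checklist in this made-up example.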
Once we have a clear understanding of the situation, we evaluate any necessary follow-up actions and assign priorities based on their feasibility. To ensure that they are not forgotten, we assign an owner to each action item and document it so that we can follow up. However, we acknowledge that we're not perfect, and sometimes certain action items may get overlooked, which is something we're actively working on improving at Cleo. We strive to continuously refine our processes, so if you'd like to lend us a hand with this, we'd love to have you on board.
Now, we want to be clear: while it's important for everyone to ask questions and get clarification, we take our blame-free culture really seriously here at Cleo. The bulk of the meeting should therefore be focused on reflection, not on assigning blame. While keeping it that way may be one of the facilitator's responsibilities, we consider it a shared responsibility for everyone attending.
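To make the idea of owned, prioritized action items more concrete, here is a small Python sketch of one way to record them and spot the ones that are slipping. It is an illustration under stated assumptions, not a description of our tooling: the `ActionItem` fields, the `Priority` levels, the due dates, and the `overdue_items` helper are all hypothetical, as are the example items and owners.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from enum import IntEnum


class Priority(IntEnum):
    """Hypothetical priority levels; a lower number means more urgent."""

    HIGH = 1
    MEDIUM = 2
    LOW = 3


@dataclass
class ActionItem:
    """One follow-up action from a postmortem (fields are assumptions)."""

    description: str
    owner: str
    priority: Priority
    due: date  # assumed: each item gets a target date when it is assigned
    done: bool = False


def overdue_items(items: list[ActionItem], today: date) -> list[ActionItem]:
    """Return open items whose due date has passed, most urgent first."""
    late = [item for item in items if not item.done and item.due < today]
    return sorted(late, key=lambda item: (item.priority, item.due))


# Made-up action items and owners, for illustration only.
items = [
    ActionItem("Update the runbook", "sam", Priority.LOW, date(2024, 1, 12)),
    ActionItem("Add a regression test", "alex", Priority.HIGH, date(2024, 1, 10)),
    ActionItem("Review alert thresholds", "jo", Priority.MEDIUM, date(2024, 1, 20)),
    ActionItem("Roll back the change", "alex", Priority.HIGH, date(2024, 1, 5), done=True),
]

for item in overdue_items(items, today=date(2024, 1, 15)):
    print(f"{item.priority.name}: {item.description} (owner: {item.owner})")
```

A list like this could be reviewed at a regular squad meeting, so that overdue items come up on their own instead of relying on someone remembering them.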
After: Sharing and Follow-up
Share, share, and share!
We don't want to hide our mistakes, so we share our document with the outcome of the meeting with the relevant audience. The more, the merrier. Squad, pillar leads, the engineering leads, everyone in the meeting, the whole engineering team, you name it, they should see it.
We highly recommend following up on action items, if there are any, to ensure everyone is on track and to address any issues or roadblocks that may arise.
Nice job finishing up that postmortem! This is a significant milestone in your professional growth, as it demonstrates your commitment to continuous improvement and learning. As you likely already know, postmortems are a powerful tool for reflection that offers everyone an opportunity to gain insights into what went wrong and stop it from happening again.
Next time something goes sideways, schedule a postmortem without hesitation. It could be just what you need to make things run smoother and avoid any more issues down the road.