At Cleo, experimentation is the backbone of our product development cycle.
Every iteration of every feature goes through at least some form of AB test before launch to prove its efficacy. Most of our squads' key results are based on performance against a control group of users.
This doesn't make us unique as a business. It does, however, mean that the way experimentation is operationalised across Cleo can have an outsize impact on our business performance.
If you’re interested, you can read more about the whats and whys of Cleo’s AB testing.
How do we make sure AB testing is being performed optimally?
This was a question I asked myself a few months ago. I could see that each of Cleo’s squads was doing great things in the experimentation space on its own. However, many of these good practices were not universal across the business.
Thus was born the "Experiment Optimisation Workstream". You can take the boy out of management consulting, but you can't take the management consulting out of the boy 🥲
In nicer terms: a multi-sprint project to get Cleo's testing game tight.
So, what did this workstream actually achieve?
Documentation Documentation Documentation
It sounds like a crushingly boring point, but improved test documentation has been such a quality-of-life win.
Each squad used to have mostly good but inconsistent write-ups hidden in the depths of its own Notion space. Now, we have a centralised Notion database of every experiment, covering everything from the initial hypothesis right through to deployment in the product.
Each entry has 20 or so attributes, including (amongst other things) a record of key success metrics, the dates when the test is actually running, and links to Figma designs and relevant analytics dashboards. We also record the ultimate impact on business metrics once the test is concluded. This makes past work far easier to discover, both for posterity and for colleagues in other parts of the business.
The additional beauty of doing this with a Notion database is that different views and cuts of the underlying database can be created across multiple places in Notion. I really could riff on the beauty of Notion databases for more time than is considered acceptable in polite company…
To give a few examples:
- A squad can create a kanban board filtered for just their tests, to monitor a hypothesis' progress from initial ideation, through build, to deployment.
- A product lead can look at all the experiments in flight within a business area on a Gantt chart to see where there may be conflict between tests.
- If they're so inclined, the CEO can do a roll-up analysis to see the cumulative impact of each squad's experiments on the company's key results.
These views are all derived from a single source, so the information is consistent and fully synchronised. We love to see it.
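To make the "single source, many views" idea concrete, here is a minimal Python sketch of what a trimmed-down experiment entry, a per-squad roll-up and a Gantt-style overlap check might look like if the database were exported as plain records. Everything here is hypothetical: the `Experiment` fields, the status values, the squad names and the numbers are illustrative assumptions, not Cleo's actual schema (which has around 20 attributes), and inside Notion itself this aggregation would normally be done with rollup properties rather than code.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Experiment:
    """Hypothetical, trimmed-down experiment entry (the real database has ~20 attributes)."""
    name: str
    squad: str
    hypothesis: str
    key_metric: str
    status: str  # hypothetical values, e.g. "ideation", "build", "running", "concluded"
    start: Optional[date] = None
    end: Optional[date] = None
    impact: Optional[float] = None  # change in the key result, recorded once concluded


def roll_up_impact(experiments: list[Experiment]) -> dict[str, float]:
    """Sum the recorded impact of concluded experiments per squad.

    Naive assumption: effects simply add up, ignoring interactions between tests.
    """
    totals: dict[str, float] = defaultdict(float)
    for exp in experiments:
        if exp.status == "concluded" and exp.impact is not None:
            totals[exp.squad] += exp.impact
    return dict(totals)


def overlapping(a: Experiment, b: Experiment) -> bool:
    """True if two experiments' running windows overlap (a Gantt-style conflict check)."""
    if a.start is None or a.end is None or b.start is None or b.end is None:
        return False  # a test without scheduled dates can't clash yet
    return a.start <= b.end and b.start <= a.end


# Illustrative, made-up entries
experiments = [
    Experiment("Redesigned upsell screen", "Growth", "A clearer upsell screen lifts conversion",
               "conversion", "concluded", date(2024, 3, 4), date(2024, 3, 25), impact=0.8),
    Experiment("Backend retry logic", "Platform", "Smarter retries reduce failed syncs",
               "sync success rate", "concluded", date(2024, 3, 11), date(2024, 4, 1), impact=1.5),
    Experiment("Onboarding copy tweak", "Growth", "Shorter copy improves completion",
               "onboarding completion", "running", date(2024, 3, 18), date(2024, 4, 8)),
]

print(roll_up_impact(experiments))  # {'Growth': 0.8, 'Platform': 1.5}
print(overlapping(experiments[0], experiments[2]))  # True
```

The kanban board, the Gantt chart and the CEO roll-up are then just different filters or aggregations over the same records, which is why keeping everything in one table pays off.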
Education Education Education
Given the importance of experimentation in our product development cycle, it makes no sense for the technical knowledge of how our testing analysis actually works to be confined to the product analysts alone.
The first step in breaking down this wall was to make sure that analysts were having regular sessions with the whole squad to talk about data and experimentation. Weekly "test and learn" sessions have now been rolled out across squads. These sessions provide a forum to discuss how tests are getting on, and to keep squads honest about next steps.
To help ensure that everyone can tell their Bayesian priors from their elbow, we also overhauled our training materials. We created highly detailed documents on every aspect of the experimentation cycle, written with a non-analyst audience in mind. Topics range from the basics of why it's even a good idea to run AB tests, to specific conditions for test promotion, via an introduction to Bayesian statistics.
Finally, we ran training sessions with squad engineers, using worked examples to show how to use all the necessary tooling to run an experiment from start to finish, analysis and all.
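To give a flavour of the Bayesian thinking mentioned above, here is a textbook Beta-Binomial sketch for a conversion-rate test. It assumes a uniform Beta(1, 1) prior on each arm, so each posterior is Beta(1 + conversions, 1 + non-conversions), and it estimates the probability that the variant beats control by Monte Carlo sampling. The function name, the prior, the draw count and the counts in the example are all illustrative assumptions; this is not Cleo's actual analysis method or promotion criteria, which aren't described here.

```python
import random


def prob_variant_beats_control(
    control_conversions: int,
    control_users: int,
    variant_conversions: int,
    variant_users: int,
    prior_alpha: float = 1.0,  # Beta(1, 1) = uniform prior; an assumption for illustration
    prior_beta: float = 1.0,
    draws: int = 100_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(variant conversion rate > control conversion rate)."""
    if not (0 <= control_conversions <= control_users and 0 <= variant_conversions <= variant_users):
        raise ValueError("conversions must be between 0 and the number of users")
    rng = random.Random(seed)
    # Conjugate update: posterior is Beta(alpha + successes, beta + failures)
    c_a = prior_alpha + control_conversions
    c_b = prior_beta + control_users - control_conversions
    v_a = prior_alpha + variant_conversions
    v_b = prior_beta + variant_users - variant_conversions
    wins = 0
    for _ in range(draws):
        # Draw one plausible conversion rate per arm and count how often the variant wins
        if rng.betavariate(v_a, v_b) > rng.betavariate(c_a, c_b):
            wins += 1
    return wins / draws


# Illustrative, made-up counts: 6.0% vs 7.5% conversion on 2,000 users per arm
p = prob_variant_beats_control(control_conversions=120, control_users=2000,
                               variant_conversions=150, variant_users=2000)
print(f"P(variant beats control) is about {p:.2f}")
```

A squad would then compare a number like this against whatever promotion condition it has agreed in advance; with a few thousand users per arm, a weak prior like Beta(1, 1) barely moves the result.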
Delegation Delegation Delegation
Having got squad members up to speed with the ins and outs of experimentation, we have been able to get non-analysts running the full test cycle autonomously. This has been especially effective for more straightforward product changes, such as redesigned upsell screens or simple logic changes to backend processes.
To avoid misalignment, we use RACI matrices (ex-consultants in the house, take a shot) to determine who should be Responsible for, Accountable for, Consulted on, and Informed about each test. Again, the Notion database makes this very simple to manage.
This delegation has come with some key benefits:
- It gets engineers even closer to the product they're developing, helping them see how their work is impacting users.
- It gives non-analysts greater confidence to contribute to strategic discussions, now that they are familiar with our metrics and what can drive them.
- It frees up capacity for squad analysts to work on deeper exploratory analysis and product strategy, which in turn helps generate more high-quality hypotheses to test.
Let’s Wrap This Up...
Cleo's "Experiment Optimisation Workstream" has really stepped up our approach to testing.
We've made documentation easier for everyone to access, trained our teams to understand the intricacies of testing, and spread out the workload more effectively. It's been about making things simpler, smarter, and more efficient for everyone involved.
As we move forward, we aim to keep improving and adjusting to whatever comes next!