Cleo is on a mission to improve the world’s financial health by offering personalized insights and tools to help users reach their goals – all of which rely on a solid understanding of income.
To accomplish this, we built a model that predicts how likely an individual transaction is to be earned income. The good news is that we had a lot of transaction data to train a model with, but we faced a challenge – our transaction data lacked any labels.
We’re lucky to have the option of getting labels by annotating the data using an internal tool, but that takes forever and therefore costs us more cash. So before committing to scaling up annotations, we decided to build a proof of concept to test our hypothesis that this was even possible and hence worth investing time in.
Step 1: The easy cases
In some instances, identifying income is straightforward. Transaction descriptions containing 'payroll' are earned income (duh).
We began by crafting simple rules to label these obvious cases, providing a foundation upon which to build. However, the real world is rarely so straightforward.
Think about that $42 Walmart transaction – is it a paycheck or just a refund? These grey areas need more advanced techniques.
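These easy-case rules amount to a simple keyword labeller. A minimal sketch, assuming a hypothetical keyword list (the real rule set would be more extensive):

```python
from typing import Optional

# Hypothetical keyword list -- illustrative, not the actual rule set.
INCOME_KEYWORDS = ["payroll", "salary", "direct deposit"]

def label_easy_case(description: str) -> Optional[str]:
    """Label the obvious cases; return None for grey areas the rules can't decide."""
    text = description.lower()
    if any(keyword in text for keyword in INCOME_KEYWORDS):
        return "income"
    return None  # e.g. that $42 Walmart transaction stays unlabelled for now
```

Returning `None` rather than guessing is the point: anything the rules can't decide falls through to the more advanced techniques below.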
Step 2: Tapping into third-party categories
To capture some of this real-world messiness, we used third-party transaction categories as an additional signal when deciding the labels. Now, we could train a model on the large dataset we generated from the unlabelled data.
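A rough sketch of how the keyword heuristics and third-party categories might combine into weak labels (the category names here are made up for illustration; real provider taxonomies differ):

```python
from typing import Optional

# Hypothetical third-party category names -- illustrative only.
INCOME_CATEGORIES = {"Payroll", "Paycheck"}
NON_INCOME_CATEGORIES = {"Shops", "Transfer", "Refund"}

def weak_label(description: str, category: str) -> Optional[int]:
    """Combine keyword rules with a third-party category into a noisy training label."""
    if "payroll" in description.lower():
        return 1  # earned income
    if category in INCOME_CATEGORIES:
        return 1
    if category in NON_INCOME_CATEGORIES:
        return 0  # not income
    return None  # still undecided: exclude from the generated training set
```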
The text-based representation of the input to our model is a perfect use case for BERT (Bidirectional Encoder Representations from Transformers) with a classification head.
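Such a model is a few lines with the Hugging Face transformers library. A sketch only: the checkpoint name and binary label count are assumptions, since the post doesn't specify which BERT variant was used.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# bert-base-uncased is an assumed starting checkpoint,
# which would then be fine-tuned on the weakly labelled dataset.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # income vs. not income
)

inputs = tokenizer("ACME CORP PAYROLL 0042", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
prob_income = torch.softmax(logits, dim=-1)[0, 1].item()
```

The classification head here is randomly initialised, so the probability is meaningless until the model is fine-tuned on the generated labels.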
While it was a step forward, we knew that our ‘easy cases’ heuristics could introduce bias and that the third-party labels definitely had their flaws.
We wanted to do better because we’re big-brained like that 💅
Step 3: Ground truth
This is where human annotations stepped in as a valuable source of high-quality training data. However, to save time and cost, we only collected a small sample.
So, how do we maximize their impact?
Our answer to this was to use the model we built earlier (in step 2) and evaluate it against the annotations (from step 3).
We could then inspect false positives and false negatives, surface common misclassifications, and use those insights to refine the heuristic rules and improve label quality. This iterative process, known as bootstrapping, is the secret sauce.
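The inspection step itself is simple. A sketch, where the example records and score threshold are illustrative (the field names are made up, not Cleo's actual schema):

```python
# Hypothetical annotated examples: human label (1 = income) plus the model's score.
annotated = [
    {"description": "ACME CORP PAYROLL", "label": 1, "score": 0.97},
    {"description": "WALMART REFUND", "label": 0, "score": 0.81},
    {"description": "GIG PLATFORM PAYOUT", "label": 1, "score": 0.12},
]

THRESHOLD = 0.5
false_positives = [x for x in annotated if x["score"] >= THRESHOLD and x["label"] == 0]
false_negatives = [x for x in annotated if x["score"] < THRESHOLD and x["label"] == 1]

# Clusters of errors suggest new rules, e.g. descriptions containing
# 'refund' should never be labelled as income by the heuristics.
```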
We retained a holdout test set of the annotations (the real ground truth) to ensure our final model evaluation remained unbiased.
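Keeping that holdout honest just means making one fixed split before any iteration starts. A sketch with scikit-learn, where the split ratio and seed are assumptions:

```python
from sklearn.model_selection import train_test_split

# Stand-in annotation records: (description, label) pairs.
annotations = [(f"transaction {i}", i % 2) for i in range(100)]

# Carve off the holdout once, up front; it is never used to tune rules or the model.
dev_annotations, holdout_test = train_test_split(
    annotations,
    test_size=0.2,    # assumed split ratio
    random_state=42,  # fixed seed so the holdout never changes between iterations
    stratify=[label for _, label in annotations],
)
```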
What did we learn?
- Annotate smart not hard: When working with limited resources, it's essential to be strategic with your annotations, focusing on areas where human expertise adds the most value. At Cleo, we’re actually annotating smart and hard but that’s a blog post for another day.
- It-er-ate: Embrace an agile development mindset and keep refining your model and data quality.
- Clarity is key: The process of asking human annotators to label data forced us to internally clarify and lock down definitions, which are critical when teaching human annotators and a machine learning model.
So, that’s how we transformed a large, unlabelled dataset into a labelled one, using a relatively small number of annotations.
The benefits were twofold: we created a valuable resource for training our machine learning model, and we gained a deeper understanding of our income classification task by working with the annotators to clarify labelling rules around edge cases.