Machine learning is a critical component of Cleo’s technology landscape: we currently maintain machine learning models that support many different parts of our pipelines and services, ranging from transaction enrichment to scoring.
In the fast-paced world of data science, there can be a natural temptation to cut corners while developing machine learning solutions, particularly given the complexity involved in these models. However, at Cleo we think compromising on our engineering principles could not only harm a project's quality but also introduce unnecessary headaches down the line.
We think we can level up our machine learning solutions by adopting software development best practices, such as thorough testing, continuous integration/continuous delivery (CI/CD) and monitoring, but we are not going to do that at the cost of our values!
That's why we introduced Espresso ☕, our MLOps framework: we wanted to provide an easy, standardized solution to serve, deploy and monitor our machine learning models. With Espresso, we shifted the paradigm from machine learning models to machine learning services.
Our coffee beans - Before Espresso
Every good espresso needs good coffee beans, and our Espresso is no exception.
Having high-quality machine learning solutions is a game changer when it comes to building an MLOps framework. For us, this meant we could trust the Data Science team's implementation of the core logic of our services, giving data scientists more flexibility during development and saving a tonne of time to invest in perfecting Espresso.
However, before Espresso, we had basically no standard solution for building and serving our great services, and monitoring was almost nonexistent!
Roasting the coffee beans
We believe every taste is unique, so just like how a coffee connoisseur roasts their own beans to achieve the perfect cup, we crafted our own MLOps framework to deliver the best possible outcomes for our needs.
While there are plenty of off-the-shelf options available, we wanted to ensure that our framework was tailored for us, rather than trying to fit us into a predefined box.
We also realized that, just as coffee blends can provide richer and more complex flavours, mixing ideas from different solutions could lead to better outcomes for Espresso!
Still, we needed a production-grade solution, not just a fancy playground, so we made the conscious decision to build Espresso prioritizing compatibility and ease of migration from day zero.
That's how the first iterations of this framework (before it got this wonderful name!) were already solving most of the needs described above, although they still relied on Amazon SageMaker as a hosting solution so that we could iterate faster, learn at speed and make it happen!
Eventually, we realized that SageMaker was not the right serving tool for us (sorry, SageMaker, it's not you, it's us! We are still using it for training our models!) for a few reasons:
- We thought (spoiler alert: we were right!) we could do better with costs and deployment times.
- We had some fairly unique needs that were incompatible with several SageMaker features; as a result, we quickly found ourselves using SageMaker solely as a hosting solution.
- SageMaker request throttling is very aggressive and depends on the number of cores: since we run both I/O-bound and CPU-bound services, we soon ended up with a lot of replicas at very low CPU utilization, wasting resources and money!
- HTTP headers are stripped out and there is no support for W3C tracing headers. This meant we could not follow a trace end to end in our tools without implementing custom code!
That's why Espresso started to take shape as a full-fledged serving solution based on Kubernetes and Istio!
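For context on the tracing point above: W3C Trace Context carries a request's identity in a `traceparent` header of the form `version-trace_id-parent_id-flags` (plus an optional `tracestate`). Here is a minimal sketch of the parsing and forwarding a serving layer needs to keep a trace intact (the function names are illustrative, not Espresso's actual API):

```python
import re
from typing import Dict, Optional

# W3C Trace Context: traceparent = version "-" trace-id "-" parent-id "-" trace-flags
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-"
    r"(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-"
    r"(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(header: str) -> Optional[Dict[str, str]]:
    """Parse a W3C traceparent header; return None if it is malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if not match:
        return None
    fields = match.groupdict()
    # An all-zero trace-id or parent-id is invalid per the spec.
    if fields["trace_id"] == "0" * 32 or fields["parent_id"] == "0" * 16:
        return None
    return fields

def propagate_tracing_headers(incoming: Dict[str, str]) -> Dict[str, str]:
    """Copy the headers a serving layer must forward to keep a trace intact."""
    return {k: incoming[k] for k in ("traceparent", "tracestate") if k in incoming}
```

When a hosting layer drops these headers, each hop starts a brand-new trace and the end-to-end picture is lost, which is exactly the gap we hit.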
How we brewed our Espresso
Of course, we put a lot of love into crafting our Espresso ♥️, but we also relied on a variety of enabling technologies to handle the nitty-gritty work behind the scenes. We can't just list them all here as we want to give our framework the attention it deserves. So, we'll give you a sneak peek into some of the buzzwords we're currently working with, but stay tuned for deep-dive articles coming soon about:
- Deployment: how we used CircleCI and Terraform to build an infrastructure as code pipeline to deploy our ML services in our Kubernetes cluster seamlessly.
- Serving: how we used s2i, FastAPI, Istio and Docker to abstract complexity away from the Data Science team, offering a solution that just works while also boasting advanced capabilities such as server-side batching!
- Monitoring: how we keep a close eye on our machine learning services through integrations with Rollbar, New Relic, Prometheus, Cloudwatch, Slack and our own exciting Espresso Data Capture for model monitoring!
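To give a flavour of the server-side batching idea mentioned above, here is a minimal sketch using `asyncio`: concurrent requests are queued, a worker groups them into a batch (up to a size limit or a small wait budget), runs one model call, and fans results back to each caller. This is an illustrative toy, not Espresso's actual implementation; all names are hypothetical.

```python
import asyncio
from typing import Any, Callable, List, Tuple

async def batch_worker(
    queue: "asyncio.Queue",
    predict_batch: Callable[[List[Any]], List[Any]],
    max_batch_size: int = 32,
    max_wait_s: float = 0.01,
) -> None:
    """Drain the queue into batches and fan the results back to the callers."""
    while True:
        # Block until at least one request arrives, then start a batch.
        batch: List[Tuple[Any, asyncio.Future]] = [await queue.get()]
        deadline = asyncio.get_running_loop().time() + max_wait_s
        # Keep filling the batch until it is full or the wait budget runs out.
        while len(batch) < max_batch_size:
            timeout = deadline - asyncio.get_running_loop().time()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        inputs = [x for x, _ in batch]
        outputs = predict_batch(inputs)  # one model call for the whole batch
        for (_, future), output in zip(batch, outputs):
            future.set_result(output)

async def predict(queue: "asyncio.Queue", x: Any) -> Any:
    """What each request handler calls: enqueue one input, await its result."""
    future = asyncio.get_running_loop().create_future()
    await queue.put((x, future))
    return await future
```

The payoff is that batch-friendly models (most of them) do one vectorized forward pass instead of N sequential ones, at the cost of a small, bounded latency budget per request.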
Tasting the Espresso
Just like a well-crafted hot espresso shot, reviewing the results of your work can be invigorating. Evaluating the numbers is a fundamental step in every Cleo project, so let's take a sip and get this energy boost!
1. ⌛ Time to create a new service dropped dramatically from days to (potentially) minutes, thanks to the Espresso CLI, which generates a skeleton project in a few seconds!
2. Monitoring capabilities were very basic before Espresso. Now, by automatically generating dashboards and analyzing the different spans of a single request, we found important bottlenecks in our recurring transaction matcher service in just 10 minutes. This analysis brought a 50x speedup 📈 to the service!
3. After migrating out of Sagemaker, full CI/CD run time went from ~25 minutes to ~8 minutes 📉!
4. Adopting Espresso's server-side batching feature delivered a 3x average speedup 🏎️ to our anti-fraud service!
5. We achieved a cost saving of ~40% 💰 while keeping the same performance!
6. Internal surveys revealed that developer experience improved in basically all phases (development, deployment, monitoring), with Espresso consistently cited in the good quadrants of the Data Science squad's retrospectives 💙
Bottom line: when you've got all the right ingredients in place and everything is executed just right, you can craft something truly delicious (*chef's kiss*) and the end result is gonna be pretty darn satisfying.
Let's raise a cup of espresso to a successful MLOps framework - cheers to great taste and even better results!
If you like the sound of working in our data team, check out our open roles here.