Teams are immutable
Fell Sunderland, staff engineer at Cleo, explores how seeing teams as immutable entities can offers new insights in how we lead, organise, and manage teams.
This is some text inside of a div block with Cleo CTA
CTASigning up takes 2 minutes. Scan this QR code to send the app to your phone.
Josh, Tech Lead, dives into what he looks for when reviewing pull requests at Cleo. If you're looking to write code that scales better, this is a must read.
As a tech lead and member of the platform team at Cleo, I keep an sharp eye out for how people interact with the database when I review their code. Small, seemingly innocuous decisions when creating an ActiveRecord model sometimes mean the code will quickly start slowing down once the data grows beyond a few thousand records. The goal of this blog post is to call out some common patterns I see with a [hopefully] more optimal solution so that you, dear reader, can write code that scales better out of the gate.
We’ve all done it. Sort the records descending by created_at grab the first [newest] record, then get its created_at. Logically, it makes sense. However, this is less optimal because it fetches the entire row back from the database. If your table is indexed on created_at, the query will perform an index scan, but will still need to fetch the corresponding row. Try this instead:
If your table is indexed on created_at, the query will perform an index-only scan, meaning it never needs to fetch the actual row. This can save a lot of time if your code is running this query several times per second.
“I’ve put an index on the column, so of course if we filter by the column we’ve indexed it will be fast!” Not so fast. If the column has “low cardinality” (database-speak for: only a handful of unique values appear in this column), a regular b-tree index won’t help you.
The query will perform a full-scan of the table to find the records that meet your filter criteria (your index won’t be used). In fact, having this unused index sitting around will actually hurt performance of your insert/update/delete queries as they need to keep the index up-to-date.
There’s a couple of tricks you can do here to help our your query performance when filtering by low-cardinality columns:
Like any good relationship, it takes two. Imagine this scenario:
When you try and delete the parent, you get this exception:
Why? When you created your table, you probably added a foreign key reference between records and parents(database-speak for telling the database that a column in one table contains pointers to rows in a different table), e.g.
The database will prevent you from removing a row from a table if it detects that that row is being pointed at from a column in another table.
The fix is easier than you think! Rails manages these references for you. All you need to do is to remember to define the other side of the relationship on the parent.rb model, e.g.
:destroy is just one option. Note that with :destroy, it will destroy any record objects associated with the parentobject when it is destroyed, so make sure you make a conscious decision about the dependent behaviour you want. You can read more about dependent behaviour here.
Imagine you want to store someone’s email address in a user model, and you want to be able to look up users by email. What if…
You’d need to do some pretty careful normalisation of email addresses to ensure that all of these variations are handled consistently 👆
This will ensure any lookups against email will be case insensitive, as will the unique index on email.
What if I told you that there was a way for the database to handle this normalisation for you?? Create your email column using the citext module.
Imagine we continue with the above example 👆We have an email column on our user model; its presence and uniqueness is required. We can enforce these constraints via two means:
Surely just one of these is enough, right? Wrong. They serve different purposes. Database constraints enforce data integrity. You can [pretty much] guarantee that a user will always have a unique email via database constraints. ActiveRecord validations provide helpful error messages and helper logic for dealing with records that might violate our constraints (e.g. things like .valid? tell you if the record might fail to save).
ActiveRecord validations work in the layer above the database, and are thus susceptible to race conditions (e.g. two users try to create an account with the same email at the same time and these requests are handled on two different processes - happens more often than you’d expect).
To get both strong data integrity protections as well as robust error handling in your Rails app, use ActiveRecord validations and make sure you back them with database constraints.
Let’s revisit this example:
Say you had the following indices on your table:
Surely you need the single-column index on start_date to ensure this query will be fast, right?
Wrong. As long as the column by which you are filtering is the leading column (database-speak for first column in the list of columns in the composite index), you can use the composite index as a replacement for the single-column index. Note that this does not work for non-leading columns:
Why not just keep both indices just to be safe? Indices are important and should be used generously, but the more you have, the more impact keeping them up-to-date has on the performance of your insert/update/delete operations. Keeping a lid on unnecessary indices is an important part of keeping your database healthy.
These are just a handful of performance optimisations I look out for when reviewing pull requests at Cleo. The nice thing about these optimisations is that they require little effort up-front, but help reduce the maintenance costs of your application over the long-term, which means you can run a large codebase with the relatively small support team.
Fell Sunderland, staff engineer at Cleo, explores how seeing teams as immutable entities can offers new insights in how we lead, organise, and manage teams.
Claire, a Backend Engineer, outlines what CI/CD looks like at Cleo.
Fabio, a Data Engineer, talks about why we introduced Espresso, our MLOps framework.