2023-03-01

The things I most often call out when reviewing pull requests

Josh, Tech Lead, dives into what he looks for when reviewing pull requests at Cleo. If you're looking to write code that scales better, this is a must read.

IN THIS ARTICLE:

As a tech lead and member of the platform team at Cleo, I keep an sharp eye out for how people interact with the database when I review their code. Small, seemingly innocuous decisions when creating an ActiveRecord model sometimes mean the code will quickly start slowing down once the data grows beyond a few thousand records. The goal of this blog post is to call out some common patterns I see with a [hopefully] more optimal solution so that you, dear reader, can write code that scales better out of the gate.

Finding the timestamp at which the latest record was created

Record.order(created_at: :desc).first.created_at # Less optimal

We’ve all done it. Sort the records descending by created_at grab the first [newest] record, then get its created_at. Logically, it makes sense. However, this is less optimal because it fetches the entire row back from the database. If your table is indexed on created_at, the query will perform an index scan, but will still need to fetch the corresponding row. Try this instead:

Record.maximum(:created_at) # More optimal

If your table is indexed on created_at, the query will perform an index-only scan, meaning it never needs to fetch the actual row. This can save a lot of time if your code is running this query several times per second.

Indexing columns like “type” or “state” or “category”

t.string :category, null: false, index: true # Index unlikely to be used

“I’ve put an index on the column, so of course if we filter by the column we’ve indexed it will be fast!” Not so fast. If the column has “low cardinality” (database-speak for: only a handful of unique values appear in this column), a regular b-tree index won’t help you.

Record.where(category: 'red').where(start_date: Date.current) # Will scan the entire table

The query will perform a full-scan of the table to find the records that meet your filter criteria (your index won’t be used). In fact, having this unused index sitting around will actually hurt performance of your insert/update/delete queries as they need to keep the index up-to-date.

There’s a couple of tricks you can do here to help our your query performance when filtering by low-cardinality columns:

Do your main query patterns filter by “category” and an additional column? You can create a composite index (database-speak for: an index that covers multiple columns). Put the lowest-cardinality column last in the list.
add_index :records, [:start_date, :category] # Composite index
Do your main query patterns filter by the same category every time? You can add a partial index (database-speak for an index with a where filter applied to it) that is much smaller than the composite index mentioned above.

add_index :records, :start_date, where: "category IN ('red')" # Partial index

Defining both sides of a relationship

Like any good relationship, it takes two. Imagine this scenario:

class Record < ApplicationRecord belongs_to :parent end

When you try and delete the parent, you get this exception:

record = Record.first parent = record.parent parent.destroy # ActiveRecord::InvalidForeignKey: PG::ForeignKeyViolation: ERROR: update or delete on table "parents" violates foreign key constraint "fk_rails_xxx" on table "records" DETAIL: Key (id)=(xxx) is still referenced from table

Why? When you created your table, you probably added a foreign key reference between records and parents(database-speak for telling the database that a column in one table contains pointers to rows in a different table), e.g.

t.references :parent, foreign_key: true, null: false, index: true # Adding a foreign key reference

The database will prevent you from removing a row from a table if it detects that that row is being pointed at from a column in another table.

The fix is easier than you think! Rails manages these references for you. All you need to do is to remember to define the other side of the relationship on the parent.rb model, e.g.

class Parent < ApplicationRecord has_many :records, dependent: :destroy end

:destroy is just one option. Note that with :destroy, it will destroy any record objects associated with the parentobject when it is destroyed, so make sure you make a conscious decision about the dependent behaviour you want. You can read more about dependent behaviour here.

Use citext for case-insensitive columns

Imagine you want to store someone’s email address in a user model, and you want to be able to look up users by email. What if…

They enter their email in all uppercase, e.g. CLEO@TEST.COM, when they create their account
When they go to log in, they enter their email in initcaps, e.g. Cleo@test.com
Someone else attempts to create an account with the same email, but lowercase, e.g. cleo@test.com. We’d want to detect that an account is already associated with this email.

You’d need to do some pretty careful normalisation of email addresses to ensure that all of these variations are handled consistently 👆

t.citext :email, null: false, index: { unique: true }

This will ensure any lookups against email will be case insensitive, as will the unique index on email.

What if I told you that there was a way for the database to handle this normalisation for you?? Create your email column using the citext module.

User.where(email: 'CLEO@TEST.COM') == User.where(email: 'cleo@test.com') # Same user

Use database constraints AND ActiveRecord validations

Imagine we continue with the above example 👆We have an email column on our user model; its presence and uniqueness is required. We can enforce these constraints via two means:

Constraints in the database itself, e.g.
t.citext :email, null: false, index: { unique: true }
ActiveRecord validations, e.g.
validates :email, presence: true, uniqueness: true

Surely just one of these is enough, right? Wrong. They serve different purposes. Database constraints enforce data integrity. You can [pretty much] guarantee that a user will always have a unique email via database constraints. ActiveRecord validations provide helpful error messages and helper logic for dealing with records that might violate our constraints (e.g. things like .valid? tell you if the record might fail to save).

ActiveRecord validations work in the layer above the database, and are thus susceptible to race conditions (e.g. two users try to create an account with the same email at the same time and these requests are handled on two different processes - happens more often than you’d expect).

To get both strong data integrity protections as well as robust error handling in your Rails app, use ActiveRecord validations and make sure you back them with database constraints.

Don’t double up on your indices

Let’s revisit this example:

Record.where(category: 'red').where(start_date: Date.current)

Say you had the following indices on your table:

add_index :records, [:start_date, :category] # Composite indexadd_index :records, :start_date # Unnecessary index

Surely you need the single-column index on start_date to ensure this query will be fast, right?

Record.where(start_date: Date.current) # Can still use the composite index

Wrong. As long as the column by which you are filtering is the leading column (database-speak for first column in the list of columns in the composite index), you can use the composite index as a replacement for the single-column index. Note that this does not work for non-leading columns:

Record.where(category: 'red') # Won't use the composite index

Why not just keep both indices just to be safe? Indices are important and should be used generously, but the more you have, the more impact keeping them up-to-date has on the performance of your insert/update/delete operations. Keeping a lid on unnecessary indices is an important part of keeping your database healthy.

Summary

These are just a handful of performance optimisations I look out for when reviewing pull requests at Cleo. The nice thing about these optimisations is that they require little effort up-front, but help reduce the maintenance costs of your application over the long-term, which means you can run a large codebase with the relatively small support team.

FAQs

Still have questions? Find answers below.