Josh, Tech Lead at Cleo, outlines our approach to migrating our primary key columns from integer to bigint.
Did you know that in Postgres, a column of type integer has a maximum value of 2,147,483,647? If, like us, your company has a strong appetite for data, you might have discovered that it’s pretty easy to have database tables that exceed two billion rows. What happens if you’ve used the integer type for your tables’ primary keys? 🙀
If you’re unlucky enough to have found out the hard way, you’ll know that the sequence used to generate primary key values errors as soon as you ask it for a value beyond its maximum 😞 You’re then in a pretty sticky situation. Ideally, you’d know when you were a few months away from running out of primary key values and have a plan for migrating to a different primary key type.
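To make that concrete, here’s a minimal reproduction using a made-up `transactions` table with an identity column (old-style `serial` columns fail slightly differently, with an `integer out of range` error on the insert itself):

```sql
-- A made-up table whose primary key is an integer identity column
CREATE TABLE transactions (
    id     integer GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    amount numeric NOT NULL
);

-- Fast-forward the backing sequence to the integer ceiling
-- to simulate years of inserts
SELECT setval(pg_get_serial_sequence('transactions', 'id'), 2147483647);

-- The next insert now fails:
INSERT INTO transactions (amount) VALUES (9.99);
-- ERROR:  nextval: reached maximum value of sequence "transactions_id_seq" (2147483647)

-- A periodic check like this gives you early warning of how much headroom is left
SELECT last_value,
       round(100.0 * last_value / 2147483647, 2) AS pct_of_integer_max
FROM transactions_id_seq;
```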
The purpose of this blog post is to discuss two approaches we’ve used successfully at Cleo to migrate our primary key columns from integer to bigint. Hopefully, if you find yourself in this situation you can use this as a starting point for planning your own migration.
Depending on your needs, you might be able to get away with running ALTER TABLE and ALTER SEQUENCE commands to change the column type to bigint. The challenge with this is twofold: changing the column type forces Postgres to rewrite the whole table (and its indexes) while holding an ACCESS EXCLUSIVE lock, so reads and writes are blocked for the duration, and any foreign key columns that reference the primary key need the same treatment. On a table with billions of rows, that adds up to hours of downtime.
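A minimal sketch of that in-place change, with hypothetical table, column and sequence names:

```sql
-- In-place change: rewrites the whole table under an ACCESS EXCLUSIVE lock,
-- so reads and writes are blocked until it finishes.
ALTER TABLE transactions ALTER COLUMN id TYPE bigint;

-- Any foreign key columns that reference the primary key need the same treatment.
ALTER TABLE refunds ALTER COLUMN transaction_id TYPE bigint;

-- Widen the backing sequence too if it was created with an integer ceiling
-- (Postgres 10+ syntax).
ALTER SEQUENCE transactions_id_seq AS bigint;
```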
How do you perform this migration with no downtime, then? We’ve tried the following approaches successfully:
In this approach, we create a new version of the table with a bigint primary key (there’s a rough sketch of this after the pros and cons below). Then, we either:
OR
Pros 👍
Cons 👎
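To make the shadow-table idea more concrete, here’s a rough sketch of what it can look like, reusing the hypothetical `transactions` table from earlier. It deliberately leaves out the parts that need the most care in practice: keeping the copy in sync with live writes while you backfill (for example via triggers or application-level double writes), carrying the sequence’s current value over, and re-pointing foreign keys at the new table.

```sql
-- 1. Create a structurally identical copy of the table, but with a bigint id.
CREATE TABLE transactions_new (LIKE transactions INCLUDING ALL);
ALTER TABLE transactions_new ALTER COLUMN id TYPE bigint;

-- 2. Backfill historic rows in small batches so you never hold long locks
--    (repeat with advancing id ranges until the copy has caught up).
INSERT INTO transactions_new
SELECT * FROM transactions
WHERE id > 0 AND id <= 10000
ON CONFLICT (id) DO NOTHING;

-- 3. Once the copy is in sync, swap the two tables in one short transaction.
BEGIN;
LOCK TABLE transactions IN ACCESS EXCLUSIVE MODE;
ALTER TABLE transactions RENAME TO transactions_old;
ALTER TABLE transactions_new RENAME TO transactions;
COMMIT;
```

The swap itself only holds the lock for as long as the two renames take, which is what makes this approach close to zero downtime.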
This is a self-hosted tool built in Python and SQL which will, in the background, create a full copy of your table with a bigint primary key and seamlessly swap the new table in for the old one, re-pointing any foreign key references, views and materialized views at the new table.
Pros 👍
Cons 👎
Think about the following:
Regardless of which approach you choose, you’ll want to test out the approach on a test database that has the same replication setup as your main database. The best way to do this is to spin up a test database using one of the backups from your main database.
If you’re making application changes as part of your approach, point a test version of the app at this database too, so you can verify that your planned code changes are compatible with the database changes at each step of the process.
We try to time these types of changes:
As you’re executing the steps of your [hopefully] well-tested plan, don’t be afraid to back out and try again another day if things go wrong. Plan continuation bias is a common pitfall we humans succumb to, and it can push you into a hasty, untested decision that ends in a prolonged outage or data loss.
Keep the team in the loop as to how you’re progressing. It’s helpful to do this as a pair so that if things do go wrong, someone can manage the comms while the other person is debugging the issue.
It can be daunting to plan an online migration of some of your biggest database tables with a looming deadline. Hopefully you’ve now got a couple of approaches to follow to help get you out of the woods. This is by no means an exhaustive list, so do your research and find the approach that best suits the needs of your business.