Did you know that in Postgres, a column of type integer has a maximum value of 2,147,483,647? If, like us, your company has a strong appetite for data, you might have discovered that it’s pretty easy to end up with database tables that exceed two billion rows. What happens if you used the integer type for those tables’ primary keys? 🙀
If you’re unlucky enough to have found out, you’ll know that the sequence used to generate primary key values errors when you ask for the next value after it has reached its maximum 😞 You’re then in a pretty sticky situation. Ideally, you’d know when you were a few months away from running out of primary key values and have a plan for migrating to a different primary key.
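One way to spot the problem ahead of time is to compare each sequence’s current value against the integer maximum. A minimal sketch using the `pg_sequences` view (available from PostgreSQL 10; the 80% threshold is purely illustrative):

```sql
-- Flag sequences that have consumed more than 80% of the integer range.
-- Assumes your primary keys are backed by sequences.
SELECT schemaname,
       sequencename,
       last_value,
       round(100.0 * last_value / 2147483647, 1) AS pct_of_int_max
FROM pg_sequences
WHERE last_value IS NOT NULL
  AND last_value > 0.8 * 2147483647;
```

Running something like this on a schedule, and alerting on any rows it returns, buys you the months of lead time you’ll want for the migration itself.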
The purpose of this blog post is to discuss two approaches we’ve used successfully at Cleo to migrate our primary key columns from integer to bigint. Hopefully, if you find yourself in this situation you can use this as a starting point for planning your own migration.
Step 1: Planning your approach
Depending on your needs, you might be able to get away with running ALTER TABLE and ALTER SEQUENCE commands to change the column type to bigint. The challenge with this is twofold:
- You’ll also need to alter the column type of any foreign keys that reference your table.
- Running an ALTER TABLE like this locks the table until the operation is complete. That likely means prolonged downtime for your application, as rewriting two billion rows could take several hours.
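For reference, the naive in-place migration looks something like this (table, column, and sequence names are illustrative). Each ALTER TABLE takes an ACCESS EXCLUSIVE lock and rewrites the whole table:

```sql
BEGIN;
-- Rewrites the entire table while holding an exclusive lock:
ALTER TABLE orders ALTER COLUMN id TYPE bigint;
-- Any foreign key columns referencing it must be migrated too:
ALTER TABLE order_items ALTER COLUMN order_id TYPE bigint;
-- Widen the backing sequence (only needed if it was created AS integer):
ALTER SEQUENCE orders_id_seq AS bigint;
COMMIT;
```

This is fine for small tables, but at two billion rows the lock duration is exactly the downtime problem described above.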
How do you perform this migration with no downtime, then? We’ve used the following approaches successfully:
Archive the current table and replace it with a new one
In this approach, we create a new version of the table with a bigint primary key. Then, we either:
- [If your application doesn’t immediately need to read/write from the archive] Swap out the old table with the new version of the table by renaming the tables in a database transaction.
- Create a view that combines the data from the old and new versions of the table with a UNION ALL, and update your application code so that write operations go directly to the underlying tables while read operations are directed at the new view.
Pros:
- Easy to back out should things go wrong
- Your new table is small

Cons:
- Only works if there are no foreign keys pointing to your table
- Potential additional complexity in your application code…
  - if you need to manage historical data in your archive table
  - if your application needs to read from your archive table
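The two variants above can be sketched as follows, assuming a hypothetical `events` table whose bigint replacement is `events_new` (all names are illustrative):

```sql
-- Variant 1: swap the tables by renaming inside one transaction.
BEGIN;
ALTER TABLE events RENAME TO events_archive;
ALTER TABLE events_new RENAME TO events;
COMMIT;

-- Variant 2: a view that unifies old and new data for reads.
-- The application writes directly to events_new.
CREATE VIEW events_combined AS
SELECT id::bigint, payload, created_at FROM events_archive
UNION ALL
SELECT id, payload, created_at FROM events_new;
```

Note the `id::bigint` cast on the archive side of the UNION ALL: it keeps the view’s column type consistent with the new table.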
Use the “repcloud” tool to manage the migration
repcloud is a self-hosted tool built in Python and SQL. In the background, it creates a full copy of your table with a bigint primary key, then seamlessly swaps the new table for the old, updating any foreign key references, views, and materialized views to point to the new table.
Pros:
- No changes to your application code are required
- Works with foreign keys pointing to your table

Cons:
- Some replication mechanisms that work off the write-ahead log (WAL) might encounter issues (we were able to debug and fix ours, but test for this ahead of time)
- You need enough spare storage in your database to hold a full copy of your table while repcloud runs
- If something goes wrong, you might take a while to debug the problem unless you have a good understanding of the mechanisms repcloud uses under the hood
Step 2: Preparing the changes
Think about the following:
- Is your table replicated to an analytics database (e.g. Redshift)? If yes, you might need to perform an integer to bigint migration in your analytics database as well.
- The change could impact your database replication. For example, it could result in more database logs being generated during the migration. Does your replication have capacity to handle this influx?
- What’s your back-out plan should things go wrong? Have a step-by-step plan for your changes. For each step, also have a back-out step should something prevent you from proceeding. For example, if you cannot get a lock for a table when performing one of the operations, would it leave your system in an inconsistent state?
- Will there be any impact on the wider engineering or data teams during or after the process? Present your plan to them ahead of time to get their feedback and buy-in. Include communication steps in your checklist of changes so that everyone is kept in the loop.
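For the replication question in particular, a useful rehearsal metric is how far behind your replicas fall while the migration generates WAL. On a primary using streaming replication, a starting-point query looks like this:

```sql
-- Replication lag per replica, in bytes of WAL not yet replayed.
-- Covers streaming replication only; logical consumers need their own
-- monitoring (e.g. confirmed_flush_lsn in pg_replication_slots).
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```

Watching this during a test run tells you whether your replication has the headroom to absorb the migration’s write burst.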
Regardless of which approach you choose, you’ll want to rehearse it on a test database that has the same replication setup as your main database. The best way to do this is to spin up a test database from a recent backup of your main database.
If you’re making changes to your application as part of your approach, also point a test version of your application at this database so you can verify the code changes you are planning to make are compatible with the database changes at each step of the process.
Step 3: Execute
We try to time these types of changes:
- During office hours so we have support from the wider team should things go wrong
- During a fairly quiet part of the day, when there is less load on the database
As you’re executing the steps of your [hopefully] well-tested plan, don’t be afraid to back out and try again another day should things go wrong. Plan continuation bias is a common pitfall we humans succumb to, and it can push you into hasty, untested decisions that lead to a prolonged outage or data loss.
Keep the team in the loop as to how you’re progressing. It’s helpful to do this as a pair so that if things do go wrong, someone can manage the comms while the other person is debugging the issue.
It can be daunting to plan an online migration of some of your biggest database tables with a looming deadline. Hopefully you’ve now got a couple of approaches to follow to help get you out of the woods. This is by no means an exhaustive list, so do your research and find the approach that best suits the needs of your business.