How to add a foreign key to a table with zero downtime

Wait! What? 
Does adding a foreign key to database tables cause downtime?

The short answer is yes; due to the need to lock on the entire table for some time which may lead to prevent writing to the table, causing all incoming requests to timeout, and hence downtime for the entire service that uses the database.

Many forms of the ALTER TABLE command requires an ACCESS EXCLUSIVE LOCK; which will block read & write traffic to the table until the command finishes.

Now let’s imagine we need to add a foreign key constraint to our table and let’s say you already have the column that will be the foreign key (i.e. either it’s already existing before or you’ve just created it using the ADD COLUMN command)1.

Since adding a foreign key to a table is actually adding a new constraint to it, so a full scan to the table has to be done to make sure every single row is satisfying this new constraint.

Normally, this form will cause a scan of the table to verify that all existing rows in the table satisfy the new constraint.

PostgreSQL Official Documentation

Luckily, unlike most forms of ADD table_constraint, adding a foreign key requires only a SHARE ROW EXCLUSIVE lock. This means that read access will still available, but insert/update/delete will be blocked until the command finishes. 

Note that ADD FOREIGN KEY also acquires a SHARE ROW EXCLUSIVE lock on the referenced table, in addition to the lock on the table on which the constraint is declared.

PostgreSQL Official Documentation

Given the fact that a table full scan has to be done and will take a long time if it has millions of rows, and at the same time, nothing but SELECT is allowed; most probably your service will start to complain right after starting executing that expensive command. 

So now the question is what should you do if you need to add a foreign key without blocking writes into your table if it has millions of rows?

The answer is to use the NOT VALID keyword. This option will tell your database: “Okay, please add the foreign key constraint for me, but don’t care about the currently existing data that might contradict the new constraint, and only enforce the constraint against the subsequent inserts of updates”.

If the NOT VALID option is used, this potentially-lengthy scan is skipped. The constraint will still be enforced against subsequent inserts or updates (that is, they’ll fail unless there is a matching row in the referenced table, in the case of foreign keys, or they’ll fail unless the new row matches the specified check condition).

PostgreSQL Official Documentation

One note to mention here is that if the column you added is new (added right before creating the foreign key), then nothing to worry about. But if that column was already there and has some data; you need to make sure (on the application level) that this table might have invalid references to the intended table. So you have to handle that yourself.

After that, a VALIDATE CONSTRAINT command can be issued to verify that existing rows satisfy the constraint. The validation step does not need to lock out concurrent updates, since it knows that other transactions will be enforcing the constraint for rows that they insert or update; only pre-existing rows need to be checked. Hence, validation acquires only a SHARE UPDATE EXCLUSIVE lock on the table being altered. (If the constraint is a foreign key then a ROW SHARE lock is also required on the table referenced by the constraint.).

PostgreSQL Official Documentation

  1. Adding a new column to a table with millions of records can be tricky. It’s out of this article scope, but for now and to be on the safe side; we can say that make sure to add the new columns and make them nullable.