Tightening Constraints in Postgres
Imagine we have a user table, and users have emails.
1create extension if not exists "uuid-ossp";2
3create table "user" (4 id uuid primary key default uuid_generate_v4(),5 email text not null6 constraint is_valid_email_address7 check (8 email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'9 )10);
After a while, we realize our constraint was too permissive, and that invalid emails exist in our data.
1insert into "user" (id, email)2values ('4e8a9373-bdef-4266-92e2-54b6b22fa970', 'john..rockefeller@standard-oil.com');
We can fix this like this.
Step 1. Add a new not valid
constraint
When we mark a constraint as not valid
, it is checked against new inserts and updates, but not upon constraint creation.
1alter table "user"2 add constraint is_valid_email_address_v23 check (4 email ~* '^(?![.])(?!.*[.]{2})[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}(?<![.])$'5 )6 not valid;
1insert into "user" (email)2values ('henry..flagler@standard-oil.com');3-- ERROR: new row for relation "user" violates check constraint "is_valid_email_address_v2"
Step 2. Find invalid rows and fix them
1select id, email2from "user"3where not (4 email ~* '^(?![.])(?!.*[.]{2})[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}(?<![.])$'5);
1update "user"2set email = 'john.rockefeller@standard-oil.com'3where id = '4e8a9373-bdef-4266-92e2-54b6b22fa970';
Now, all data is valid.
Step 3. Validate the constraint
1alter table "user"2 validate constraint is_valid_email_address_v2;
Now we know for sure all rows in the table adhere to the constraint.
Step 4. Clean up the constraints
1alter table "user"2 drop constraint is_valid_email_address;3
4alter table "user"5 rename constraint is_valid_email_address_v26 to is_valid_email_address;
Problem: After step 1, invalid rows are uneditable
The previous solution is great, but it has a problem. While the not valid
constraint is in place, we can’t update other columns on existing, invalid rows, where they remain in an invalid state.
After step 1, we would get this error.
1update "user" set id = id;2ERROR: new row for relation "user" violates check constraint "is_valid_email_address_v2"
And we’re not trying to edit email!
An alternate approach, that doesn’t have this problem, is to add a trigger initially, rather than a not valid
constraint.
1-- ✅ blocks new rows2insert into "user" (email)3values ('henry..flagler@standard-oil.com');4-- ERROR: invalid email: henry..flagler@standard-oil.com5
6-- ✅ allows updating of existing rows7update "user" set id = id;8-- UPDATE 19
10-- ✅ allows us to fix the data11update "user" set email = 'john.rockefeller@standard-oil.com' where id = '4e8a9373-bdef-4266-92e2-54b6b22fa970';12UPDATE 113
14-- ✅ does not allow us to break it again15update "user" set email = 'john..rockefeller@standard-oil.com' where id = '4e8a9373-bdef-4266-92e2-54b6b22fa970';16-- ERROR: invalid email: john..rockefeller@standard-oil.com
We can use this to validate new data, as we root out the invalid data.
Once we’ve fixed the data though, we should move to a constraint over a trigger, as it’s declarative and simpler.
”not valid” constraints work for domains as well
The same situation we’ve explored here applies to domains as well.
We can add a not valid
constraint to a domain.
1alter domain email2 add constraint is_valid_email_address_v23 check (4 value ~* '^(?![.])(?!.*[.]{2})[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}(?<![.])$'5 )6 not valid;
And we can validate it.
1alter domain email2 validate constraint is_valid_email_address_v2;
If that succeeds, we can drop the original, weaker constraint.
1alter domain email drop constraint is_valid_email_address;2alter domain email rename constraint is_valid_email_address_v2 to is_valid_email_address;
Wow! You read the whole thing. People who make it this far sometimes
want to receive emails when I post something new.
I also have an RSS feed.