Avoiding Unnecessarily Running Triggers in Postgres
In Postgres (or any relational db), triggers can be useful:
1create table books (2 id serial primary key not null,3 name text not null,4 updated_at timestamptz default now()5);6
7create or replace function update_updated_at_column () returns trigger as $$8begin9 new.updated_at = now();10 return new;11end;12$$ language plpgsql;13
14create trigger update_books_updated_at before15update on books for each row16execute function update_updated_at_column ();17
18insert into books (name) values ('The Great Gatsby');19
20select name,updated_at from books;
name | updated_at |
---|---|
The Great Gatsby | 2024-08-24 17:58:09.87828-04 |
1update books2 set name = 'THE GREAT GATSBY'3where id = 14returning name,updated_at;
name | updated_at |
---|---|
THE GREAT GATSBY | 2024-08-24 18:01:21.313217-04 |
But, they can also cause performance issues. To show this, we’re going to create a pyramid scheme.
Case Study: A Pyramid Scheme
- Each person must sell $100 of goods
- Each person will attempt to bring in other people.
We’ll define revenue
as how much one individual has sold, and referred_revenue
as how much anyone they’ve referred has sold. We’ll take it a step further though, and we’ll give you credit for anyone who’s been referred by someone you referred, and so on.
As the creator of the scheme, my revenue + referred_revenue
is the total revenue of the company. Everybody is referred by me, or by somebody referred by me.
Let’s express our pyramid scheme in a schema:
1create table members (2 id serial primary key,3 name text not null,4 revenue numeric(10, 2) not null default 0,5 referred_revenue numeric(10, 2) not null default 0,6
7 referred_by integer references members (id)8);9
10create or replace function update_referred_revenue () returns trigger as $$11begin12 -- if at top of pyramid, stop13 if new.referred_by is null then14 return new;15 end if;16
17 -- apply any change to the next member up the pyramid18 update members19 set referred_revenue =20 referred_revenue21 + new.referred_revenue - coalesce(old.referred_revenue, 0)22 + new.revenue - coalesce(old.revenue, 0)23 where id = new.referred_by;24
25 return new;26end;27$$ language plpgsql;28
29create trigger update_referred_revenue after30insert or update on members for each row31execute function update_referred_revenue ();
Now, let’s see it in action:
1insert into members (name, revenue) values ('me', 100);2insert into members (name, revenue, referred_by) values ('Stella', 100, 1);3insert into members (name, revenue, referred_by) values ('Cornelius', 100, 1);4insert into members (name, revenue, referred_by) values ('Rita', 100, 2);5insert into members (name, revenue, referred_by) values ('Conrad', 100, 2);6insert into members (name, revenue, referred_by) values ('Helmut', 100, 3);7insert into members (name, revenue, referred_by) values ('Manfried', 100, 3);
We can see referred_revenue
cascading up the pyramid:
Name | Revenue | Referred Revenue |
---|---|---|
me | 100.00 | 600.00 |
Stella | 100.00 | 200.00 |
Cornelius | 100.00 | 200.00 |
Rita | 100.00 | 0.00 |
Conrad | 100.00 | 0.00 |
Helmut | 100.00 | 0.00 |
Manfried | 100.00 | 0.00 |
This is a nice approach because it’s handling recursion for us.
The Problem
Now, let’s say Cornelius gets a million new recruits, but they haven’t sold anything yet.
1insert into members (name, revenue, referred_by)2select 'member' || i::text, 0, 33from generate_series(1, 1000000) as i;
That’s taking minutes to run… we can’t have that. Our pyramid scheme requires maximum efficiency!
Fortunately, we know they haven’t sold anything, so we don’t need to update anyone’s referred_revenue
. We have 2 options.
Solution 1: Disable the Trigger
1alter table members disable trigger update_referred_revenue;2
3insert into members (name, revenue, referred_by)4select 'member' || i::text, 0, 35from generate_series(1, 1000000) as i;6
7alter table members enable trigger update_referred_revenue;
That runs in seconds. We could’ve also disabled all triggers on the table:
1alter table members disable trigger all;
Solution 2: Conditionally Run the Trigger
To do this we’ll need to create two triggers, one for insert
and one for update
- the logic in each when
condition will be different.
1drop trigger update_referred_revenue on members;2
3create trigger update_referred_revenue_on_update4after update on members5for each row6when (7 old.revenue != new.revenue8 or old.referred_revenue != new.referred_revenue9) execute function update_referred_revenue ();10
11create trigger update_referred_revenue_on_insert12after insert on members13for each row14when (15 new.revenue != 016 or new.referred_revenue != 017) execute function update_referred_revenue ();
And now our new members can join the scheme.
1insert into members (name, revenue, referred_by)2select 'member' || i::text, 0, 33from generate_series(1, 1000000) as i;
Conclusion
Being so specific about when a trigger runs can seem unnecessary, but as the pyramid scheme example shows, triggers can basically block bulk operations.
In situations where a trigger needs to run a query, I’ve found them to be a main culprit of performance issues.
Wow! You read the whole thing. People who make it this far sometimes
want to receive emails when I post something new.
I also have an RSS feed.