How to Generate Recurring Events in the Database
This is a followup to my request for help fixing the performance of a database view that generated recurrences for events. This was an essential feature of Sandy, and thus important to get right. The idea when I started was simple:
- Add a
recurrencedomain to the database that supports a number of different values, includingdaily
,weekly
, andmonthly
. - Add a
recurrencecolumn to theeventstable that identify how an event recurs. - Add a
recurrence_datestable that contains a pre-generated list of recurrences for a given date. I’d populated this table with five years of dates, each one mapped to five years worth of recurrence dates (see the originalblog entry for more on the format of this table. - Create a view that maps each
eventsrow to its date and recurrence in therecurrence_datestable. - Profit.
It was this last bullet point that didn’t quite work out: although the data was perfectly accurate, queries for a lot of events in the view were very expensive. I mean, the query could run for 3-4 minutes. It was just crazy! I couldn’t figure out the problem, so I posted my request for help. It was through discussions that followed with depesz that I finally figured out what the problem was: Although I was usually selecting only a week’s or months worth of events, the view was calculating rows for all five years worth of data for all of the events for a given user. Um, not efficient.
So here I finally document how, with a lot of help and example code from
depesz, I solved the problem. The trick was to use a function instead of a
view to generate the recurring event rows, and to limit it only to the dates
we’re interested in. For convenience sake, I broke this down into two PL/pgSQL
functions: one to generate recurring dates and one to return the recurring
event rows. But first, here’s the recurrence domain and the
events table, both of which are unchanged from the original
approach:
CREATE DOMAIN recurrence AS TEXT
CHECK ( VALUE IN ( 'none', 'daily', 'weekly', 'monthly' ) );
CREATE TABLE events (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL,
starts_at TIMESTAMP NOT NULL,
start_tz TEXT NOT NULL,
ends_at TIMESTAMP,
end_tz TEXT NOT NULL,
recurrence RECURRENCE NOT NULL DEFAULT 'none'
);
Just assume the user_id is a foreign key. Now let’s populate
this table with some data. For the purposes of this demonstration, I’m going
to create one event per day for 1000 days, evenly divided between daily,
weekly, monthly, and no recurrences, as well as five different times of day
and six different durations:
INSERT INTO events (user_id, starts_at, start_tz, ends_at, end_tz, recurrence)
SELECT 1,
ts::timestamp,
'PST8PDT',
ts::timestamp + dur::interval,
'PST8PDT',
recur
FROM (
SELECT '2007-12-19'::date + i || ' ' || CASE i % 5
WHEN 0 THEN '06:00'
WHEN 1 THEN '10:00'
WHEN 2 THEN '14:00'
WHEN 3 THEN '18:00'
ELSE '22:30'
END,
CASE i % 6
WHEN 0 THEN '2 hours'
WHEN 1 THEN '1 hour'
WHEN 2 THEN '45 minutes'
WHEN 3 THEN '3.5 hours'
WHEN 4 THEN '15 minutes'
ELSE '30 minutes'
END,
CASE i % 4
WHEN 0 THEN 'daily'
WHEN 1 THEN 'weekly'
WHEN 2 THEN 'monthly'
ELSE 'none'
END
FROM generate_series(1, 1000) as gen(i)
) AS ser( ts, dur, recur);
This gives us some nicely distributed data:
try=# select * from events limit 10;
id | user_id | starts_at | start_tz | ends_at | end_tz | recurrence
------+---------+---------------------+----------+---------------------+---------+------------
1 | 1 | 2007-12-20 10:00:00 | PST8PDT | 2007-12-20 11:00:00 | PST8PDT | weekly
2 | 1 | 2007-12-21 14:00:00 | PST8PDT | 2007-12-21 14:45:00 | PST8PDT | monthly
3 | 1 | 2007-12-22 18:00:00 | PST8PDT | 2007-12-22 21:30:00 | PST8PDT | none
4 | 1 | 2007-12-23 22:30:00 | PST8PDT | 2007-12-23 22:45:00 | PST8PDT | daily
5 | 1 | 2007-12-24 06:00:00 | PST8PDT | 2007-12-24 06:30:00 | PST8PDT | weekly
6 | 1 | 2007-12-25 10:00:00 | PST8PDT | 2007-12-25 12:00:00 | PST8PDT | monthly
7 | 1 | 2007-12-26 14:00:00 | PST8PDT | 2007-12-26 15:00:00 | PST8PDT | none
8 | 1 | 2007-12-27 18:00:00 | PST8PDT | 2007-12-27 18:45:00 | PST8PDT | daily
9 | 1 | 2007-12-28 22:30:00 | PST8PDT | 2007-12-29 02:00:00 | PST8PDT | weekly
10 | 1 | 2007-12-29 06:00:00 | PST8PDT | 2007-12-29 06:15:00 | PST8PDT | monthly
(10 rows)
Now let’s get to the recurring date function:
CREATE OR REPLACE FUNCTION generate_recurrences(
recurs RECURRENCE,
start_date DATE,
end_date DATE
)
RETURNS setof DATE
LANGUAGE plpgsql IMMUTABLE
AS $BODY$
DECLARE
next_date DATE := start_date;
duration INTERVAL;
day INTERVAL;
check TEXT;
BEGIN
IF recurs = 'none' THEN
-- Only one date ever.
RETURN next next_date;
ELSIF recurs = 'weekly' THEN
duration := '7 days'::interval;
WHILE next_date <= end_date LOOP
RETURN NEXT next_date;
next_date := next_date + duration;
END LOOP;
ELSIF recurs = 'daily' THEN
duration := '1 day'::interval;
WHILE next_date <= end_date LOOP
RETURN NEXT next_date;
next_date := next_date + duration;
END LOOP;
ELSIF recurs = 'monthly' THEN
duration := '27 days'::interval;
day := '1 day'::interval;
check := to_char(start_date, 'DD');
WHILE next_date <= end_date LOOP
RETURN NEXT next_date;
next_date := next_date + duration;
WHILE to_char(next_date, 'DD') <> check LOOP
next_date := next_date + day;
END LOOP;
END LOOP;
ELSE
-- Someone needs to update this function, methinks.
RAISE EXCEPTION 'Recurrence % not supported by generate_recurrences()', recurs;
END IF;
END;
$BODY$;
The idea here is pretty simple: pass in a recurrence (daily
,
weekly
, or monthly
), a start date, and an end date, and get
back a set of all the recurrence dates between the start and end dates:
try=# \timing
Timing is on.
try=# select * from generate_recurrences('daily', '2008-01-29', '2008-02-05');
generate_recurrences
----------------------
2008-01-29
2008-01-30
2008-01-31
2008-02-01
2008-02-02
2008-02-03
2008-02-04
2008-02-05
(8 rows)
Time: 0.548 ms
try=# select * from generate_recurrences('weekly', '2008-01-29', '2008-03-05');
generate_recurrences
----------------------
2008-01-29
2008-02-05
2008-02-12
2008-02-19
2008-02-26
2008-03-04
(6 rows)
Time: 0.670 ms
try=# select * from generate_recurrences('monthly', '2008-01-29', '2008-05-05');
generate_recurrences
----------------------
2008-01-29
2008-02-29
2008-03-29
2008-04-29
(4 rows)
Time: 0.644 ms
Not bad, eh? And PostgreSQL’s date and interval calculation operators are wicked fast. Check out how long it dates to generate two years worth of daily recurrence dates:
try=# select * from generate_recurrences('daily', '2008-01-29', '2010-02-05');
generate_recurrences
----------------------
2008-01-29
2008-01-30
2008-01-31
...
2010-02-03
2010-02-04
2010-02-05
(739 rows)
Time: 4.982 ms
Awesome. And the great thing about this function is that any time I need to add new recurrences (yearly, biweekly, quarterly, weekends, weekdays, etc.), I just modify the domain and this function and we’re ready to go.
And now, part two: the recurring event function:
CREATE OR REPLACE FUNCTION recurring_events_for(
for_user_id INTEGER,
range_start TIMESTAMP,
range_end TIMESTAMP
)
RETURNS SETOF events
LANGUAGE plpgsql STABLE
AS $BODY$
DECLARE
event events;
start_date TIMESTAMPTZ;
start_time TEXT;
ends_at TIMESTAMPTZ;
next_date DATE;
recurs_at TIMESTAMPTZ;
BEGIN
FOR event IN
SELECT *
FROM events
WHERE user_id = for_user_id
AND (
recurrence <> 'none'
OR (
recurrence = 'none'
AND starts_at BETWEEN range_start AND range_end
)
)
LOOP
IF event.recurrence = 'none' THEN
RETURN NEXT event;
CONTINUE;
END IF;
start_date := event.starts_at::timestamptz AT TIME ZONE event.start_tz;
start_time := start_date::time::text;
ends_at := event.ends_at::timestamptz AT TIME ZONE event.end_tz;
FOR next_date IN
SELECT *
FROM generate_recurrences(
event.recurrence,
start_date::date,
(range_end AT TIME ZONE event.start_tz)::date
)
LOOP
recurs_at := (next_date || ' ' || start_time)::timestamp
AT TIME ZONE event.start_tz;
EXIT WHEN recurs_at > range_end;
CONTINUE WHEN recurs_at < range_start AND ends_at < range_start;
event.starts_at := recurs_at;
event.ends_at := ends_at;
RETURN NEXT event;
END LOOP;
END LOOP;
RETURN;
END;
$BODY$;
The idea here is to select the appropriate events for a given user between
two dates, and for each event iterate over all of the recurrences between the
two dates and return a row for each one. So the lines starting with FOR
event IN and ending with LOOP select the original events,
looking for either recurring events or non-recurring events that are between
the two dates. Note that if you needed to, you could easily refine this query
for your particular application, or even use
PL/pgSQL’s EXECUTE
operator to dynamically generate queries to suit particular application
needs.
Next, the block starting with IF event.recurrence =
'none' THEN simply returns any non-recurring events.
Although the next block already handles this case, adding this optimization
eliminates a fair bit of calculation for the common case of non-recurring
events.
Then the lines starting with FOR next_date IN and ending with
LOOP select all of the dates for the recurrence in question,
using the generate_recurrences() function created earlier. From
LOOP to END LOOP;, the function generates the start
and end timestamps, exiting the loop when the start date falls after the range
or when it falls before the range and the end date falls after the range.
There are many other tweaks one could make here to modify which recurrences
are included and which are excluded. For example, if you had a column in the
events table such as exclude_dates TIMESTAMP[] NOT NULL
DEFAULT ‘{}’ that stored an array of dates to ignore when generating
recurrences, you could just add this line to go ahead and exclude them from
the results returned by the function:
CONTINUE WHEN recurs_at = ANY( exclude_dates );
But enough of the details: let’s see how it works! Here’s a query for a week’s worth of data:
try=# select * from recurring_events_for(1, '2007-12-19', '2007-12-26');
id | user_id | starts_at | start_tz | ends_at | end_tz | recurrence
------+---------+---------------------+----------+---------------------+---------+------------
1 | 1 | 2007-12-20 10:00:00 | PST8PDT | 2007-12-20 03:00:00 | PST8PDT | weekly
2 | 1 | 2007-12-21 14:00:00 | PST8PDT | 2007-12-21 06:45:00 | PST8PDT | monthly
3 | 1 | 2007-12-22 18:00:00 | PST8PDT | 2007-12-22 21:30:00 | PST8PDT | none
4 | 1 | 2007-12-23 22:30:00 | PST8PDT | 2007-12-23 14:45:00 | PST8PDT | daily
4 | 1 | 2007-12-24 22:30:00 | PST8PDT | 2007-12-23 14:45:00 | PST8PDT | daily
4 | 1 | 2007-12-25 22:30:00 | PST8PDT | 2007-12-23 14:45:00 | PST8PDT | daily
5 | 1 | 2007-12-24 06:00:00 | PST8PDT | 2007-12-23 22:30:00 | PST8PDT | weekly
6 | 1 | 2007-12-25 10:00:00 | PST8PDT | 2007-12-25 04:00:00 | PST8PDT | monthly
(8 rows)
Time: 51.890 ms
Note the time it took to execute this query. 52 ms is a hell of a
lot faster than the several minutes it took to run a similar query using the
old view. Plus, I’m not limited to just the recurrence dates I’ve
pre-calculated in the old recurrence_dates table. Now we can use
whatever dates are supported by PostgreSQL. It’s even fast when we look at a
year’s worth of data:
try=# select * from recurring_events_for(1, '2007-12-19', '2008-12-19');
id | user_id | starts_at | start_tz | ends_at | end_tz | recurrence
------+---------+---------------------+----------+---------------------+---------+------------
1 | 1 | 2007-12-20 10:00:00 | PST8PDT | 2007-12-20 03:00:00 | PST8PDT | weekly
1 | 1 | 2007-12-27 10:00:00 | PST8PDT | 2007-12-20 03:00:00 | PST8PDT | weekly
1 | 1 | 2008-01-03 10:00:00 | PST8PDT | 2007-12-20 03:00:00 | PST8PDT | weekly
1 | 1 | 2008-01-10 10:00:00 | PST8PDT | 2007-12-20 03:00:00 | PST8PDT | weekly
1 | 1 | 2008-01-17 10:00:00 | PST8PDT | 2007-12-20 03:00:00 | PST8PDT | weekly
1 | 1 | 2008-01-24 10:00:00 | PST8PDT | 2007-12-20 03:00:00 | PST8PDT | weekly
1 | 1 | 2008-01-31 10:00:00 | PST8PDT | 2007-12-20 03:00:00 | PST8PDT | weekly
...
364 | 1 | 2008-12-17 22:30:00 | PST8PDT | 2008-12-17 14:45:00 | PST8PDT | daily
364 | 1 | 2008-12-18 22:30:00 | PST8PDT | 2008-12-17 14:45:00 | PST8PDT | daily
365 | 1 | 2008-12-18 06:00:00 | PST8PDT | 2008-12-17 22:30:00 | PST8PDT | weekly
(19691 rows)
Time: 837.759 ms
Not stellar, but still respectable. Given that for a typical application, a user will be looking at only a day’s or a week’s or a month’s events at a time, this seems to be an acceptable trade-off. I mean, how often will your users need to see a list of 20,000 events? And even if a user waslooking at a year’s worth of data, it’s unlikely that 75% of them would be recurring as in the example data here.
I was fucking pumped with this solution, and Sandy has hummed along nicely since we put it into production. If you’re interested in trying it for yourself, I’ve you can get all the SQL from this blog entry here.
The only thing I would like to have been able to do differently was to
encapsulate the recurring_events_for() function in a view. Such
would have made it much easier to actually use this solution in
Rails. If you know how to do that, please do leave a comment. As for how I
hacked Rails to use the function, well, that’s a blog post for another
day.













Comments & Trackbacks
rjbs wrote:
Simon wrote:
bangaru wrote:
Theory wrote:
Dan Barry wrote:
Discussion is now closed.