Continuous aggregates: the materialized view that refreshes itself

There's a query pattern that shows up in almost every time-series application and quietly becomes the most expensive thing your database does:

SELECT
  time_bucket('1 hour', ts) AS hour,
  device_id,
  avg(temperature),
  max(temperature)
FROM readings
WHERE ts > now() - interval '7 days'
GROUP BY hour, device_id;

It's a dashboard. It runs on every page load, for every user, and it scans and aggregates millions of raw rows each time to produce a few thousand points. The raw data never changes once written, but you recompute the same averages over and over. This is pure waste, and it's exactly the problem continuous aggregates solve.

Why not a plain materialized view?

The instinct in vanilla PostgreSQL is a materialized view. It works, until it doesn't:

It goes stale. A materialized view is a snapshot. New rows arriving in readings don't show up until you refresh.
The refresh rebuilds everything. REFRESH MATERIALIZED VIEW recomputes the entire result from scratch — all 7 days, every time — even though only the last few minutes of data actually changed. On a large hypertable that's minutes of CPU to incorporate seconds of new data.
Concurrent refresh needs a unique index and still does the full recompute; it just avoids the exclusive lock.

For append-mostly time-series data, recomputing history you've already computed is the entire inefficiency. A continuous aggregate is a materialized view that only does the new work.

What a continuous aggregate actually is

It's a materialized view, backed by its own hypertable, that TimescaleDB keeps up to date incrementally — it only re-materializes the time buckets whose underlying raw data has changed.

CREATE MATERIALIZED VIEW readings_hourly
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 hour', ts) AS bucket,
  device_id,
  avg(temperature) AS avg_temp,
  max(temperature) AS max_temp,
  count(*)         AS n
FROM readings
GROUP BY bucket, device_id
WITH NO DATA;

Two things to notice. The WITH (timescaledb.continuous) storage parameter is what makes it continuous rather than a normal materialized view. And WITH NO DATA tells it not to backfill the entire history synchronously at creation time — you don't want a CREATE statement that blocks for an hour churning through years of data. You materialize history on your own terms afterward.

The aggregated buckets live in their own compressed-friendly hypertable. Querying readings_hourly reads a few thousand pre-computed rows instead of scanning millions of raw ones.

The refresh policy — and the mistake almost everyone makes

You don't refresh a continuous aggregate by hand. You attach a policy that runs in the background:

SELECT add_continuous_aggregate_policy('readings_hourly',
  start_offset      => INTERVAL '3 days',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '30 minutes');

Read those offsets carefully, because this is where most setups go subtly wrong. Every 30 minutes, the policy re-materializes the window from now() - 3 days up to now() - 1 hour. That trailing end_offset is the important part:

start_offset bounds how far back you reach. Anything older than 3 days is assumed settled and is never recomputed — cheap. Make it comfortably larger than your latest-arriving data, but no larger.
end_offset holds the refresh back from the present. Here's why it matters: the most recent bucket is still filling. If you refresh right up to now(), you materialize a half-complete hour, then materialize it again next run, and again — burning work to repeatedly overwrite the same incomplete bucket. A non-zero end_offset (at least one full bucket width) means you only materialize buckets that are closed.

The common mistake is end_offset => INTERVAL '0' (or omitting it), which makes the policy thrash on the current bucket forever. Set it to one bucket width or more.

So what about the data newer than end_offset — the last hour you deliberately didn't materialize? That's where real-time aggregation comes in.

Real-time aggregation: materialized history + live edge

By default (depending on your TimescaleDB version — check it), a continuous aggregate can answer queries in real-time mode: it returns the materialized buckets unioned with an on-the-fly aggregation of the raw rows newer than the materialization watermark. You get pre-computed history and up-to-the-second recent data from a single SELECT, with no staleness at the edge.

You control it explicitly:

-- Real-time: materialized buckets + live aggregation of the newest raw rows
ALTER MATERIALIZED VIEW readings_hourly
  SET (timescaledb.materialized_only = false);-- Materialized-only: strictly the pre-computed buckets, nothing live
ALTER MATERIALIZED VIEW readings_hourly
  SET (timescaledb.materialized_only = true);

Real-time mode is what lets you set a safe, lazy end_offset without the dashboard looking stale — the gap between the watermark and now() is filled by querying raw data directly. The trade-off: that live portion does real work at query time, so if your "recent window" is huge, materialize more aggressively and lean less on the live edge.

Compress and expire — the part that pays the storage bill

Continuous aggregates are hypertables, so the same policies apply. A powerful pattern: keep the rollups, drop the raw data.

-- Compress aggregated chunks older than 7 days
ALTER MATERIALIZED VIEW readings_hourly
  SET (timescaledb.compress = true);
SELECT add_compression_policy('readings_hourly', compress_after => INTERVAL '7 days');-- Keep raw readings for 30 days; the hourly rollup keeps the long-term history
SELECT add_retention_policy('readings', drop_after => INTERVAL '30 days');

Now your raw, high-cardinality data has a short, bounded life, while the aggregates — which are tiny by comparison — carry years of history cheaply. This is usually the real reason teams adopt continuous aggregates: not just query speed, but turning an unbounded raw-data storage problem into a bounded one.

Rollups on rollups

Continuous aggregates can be built on top of other continuous aggregates. Bucket the hourly view into a daily one and the daily view materializes from a few thousand hourly rows instead of millions of raw ones:

CREATE MATERIALIZED VIEW readings_daily
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 day', bucket) AS day,
  device_id,
  avg(avg_temp) AS avg_temp,
  max(max_temp) AS max_temp,
  sum(n)        AS n
FROM readings_hourly
GROUP BY day, device_id
WITH NO DATA;

One note on correctness: averaging an average is only valid when the buckets are equal-weight or you carry the counts. That's why n is in the hourly view — for a weighted daily average you'd compute sum(avg_temp * n) / sum(n), not avg(avg_temp). The same caution applies to any non-decomposable aggregate (count(distinct ...), percentiles); for those, reach for TimescaleDB's partial-aggregate / hyperfunction forms rather than naively rolling up.

The gotchas worth knowing up front

Not every expression is allowed. The view's SELECT must be aggregatable and deterministic — no now() in the definition, no non-immutable functions, no window functions. Filter on time at query time, not in the view.
time_bucket must be on the hypertable's time column. That's how TimescaleDB tracks which buckets are invalidated by new or backfilled rows.
Backfilling old data invalidates buckets. Late-arriving rows inside start_offset get picked up on the next refresh; rows older than start_offset won't, unless you refresh that range manually with CALL refresh_continuous_aggregate(...).
Materialize history deliberately. After WITH NO DATA, run one refresh_continuous_aggregate over the historical range you care about — off-peak — rather than letting the first policy run try to swallow everything.

The mental model

A continuous aggregate isn't "a faster materialized view." It's a standing agreement: history is computed once and stored cheaply; the present is computed live; and the boundary between them is a watermark you control with end_offset. Get the offsets right and you stop paying to recompute the past on every dashboard load — which, for time-series workloads, is usually the single biggest query cost you have.