
When to denormalize

A fully normalized schema stores each fact exactly once. That is clean and correct. But reading that data often means joining multiple tables. “Show a book with its author, tags, and review count” can easily touch four tables in a single query. For read-heavy workloads, those joins add up, and for lists of thousands of books, they can add up a lot.

Denormalization is the deliberate decision to duplicate data in exchange for faster reads. The key word is “deliberate”: you are choosing to break a normalization rule because you measured a performance problem and this is the right fix for your situation.

[!WARNING] Normalize first. Denormalize only when you have a measured performance problem. Premature denormalization creates data inconsistency bugs that are hard to track down, and those bugs tend to be silent.

Adding reviews to the schema

We have been talking about “a book with reviews” for a while without a reviews table to back it up. Before we can demonstrate denormalization, we need that table. Our platform lets a user rate a book between 1 and 5 and leave a short body of text. That is two one-to-many relationships meeting in a single table: a user has many reviews and a book has many reviews, so reviews carries foreign keys to both.

Ship it as migration 006:

-- migrations/006_create_reviews.sql
BEGIN;

CREATE TABLE reviews (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  user_id INTEGER NOT NULL,
  book_id INTEGER NOT NULL,
  rating INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),
  body TEXT,
  created_at TEXT NOT NULL DEFAULT (datetime('now')),
  FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
  FOREIGN KEY (book_id) REFERENCES books(id) ON DELETE CASCADE,
  UNIQUE (user_id, book_id)
);

INSERT INTO schema_migrations (version, name)
VALUES ('006', 'create_reviews');

COMMIT;

A few design decisions worth calling out. The CHECK constraint on rating encodes the “1 to 5 stars” rule directly in the schema, so no bad rating can sneak in regardless of what your application code does. The UNIQUE (user_id, book_id) pair says “one review per user per book,” which is exactly the rule most review systems want. And ON DELETE CASCADE on both foreign keys means that deleting a user or a book automatically tidies up their reviews.
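
If you want to see those constraints earn their keep, a quick test in the shell does it. This sketch assumes a user and a book with id 1 already exist, and keep in mind that SQLite only enforces foreign keys (including the CASCADE behavior) when the foreign_keys pragma is on:

PRAGMA foreign_keys = ON;  -- off by default in SQLite; CASCADE needs it

INSERT INTO reviews (user_id, book_id, rating) VALUES (1, 1, 6);
-- rejected: rating fails the CHECK (rating BETWEEN 1 AND 5)

INSERT INTO reviews (user_id, book_id, rating) VALUES (1, 1, 4);
INSERT INTO reviews (user_id, book_id, rating) VALUES (1, 1, 5);
-- second insert rejected: UNIQUE (user_id, book_id) already has a row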

With that migration applied, we can talk about the interesting part: what happens once a popular book has a lot of reviews.

Counting queries

“How many reviews does this book have?” In a normalized schema, you count every time:

SELECT COUNT(*) FROM reviews WHERE book_id = ?;

This scans all matching rows. With millions of reviews, it gets slow. And if you show the review count on every book in a list of 50 books, you run this query 50 times per page load. Now the cost is real.

The denormalized approach: add a review_count column directly to the books table so you can read it in a single column lookup instead of a count. Since this is a schema change, we ship it as a migration.

-- migrations/007_add_review_count_to_books.sql
BEGIN;

ALTER TABLE books ADD COLUMN review_count INTEGER NOT NULL DEFAULT 0;

INSERT INTO schema_migrations (version, name)
VALUES ('007', 'add_review_count_to_books');

COMMIT;

A few things worth noticing. We use ALTER TABLE rather than creating a new table, because books already exists. The DEFAULT 0 is important: it means every existing book row gets a valid count immediately when the migration runs. Without the default, older rows would get NULL, and our counting logic would have to handle that case. And of course the migration is wrapped in a transaction and records itself in schema_migrations, same as all the others.
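
One wrinkle worth flagging: if any reviews exist by the time this migration runs, DEFAULT 0 starts every book at zero no matter how many reviews it actually has. A backfill statement inside the same migration transaction keeps the new column honest from day one. It is the same recomputation query we will meet again at the end of this lesson:

-- Backfill the new column from the real data, inside migration 007:
UPDATE books SET review_count = (
  SELECT COUNT(*) FROM reviews WHERE book_id = books.id
);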

Now that the column exists, we just have to keep it up to date whenever reviews change. The review insert and the count update only make sense together, so we wrap the pair in a transaction:

-- Adding a review
BEGIN;
  INSERT INTO reviews (user_id, book_id, rating, body) VALUES (?, ?, ?, ?);
  UPDATE books SET review_count = review_count + 1 WHERE id = ?;
COMMIT;

-- Removing a review
BEGIN;
  DELETE FROM reviews WHERE id = ?;
  UPDATE books SET review_count = review_count - 1 WHERE id = ?;
COMMIT;

If either statement fails, neither one sticks and the count never drifts away from the real data. This is exactly the kind of multi-step write the transactions lesson was about.

“How many reviews?” is now a single column read. No counting, no join, no scanning.
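
Concretely, the per-book lookup is now something like:

SELECT review_count FROM books WHERE id = ?;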

Computed fields

“What is the average rating for this book?” Computing the average means scanning all of a book’s reviews every time. The same trick applies. Store it, then keep it in sync. Since this is another schema change, it gets its own migration. You could roll it into the previous migration if you planned ahead, but if you are adding it later, a separate migration is fine and even preferable because it documents the change clearly:

-- migrations/008_add_avg_rating_to_books.sql
BEGIN;

ALTER TABLE books ADD COLUMN avg_rating REAL;

INSERT INTO schema_migrations (version, name)
VALUES ('008', 'add_avg_rating_to_books');

COMMIT;

Then recompute whenever a review is added or removed, inside the same transaction that touches the review itself:

BEGIN;
  INSERT INTO reviews (user_id, book_id, rating, body) VALUES (?, ?, ?, ?);
  UPDATE books SET
    review_count = review_count + 1,
    avg_rating = (SELECT AVG(rating) FROM reviews WHERE book_id = ?)
  WHERE id = ?;
COMMIT;

Three writes, one atomic unit. The review is inserted, the count goes up, and the average is recomputed from the fresh data, all or nothing.
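
The delete side needs the same treatment, or avg_rating drifts exactly the way review_count would. Here is a sketch mirroring the delete transaction from earlier, with one detail worth knowing: AVG over zero rows returns NULL in SQLite, which suits the nullable avg_rating column, since a book with no reviews has no average:

-- Removing a review, keeping both denormalized columns honest
BEGIN;
  DELETE FROM reviews WHERE id = ?;
  UPDATE books SET
    review_count = review_count - 1,
    avg_rating = (SELECT AVG(rating) FROM reviews WHERE book_id = ?)
  WHERE id = ?;
COMMIT;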

Display names

“Show the order with the customer’s name.” Instead of joining the customers table on every order listing, you could store the customer’s name directly on the order:

CREATE TABLE orders (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  customer_id INTEGER NOT NULL,
  customer_name TEXT NOT NULL,  -- denormalized from customers table
  total REAL NOT NULL,
  FOREIGN KEY (customer_id) REFERENCES customers(id)
);

This duplicates the name, but order listings no longer need a join. The customer_id foreign key is still there for when you need the full customer record. You still have the truth in the customers table, you just keep a cached copy where it is convenient.
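
The cost shows up when the source changes. Often the stale copy is exactly what you want on an order, since it preserves the name as it was at purchase time. But if the cached name is supposed to track the customer, a rename has to touch both tables, and the sketch looks like this:

-- Keeping the cached name in sync after a rename (only if the copy
-- should track the customer rather than snapshot purchase time)
BEGIN;
  UPDATE customers SET name = ? WHERE id = ?;
  UPDATE orders SET customer_name = ? WHERE customer_id = ?;
COMMIT;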

The tradeoff

              Normalized              Denormalized
Reads         Slower (joins)          Faster (single table)
Writes        Faster (update once)    Slower (update multiple places)
Consistency   Guaranteed              Must be maintained manually
Storage       Less                    More

Denormalization trades write complexity for read speed. The catch is that you must keep the denormalized data in sync. If you forget to update review_count when a review is deleted, the count is wrong. And that kind of bug is silent. Nothing crashes. The data is just quietly incorrect, and you probably will not notice until a user does.

Keeping denormalized data in sync

There are three approaches, from least reliable to most:

Application code. Every place that inserts, updates, or deletes related data must also update the denormalized column. This is error-prone. Miss one code path and the data drifts.
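
A concrete way this bites: a one-off script or admin tool that was never taught about the counter. Something as innocent as this hypothetical cleanup breaks every affected count, silently:

-- Bulk delete that bypasses the 'also update review_count' rule
DELETE FROM reviews WHERE body LIKE '%spam%';
-- review_count on the affected books is now too high, and nothing complains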

Database triggers. SQLite triggers run automatically when data changes. Since triggers are part of the schema, we add them through a migration too. Here is what that looks like:

-- migrations/009_add_review_count_triggers.sql
BEGIN;

CREATE TRIGGER update_review_count_on_insert
AFTER INSERT ON reviews
BEGIN
  UPDATE books SET review_count = review_count + 1 WHERE id = NEW.book_id;
END;

CREATE TRIGGER update_review_count_on_delete
AFTER DELETE ON reviews
BEGIN
  UPDATE books SET review_count = review_count - 1 WHERE id = OLD.book_id;
END;

INSERT INTO schema_migrations (version, name)
VALUES ('009', 'add_review_count_triggers');

COMMIT;

Triggers are more reliable than application code because they always run, regardless of which code path actually inserted or deleted the review. But they are also harder to debug and test. If something seems weirdly automatic, a forgotten trigger is often the culprit.
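
A related debugging tip: triggers are ordinary schema objects, so you can always list what is installed by querying sqlite_master:

SELECT name, sql FROM sqlite_master WHERE type = 'trigger';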

Periodic recomputation. Run a query that recalculates denormalized values from the source data:

UPDATE books SET review_count = (
  SELECT COUNT(*) FROM reviews WHERE book_id = books.id
);

This corrects any drift. Run it hourly or daily as a safety net, even if you also use triggers. Belt and suspenders.
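
If you added avg_rating, the same safety net applies to it, and the same correlated-subquery pattern can report drift before you fix it:

-- Recompute averages from the source data
UPDATE books SET avg_rating = (
  SELECT AVG(rating) FROM reviews WHERE book_id = books.id
);

-- Report books whose stored count disagrees with reality
SELECT id, review_count,
  (SELECT COUNT(*) FROM reviews WHERE book_id = books.id) AS actual_count
FROM books
WHERE review_count <> (SELECT COUNT(*) FROM reviews WHERE book_id = books.id);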

Exercises

Exercise 1: Write migration 006_create_reviews.sql and apply it. Insert at least 5 reviews across two or three books, then write migration 007_add_review_count_to_books.sql and update the count manually for each book. Query the counts. Compare to running SELECT COUNT(*) FROM reviews WHERE book_id = ? each time.

Exercise 2: Write migration 009_add_review_count_triggers.sql with triggers that update review_count on insert and delete. Apply it. Add and remove reviews. Verify the count stays accurate without any manual updates from your application code.

Exercise 3: Intentionally create a count mismatch (insert a review directly while triggers are disabled, or edit the count manually). Write the recomputation query to fix it.


At this point we have a solid schema: users, authors, books, tags, profiles, reviews, and denormalized review counts, all shipped through proper migrations. We know how to structure data, and we know when to bend the rules. The next section is all about getting data back out. We will dig deep into querying across tables with JOINs, aggregation, and more advanced patterns.
