What normalization means
So far we have been making design decisions by instinct. “The author’s name should not be repeated in every book row.” “Tags should not be stored as comma-separated strings.” “Profile data that is optional can live in its own table.” Those instincts are all correct, and they have a formal name: normalization.
Normalization is the process of organizing a schema to eliminate redundancy. The core idea is simple. Each piece of information should be stored exactly once. When you update a fact, like an author’s bio or a product’s price, you update it in one place and every query that reads that fact sees the new value automatically.
This lesson is a bit more conceptual than the last few. We are not going to add new tables through migrations here. Instead, we are going to look at examples that either satisfy or violate each rule, and build a mental framework you can use when designing new migrations in the future.
There are several “normal forms” that describe increasingly strict rules. In practice, you really only need to know the first three.
First Normal Form (1NF)
Rule: Every column holds a single value. No lists, no comma-separated strings, no JSON arrays shoved into a TEXT column.
-- Violates 1NF: tags is a list
CREATE TABLE books (
id INTEGER PRIMARY KEY,
title TEXT NOT NULL,
tags TEXT -- "fiction,classic,american"
);
-- Satisfies 1NF: each tag is a separate row
CREATE TABLE book_tags (
book_id INTEGER NOT NULL,
tag_id INTEGER NOT NULL,
PRIMARY KEY (book_id, tag_id)
); The comma-separated tags column violates 1NF. You cannot query individual tags, index them, or enforce that a tag is valid. The junction table from our earlier migration stores each tag relationship as a separate row, and each column holds exactly one value. That is 1NF in action.
How to check: Look at every column in every table. Does any column contain multiple values packed into one string (or JSON blob, or array-like representation)? If yes, split it into a separate table. In a real codebase, that split would itself be a migration.
Second Normal Form (2NF)
Rule: Every non-key column depends on the entire primary key, not just part of it. This only matters for tables that have a composite primary key (two or more columns forming the key together).
-- Violates 2NF: tag_name depends only on tag_id, not on (book_id, tag_id)
CREATE TABLE book_tags (
book_id INTEGER NOT NULL,
tag_id INTEGER NOT NULL,
tag_name TEXT NOT NULL, -- depends only on tag_id
PRIMARY KEY (book_id, tag_id)
);
-- Satisfies 2NF: tag_name moved to its own table
CREATE TABLE tags (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL
);
CREATE TABLE book_tags (
book_id INTEGER NOT NULL,
tag_id INTEGER NOT NULL,
PRIMARY KEY (book_id, tag_id)
); tag_name depends on tag_id alone, not on the full composite key (book_id, tag_id). If the same tag appears on 100 books, tag_name ends up duplicated 100 times. Moving it to the tags table stores it once, and the junction table goes back to doing just one job: linking books and tags.
How to check: For tables with composite keys, ask yourself whether any column depends on only part of the key. If yes, move that column to the table that the partial key references.
Third Normal Form (3NF)
Rule: Every non-key column depends on the primary key and nothing else. No column should depend on another non-key column.
-- Violates 3NF: author_name depends on author_id, not on book id
CREATE TABLE books (
id INTEGER PRIMARY KEY,
title TEXT NOT NULL,
author_id INTEGER NOT NULL,
author_name TEXT NOT NULL -- depends on author_id, not on book id
);
-- Satisfies 3NF: author_name is in the authors table
CREATE TABLE authors (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL
);
CREATE TABLE books (
id INTEGER PRIMARY KEY,
title TEXT NOT NULL,
author_id INTEGER NOT NULL,
FOREIGN KEY (author_id) REFERENCES authors(id)
); author_name depends on author_id, which is not the primary key of books. If the author changes their name, you would need to update every single book row that references that author. Moving author_name to the authors table means you update it in one place and every query that joins to authors sees the new name automatically.
How to check: For each non-key column, ask: “Does this value change independently of the primary key?” If author_name can change without the book’s ID changing, it belongs in another table.
The practical summary
1NF: No lists in columns. One value per cell.
2NF: No partial dependencies on composite keys.
3NF: No dependencies between non-key columns.
In practice, most design errors violate 1NF (comma-separated values) or 3NF (duplicating data from related tables). If your tables have clear primary keys, proper foreign keys, and no repeated data, you are probably already in 3NF without thinking about it. That is the good news. The discipline of writing a migration for every schema change tends to naturally push you toward these forms, because a poorly normalized schema becomes painful fast.
Higher normal forms exist (BCNF, 4NF, 5NF) but they are rarely relevant in everyday application development. Getting to 3NF handles around 95% of real-world schemas.
Exercises
Exercise 1: Look at this table and identify which normal form it violates:
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
customer_name TEXT,
customer_email TEXT,
product_name TEXT,
product_price REAL,
quantity INTEGER
); (Customer data depends on the customer, product data depends on the product, not on the order. This violates 3NF.)
Exercise 2: Normalize the orders table into 3NF. How many tables do you need, and what columns go in each? If you were adding this to a real app, which migrations would you write and in what order?
Exercise 3: Find a comma-separated column in a real app’s database or a tutorial. Redesign it to satisfy 1NF.
What does Third Normal Form (3NF) prevent?
Normalization eliminates redundancy, which is great. But it has a cost: reading data often requires joining multiple tables, and joins are not free. In the next lesson we will look at when it actually makes sense to break normalization rules on purpose, and we will ship that change through another migration.