Database design with SQLite

Indexes

So far we have shaped our schema, related tables to each other, and moved every change through migration files. The queries we write against that schema all work. As long as the tables stay small, every query feels instant. That feeling is a trap. The moment our books table grows past a few thousand rows, the same queries start to drag, and it is not obvious why. This lesson is about the reason, and about the one tool that fixes it: the index.

Picture a books table with 100,000 rows. We run a simple filter:

SELECT * FROM books WHERE genre = 'fiction';

SQLite starts at the first row, checks the genre, moves to the second row, checks the genre, and keeps going until it has looked at all 100,000 rows. This is called a full table scan. It works, but it gets slower as the table grows. With 100,000 rows it might take a few milliseconds. With 10 million rows it could take seconds. And every single query pays that cost, even if only 50 rows match.

An index is a separate data structure that lets the database find rows without scanning the entire table. Think of the index at the back of a textbook. Instead of reading every page to find a topic, you look it up in the index, get a page number, and jump directly there. A database index works the same way. It stores the values of one or more columns in a sorted structure so that lookups, joins, and sorts can skip straight to the relevant rows.

Creating an index

The CREATE INDEX statement builds an index on one or more columns:

CREATE INDEX idx_books_genre ON books(genre);

Now when we run WHERE genre = 'fiction', SQLite consults the index first. It finds all the rows where genre equals 'fiction' and reads only those rows from the table. Instead of scanning 100,000 rows, it might read just 50.

Every schema change in this course goes through a migration file, and indexes are no exception. Creating an index is a schema change, just like adding a column. So when we want this index in the real database, we write it as a migration:

-- migrations/010_add_books_genre_index.sql
BEGIN;

CREATE INDEX idx_books_genre ON books(genre);

INSERT INTO schema_migrations (version, name)
VALUES ('010', 'add_books_genre_index');

COMMIT;

The naming convention idx_table_column is not required, but it is widely used: it tells you at a glance which table and column an index belongs to. When you have dozens of indexes across a schema, that clarity saves you from guessing.

Unique indexes

If a column should only contain unique values, you can create a unique index:

CREATE UNIQUE INDEX idx_tags_name ON tags(name);

This does two things at once. It creates an index (so lookups by that column are fast) and it enforces a uniqueness constraint (so no two rows can have the same value). If you try to insert a duplicate, SQLite rejects the insert.

We are not actually going to add this as a migration, because tags.name was declared UNIQUE back in migration 004. That is the detail worth internalizing: any column already declared UNIQUE (or PRIMARY KEY) gets an index for free. SQLite creates one behind the scenes to enforce the constraint, so writing CREATE UNIQUE INDEX on top of it would be redundant. You only reach for CREATE UNIQUE INDEX when you want to add uniqueness to a column that did not originally have it, typically in a later migration that tightens a constraint.
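You can see that automatic index for yourself. The sketch below uses Python's built-in sqlite3 module purely for illustration, with a minimal stand-in for the tags table (the real definition lives in migration 004):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")

# PRAGMA index_list shows every index on the table, including automatic ones.
# The UNIQUE constraint created one named like 'sqlite_autoindex_tags_1'.
for row in con.execute("PRAGMA index_list(tags)"):
    print(row)

con.execute("INSERT INTO tags (name) VALUES ('sql')")
try:
    con.execute("INSERT INTO tags (name) VALUES ('sql')")
except sqlite3.IntegrityError as e:
    print(e)  # UNIQUE constraint failed: tags.name
```

The automatic index is what makes both the uniqueness check and lookups by name fast, which is exactly why layering CREATE UNIQUE INDEX on top would be redundant.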

When to add an index

You do not need to index every column from the start. In fact, you should not. Start without indexes and add them when you have a reason. Here are the situations where an index makes a real difference.

Columns you filter by frequently. If your application runs WHERE genre = ? on every page load, an index on genre saves a full table scan every time.

CREATE INDEX idx_books_genre ON books(genre);

Columns you join on. When you join books to authors using author_id, SQLite needs to find matching rows in both tables. An index on the foreign key column makes the join fast.

CREATE INDEX idx_books_author_id ON books(author_id);

Columns you sort by. If you frequently sort by created_at, an index lets SQLite read the rows in order instead of sorting them after the fact.

CREATE INDEX idx_books_created_at ON books(created_at);

Large tables. A table with 100 rows does not need an index. A full table scan of 100 rows is essentially instant. But once a table grows to thousands or millions of rows, the cost of scanning every row adds up quickly.

A good rule of thumb: if a query is slow, a column is used frequently in WHERE clauses or joins, and the table is large, add an index.

When not to add an index

Indexes are not free. Every time you insert, update, or delete a row, SQLite must also update every index on that table. More indexes means slower writes.

Small tables. If the table has only a few hundred rows, SQLite can scan the whole thing faster than it can consult an index. The overhead of maintaining the index is not worth it.

Columns you do not query by. An index on books.description is a waste if you never write WHERE description = ... or ORDER BY description. It slows down writes without helping any reads.

Write-heavy tables. If a table receives thousands of inserts per second and is rarely queried, adding indexes hurts more than it helps. Every insert must update every index, which multiplies the write cost.

The bottom line: index the columns your queries actually use, and leave the rest alone.
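One way to feel the write-side cost is to time bulk inserts with and without extra indexes. This is a rough sketch using Python's built-in sqlite3 module; the table and index names are throwaways, and the exact timings will vary by machine:

```python
import sqlite3
import time

def timed_inserts(extra_indexes):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, genre TEXT, created_at TEXT)"
    )
    for i in range(extra_indexes):
        # every one of these must be updated on every insert
        con.execute(f"CREATE INDEX idx_{i} ON books(title, genre, created_at)")
    rows = [(f"Book {i}", "fiction", "2024-01-01") for i in range(50_000)]
    start = time.perf_counter()
    with con:  # one transaction for the whole batch
        con.executemany(
            "INSERT INTO books (title, genre, created_at) VALUES (?, ?, ?)", rows
        )
    return time.perf_counter() - start

print(f"0 extra indexes: {timed_inserts(0):.3f}s")
print(f"5 extra indexes: {timed_inserts(5):.3f}s")
```

On most machines the second run is noticeably slower, which is the trade-off in miniature: each index you add is a small tax on every future write.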

Composite indexes

Sometimes a query filters on more than one column:

SELECT * FROM books
WHERE genre = 'fiction'
AND published_at > '2020-01-01';

A single-column index on genre helps, but the database still has to scan all fiction books to check the date. A composite index covers multiple columns and handles both conditions efficiently. Here it is as a migration:

-- migrations/011_add_books_genre_published_index.sql
BEGIN;

CREATE INDEX idx_books_genre_published ON books(genre, published_at);

INSERT INTO schema_migrations (version, name)
VALUES ('011', 'add_books_genre_published_index');

COMMIT;

Column order matters

The order of columns in a composite index is critical. Think of a phone book. A phone book is sorted by last name first, then by first name within each last name. You can look up everyone named “Smith” easily, because all the Smiths are grouped together. You can look up “Smith, Alice” easily, because within the Smith section, entries are sorted by first name. But you cannot efficiently look up everyone named “Alice” regardless of last name, because the book is not organized by first name at the top level.

A composite index works the same way. An index on (genre, published_at) is organized by genre first, then by published_at within each genre. This means:

  • WHERE genre = 'fiction' uses the index. Genre is the first column.
  • WHERE genre = 'fiction' AND published_at > '2020-01-01' uses the full index. Both columns match left to right.
  • WHERE published_at > '2020-01-01' does not use this index. The index is sorted by genre first, so filtering by published_at alone would require scanning every genre group.

This is called the left-to-right prefix rule. A composite index can be used for any query that matches a left-to-right prefix of the index columns. If your index is on (a, b, c), it helps queries on a, on a AND b, and on a AND b AND c. It does not help queries on b alone or c alone.

Bundling related indexes into one migration

If we are adding several indexes that go together, we can put them in the same migration. Each migration file is already a transaction, and creating multiple indexes at once is a single logical schema change.

Our reviews table from migration 006 is a good candidate. It has two foreign keys (user_id and book_id) and a rating column that the book page will filter and sort by. Foreign keys are not automatically indexed in SQLite, so every JOIN that goes through reviews is doing a full scan until we fix it. We bundle those three indexes into one migration because they all exist for the same reason: making review-related queries fast.

-- migrations/012_add_reviews_indexes.sql
BEGIN;

CREATE INDEX idx_reviews_book_id ON reviews(book_id);
CREATE INDEX idx_reviews_user_id ON reviews(user_id);
CREATE INDEX idx_reviews_rating ON reviews(rating);

INSERT INTO schema_migrations (version, name)
VALUES ('012', 'add_reviews_indexes');

COMMIT;

Three indexes, one migration, one atomic unit of change. If any one of the CREATE INDEX statements fails, the whole thing rolls back and none of them get created.
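To convince yourself the rollback really is all-or-nothing, you can simulate a failing migration. In this sketch (Python's built-in sqlite3 module, with a minimal stand-in for reviews), the second CREATE INDEX fails on purpose because no_such_column does not exist:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.isolation_level = None  # autocommit mode; we manage the transaction explicitly
con.execute(
    "CREATE TABLE reviews (id INTEGER PRIMARY KEY, book_id INTEGER, user_id INTEGER)"
)

try:
    con.execute("BEGIN")
    con.execute("CREATE INDEX idx_reviews_book_id ON reviews(book_id)")
    con.execute("CREATE INDEX idx_oops ON reviews(no_such_column)")  # fails here
    con.execute("COMMIT")
except sqlite3.OperationalError as e:
    con.execute("ROLLBACK")
    print("migration rolled back:", e)

# Neither index survived the rollback, not even the one that succeeded
print(con.execute("PRAGMA index_list(reviews)").fetchall())
```

DDL in SQLite is fully transactional, so the first CREATE INDEX is undone along with everything else. That is exactly the guarantee the BEGIN/COMMIT wrapper in each migration file is buying.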

Creating an index does not change any data. It builds a new data structure alongside the table. This is a safe, non-destructive migration. If you later decide an index is not needed, you can drop it the same way you added it, in its own migration:

-- migrations/013_drop_books_genre_index.sql
BEGIN;

DROP INDEX idx_books_genre;

INSERT INTO schema_migrations (version, name)
VALUES ('013', 'drop_books_genre_index');

COMMIT;

Dropping an index does not affect the data either. The table is untouched. Queries still work. They just go back to scanning instead of using the index.

EXPLAIN QUERY PLAN

We have created an index, but how do we know the database is actually using it? Guessing is not good enough. SQLite has a built-in tool called EXPLAIN QUERY PLAN that tells you exactly how a query will be executed.

Put EXPLAIN QUERY PLAN before any SELECT statement:

EXPLAIN QUERY PLAN SELECT * FROM books WHERE genre = 'fiction';

Without an index, the output looks like this:

SCAN books

SCAN means SQLite is reading every row in the table. This is the full table scan you want to avoid on large tables.

With an index, the output changes:

SEARCH books USING INDEX idx_books_genre (genre=?)

SEARCH ... USING INDEX means SQLite found an index and is using it to jump directly to the matching rows. This is what you want to see.

For composite indexes, the output shows which columns are being used:

EXPLAIN QUERY PLAN
SELECT * FROM books
WHERE genre = 'fiction' AND published_at > '2020-01-01';

SEARCH books USING INDEX idx_books_genre_published (genre=? AND published_at>?)

Both columns are listed, which means the full composite index is in use.

Always run EXPLAIN QUERY PLAN before and after adding an index. Do not assume an index is helping just because you created it. The query planner might choose a different path, especially for small tables or for queries that do not match the index columns properly.
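If you are working from application code rather than the SQLite shell, the same before-and-after check is easy to script. A sketch using Python's built-in sqlite3 module, with a minimal stand-in for the books table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, genre TEXT)")

def explain(sql):
    # the last column of each EXPLAIN QUERY PLAN row is the human-readable step
    for row in con.execute("EXPLAIN QUERY PLAN " + sql):
        print(row[-1])

query = "SELECT * FROM books WHERE genre = 'fiction'"

explain(query)  # before the index: SCAN books
con.execute("CREATE INDEX idx_books_genre ON books(genre)")
explain(query)  # after: SEARCH books USING INDEX idx_books_genre (genre=?)
```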

Covering indexes

There is one more optimization worth knowing about. Normally, when SQLite uses an index, it first looks up the matching row IDs in the index, then goes to the table to fetch the actual data. That second step, the table lookup, takes time.

A covering index is an index that contains all the columns a query needs. When this happens, SQLite reads everything from the index and never touches the table at all.

-- migrations/014_add_books_genre_title_index.sql
BEGIN;

CREATE INDEX idx_books_genre_title ON books(genre, title);

INSERT INTO schema_migrations (version, name)
VALUES ('014', 'add_books_genre_title_index');

COMMIT;

Now consider this query:

SELECT title FROM books WHERE genre = 'fiction';

This query only needs genre (for filtering) and title (for the result). Both columns are in the index. SQLite can answer the entire query from the index alone. EXPLAIN QUERY PLAN shows this:

SEARCH books USING COVERING INDEX idx_books_genre_title (genre=?)

The word COVERING in the output tells you no table lookup was needed. This is the fastest type of indexed query.

You do not need to go out of your way to create covering indexes for every query. But when you have a performance-critical query that reads only a few columns, it is worth checking whether the index already covers it or could be extended to cover it.
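Here is that check scripted end to end, again with Python's built-in sqlite3 module and a minimal books stand-in (the description column exists only so that SELECT * needs data the index does not hold):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, genre TEXT, description TEXT)"
)
con.execute("CREATE INDEX idx_books_genre_title ON books(genre, title)")

def plan(sql):
    # the last column of an EXPLAIN QUERY PLAN row is the human-readable step
    return con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()[0][-1]

# genre and title are both in the index, so no table lookup is needed
print(plan("SELECT title FROM books WHERE genre = 'fiction'"))
# SEARCH books USING COVERING INDEX idx_books_genre_title (genre=?)

# SELECT * also needs description, so SQLite must go back to the table
print(plan("SELECT * FROM books WHERE genre = 'fiction'"))
```

The second plan still says SEARCH, so the index is used for filtering, but the word COVERING disappears because the table lookup came back.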

Exercises

Exercise 1: Create a books table with at least 1,000 rows of sample data (you can use an INSERT with a subquery or a loop). Run EXPLAIN QUERY PLAN SELECT * FROM books WHERE genre = 'fiction' without any index and verify you see SCAN. Then write a migration that creates an index on genre and run the same EXPLAIN again. Verify you see SEARCH USING INDEX.

Exercise 2: Write a migration that creates a composite index on (genre, published_at). Test three queries with EXPLAIN QUERY PLAN: WHERE genre = ?, WHERE published_at = ?, and WHERE genre = ? AND published_at = ?. Which ones use the index? Does the result match what you would expect from the left-to-right prefix rule?

Exercise 3: Write a migration that adds indexes to your existing tables. Think about which columns you filter, join, or sort by. Add only the indexes that your queries actually need.

Exercise 4: Write a migration that creates an index on (genre, title) and run EXPLAIN QUERY PLAN SELECT title FROM books WHERE genre = 'fiction'. Do you see the word COVERING in the output? What happens if you change the query to SELECT * instead of SELECT title?

Quick check

  • A composite index is on (a, b, c). Which WHERE clauses can use this index?
  • What does COVERING INDEX mean in EXPLAIN QUERY PLAN output?

Indexes tell you what to build. But how do you know if a query is slow in the first place, or which index is actually missing? In the next lesson, we will use EXPLAIN QUERY PLAN to hunt down the most common performance problems and fix them one by one.


© 2026 hectoday. All rights reserved.