Many-to-many

In the last lesson, we added authors and books tables and connected them through a one-to-many relationship. One author can have many books, and each book points back to exactly one author. That worked because the “many” side had a single slot for a foreign key.

Now let’s think about tagging. A book can have many tags: fiction, classic, American. And a tag can belong to many books. “Fiction” applies to thousands of books. This is a many-to-many relationship, and neither side is the “one.”

You cannot represent this with a single foreign key. A tag_id column on books only allows one tag per book. A book_ids column on tags is the comma-separated string problem all over again. So what do you do?

The junction table

The solution is a third table that sits between the other two. It is called a junction table (also known as a join table, bridge table, or associative table). Each row in the junction table represents one single connection between a book and a tag.

Here is the shape we want. We will ship it as a migration in a moment, but first let’s look at the SQL on its own:

CREATE TABLE tags (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL UNIQUE
);

CREATE TABLE book_tags (
  book_id INTEGER NOT NULL,
  tag_id INTEGER NOT NULL,
  PRIMARY KEY (book_id, tag_id),
  FOREIGN KEY (book_id) REFERENCES books(id) ON DELETE CASCADE,
  FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);

Each row in book_tags represents one connection: “this book has this tag.” The composite primary key (book_id, tag_id) ensures the same book-tag pair cannot exist twice. A book can appear in many rows (paired with different tags), and a tag can appear in many rows (paired with different books). The shape of the data is now exactly right.

We also added ON DELETE CASCADE to both foreign keys. That means if a book or a tag is deleted, the relevant rows in book_tags get cleaned up automatically. No orphaned connections.

Writing migration 004

Both tables are part of the same feature, so it makes sense to add them in one migration. The order inside the file matters: tags has to exist before book_tags can reference it. The books table already exists from migration 003, so the foreign key has something valid to point at.

Here is our new migration file:

Code along

-- migrations/004_create_tags_and_book_tags.sql
BEGIN;

CREATE TABLE tags (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL UNIQUE
);

CREATE TABLE book_tags (
  book_id INTEGER NOT NULL,
  tag_id INTEGER NOT NULL,
  PRIMARY KEY (book_id, tag_id),
  FOREIGN KEY (book_id) REFERENCES books(id) ON DELETE CASCADE,
  FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);

INSERT INTO schema_migrations (version, name)
VALUES ('004', 'create_tags_and_book_tags');

COMMIT;

Two tables, one migration, one atomic unit of change. If either CREATE TABLE fails, the whole transaction rolls back, and the schema_migrations row is never inserted. The next time the runner kicks off, it will try again from a clean slate.

Inserting relationships

With the migration applied, let’s add some real data. We already have books from the one-to-many lesson, but no tags yet, so let’s start by adding three. A single multi-row INSERT is already atomic on its own, so no transaction wrapper is needed here:

Code along

INSERT INTO tags (name) VALUES
  ('fiction'),
  ('classic'),
  ('fantasy');

That gives us tag ids 1 (fiction), 2 (classic), and 3 (fantasy), assigned in insert order by INTEGER PRIMARY KEY AUTOINCREMENT.

Now we can connect books to tags. Tagging a book with multiple tags is the kind of “several writes that belong together” we keep coming back to: either the book ends up with all of its tags or we do not want it half-tagged. Wrap the pair in a transaction:

Code along

-- Tag "The Old Man and the Sea" (book 1) as fiction and classic
BEGIN;
  INSERT INTO book_tags (book_id, tag_id) VALUES (1, 1);
  INSERT INTO book_tags (book_id, tag_id) VALUES (1, 2);
COMMIT;

-- Tag "A Wizard of Earthsea" (book 3) as fiction and fantasy
BEGIN;
  INSERT INTO book_tags (book_id, tag_id) VALUES (3, 1);
  INSERT INTO book_tags (book_id, tag_id) VALUES (3, 3);
COMMIT;

Each INSERT creates exactly one connection. To give a book three tags, you insert three rows. That might feel verbose compared to a comma-separated string, but it gives you something the string could never offer: queryable, indexable, enforceable relationships.

Querying many-to-many

Now we can ask questions that would have been painful with a comma-separated column:

Code along

-- All tags for a specific book
SELECT tags.name
FROM tags
JOIN book_tags ON tags.id = book_tags.tag_id
WHERE book_tags.book_id = 1;
-- Result: fiction, classic

-- All books with a specific tag
SELECT books.title
FROM books
JOIN book_tags ON books.id = book_tags.book_id
WHERE book_tags.tag_id = 1;
-- Result: The Old Man and the Sea, A Wizard of Earthsea

-- Count of books per tag
SELECT tags.name, COUNT(book_tags.book_id) AS book_count
FROM tags
LEFT JOIN book_tags ON tags.id = book_tags.tag_id
GROUP BY tags.id
ORDER BY book_count DESC;

The pattern is always the same: start from one side, JOIN through the junction table, reach the other side. The junction table is the bridge. Once you internalize that pattern, many-to-many stops feeling tricky.

Junction tables with extra data

Sometimes the relationship itself has attributes. A user enrolls in a course. The enrollment has a date and a progress percentage. That data does not really belong to the user, and it does not really belong to the course. It belongs to the relationship between them.

You model that the same way as a junction table, but you add extra columns:

CREATE TABLE enrollments (
  user_id INTEGER NOT NULL,
  course_id INTEGER NOT NULL,
  enrolled_at TEXT NOT NULL DEFAULT (datetime('now')),
  progress INTEGER NOT NULL DEFAULT 0,
  PRIMARY KEY (user_id, course_id),
  FOREIGN KEY (user_id) REFERENCES users(id),
  FOREIGN KEY (course_id) REFERENCES courses(id)
);

If you added this to your real database, it would live in its own migration file, numbered with whatever came next in your sequence, using the same BEGIN / COMMIT / INSERT INTO schema_migrations pattern we have been using.

The junction table stores the relationship AND the relationship’s data: when the user enrolled and how far along they are. This is still many-to-many (one user can enroll in many courses, one course can have many users), but each connection carries its own information.

Removing relationships

These three demos are destructive, and later lessons rely on book 1 still being around with its tags. So we use the same wrap-and-verify pattern from the updating-and-deleting lesson, but with ROLLBACK instead of COMMIT at the end. Each demo shows you what would happen, then puts the data back exactly the way it was.

Code along

-- Remove a single tag from a book
BEGIN;
DELETE FROM book_tags WHERE book_id = 1 AND tag_id = 2;
SELECT * FROM book_tags WHERE book_id = 1;   -- only the (1, 1) row remains
ROLLBACK;

Code along

-- Remove all tags from a book
BEGIN;
DELETE FROM book_tags WHERE book_id = 1;
SELECT * FROM book_tags WHERE book_id = 1;   -- nothing left
ROLLBACK;

Code along

-- Deleting the book itself cascades into book_tags via ON DELETE CASCADE
BEGIN;
DELETE FROM books WHERE id = 1;
SELECT * FROM books WHERE id = 1;            -- gone
SELECT * FROM book_tags WHERE book_id = 1;   -- also gone, removed by the cascade
ROLLBACK;

After each ROLLBACK, run the same SELECTs again outside any transaction and confirm everything is back where it started.

The takeaway: deleting a row from the junction table removes the connection without affecting either side. The book still exists, the tag still exists, they are just no longer linked. And if you delete the parent row itself, the cascade rule on the foreign key cleans up the junction rows for you. That is exactly what you want for a relationship.

Exercises

Exercise 1: Write migration 004_create_tags_and_book_tags.sql and run it. Insert a few books and tags, then connect them via the junction table.

Exercise 2: Query all tags for a specific book. Then query all books for a specific tag.

Exercise 3: Try inserting the same book-tag pair twice. Verify the composite primary key prevents it.

Exercise 4: Write a new migration for a reading_list junction table that connects users and books with extra data: added_at (when the user added the book) and status (one of 'to_read', 'reading', or 'finished'). Apply it, then add a few rows for user 1 against several books and query “all books user 1 is currently reading.” Wrap the multi-row inserts in a transaction.

Why does a many-to-many relationship need a junction table?

We have covered one-to-many and many-to-many. There is one more relationship type to know about: one-to-one. It is less common, but it shows up in very specific situations, and in the next lesson we will ship it as the next migration in our schema.

Access Required

Many-to-many

The junction table

Writing migration 004

Inserting relationships

Querying many-to-many

Junction tables with extra data

Removing relationships

Exercises