Why schema design matters

At this point, you know a lot about SQLite. You know how to create tables, you understand primary keys and constraints, you built a full migration system back in Section 3, and in the last section you learned how to put data in, get it back out, and wrap multi-step writes in transactions. That means every schema change from here on has a home, and every example that spans more than one write will be wrapped in BEGIN / COMMIT as a matter of habit. When you want to add a table or alter a column, you write a new migration file, wrap it in a transaction, and your database evolves in a controlled, reviewable way.

But there is a question we have not actually answered yet. You know how to create tables. You know how to ship those creations through migrations. The harder question is: which tables should you create in the first place? Which columns belong where? How do you structure things so your queries stay simple and your data stays consistent as the app grows?

Those decisions matter more than most people realize. A well-designed schema makes features easy to build. A poorly designed one makes every query a workaround and every new feature a fight. This section of the course is about learning to make those decisions on purpose.

A bad schema

Let’s say you are building an app that tracks books and their authors. Here is a schema that looks simple at first:

CREATE TABLE books (
  id INTEGER PRIMARY KEY,
  title TEXT,
  author_name TEXT,
  author_email TEXT,
  author_bio TEXT,
  genre TEXT,
  tags TEXT  -- "fiction,classic,american"
);

One table, all the data in one place. Easy to understand. But this design creates problems that get worse the longer you live with it.

Redundancy. If Ernest Hemingway wrote five books, his name, email, and bio are stored five times. Want to update his bio? You need to update five rows. Miss one? Now some books show the old bio and others show the new one. Your data is inconsistent, and inconsistent data is the kind of bug you only notice when a user points it out.

No real relationships. Want all books by Hemingway? You search by author_name = 'Ernest Hemingway'. But what if one row says “Ernest Hemingway” and another says “Hemingway, Ernest”? There is no relationship enforcing consistency, just strings that might or might not match.

Comma-separated tags. Want all books tagged “fiction”? You need WHERE tags LIKE '%fiction%'. That also matches “non-fiction.” You cannot index it. You cannot count how many books have each tag. You cannot enforce that a tag is valid. It looks convenient until you try to do anything real with it.

A better schema

Here is the same idea, but redesigned so each fact lives in exactly one place:

CREATE TABLE authors (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  email TEXT UNIQUE,
  bio TEXT
);

CREATE TABLE books (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  title TEXT NOT NULL,
  author_id INTEGER NOT NULL,
  genre TEXT NOT NULL,
  published_at TEXT,
  FOREIGN KEY (author_id) REFERENCES authors(id)
);

CREATE TABLE tags (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL UNIQUE
);

CREATE TABLE book_tags (
  book_id INTEGER NOT NULL,
  tag_id INTEGER NOT NULL,
  PRIMARY KEY (book_id, tag_id),
  FOREIGN KEY (book_id) REFERENCES books(id),
  FOREIGN KEY (tag_id) REFERENCES tags(id)
);

More tables, but look at what you get in return.

No redundancy. Hemingway’s bio is stored once, in the authors table. Update it in one place and every book reflects the change automatically.

Clear relationships. author_id links a book to its author. Want all books by author 1? WHERE author_id = 1. No string matching. No ambiguity. The database can enforce that the author actually exists.

Proper tags. Each tag is a row in the tags table. The book_tags table connects books to tags. Want all fiction books? Join book_tags with tags WHERE name = 'fiction'. Exact match. Indexable. Countable.

You might notice that we are not wrapping this “good schema” in a migration file yet. That is on purpose. Right now we are comparing shapes, not shipping changes. Over the next few lessons we will actually build these tables, one at a time, each through its own migration file. That is how real production schemas grow: one deliberate migration at a time, reviewed and applied in order.

Design first

Most developers create tables as they code features. A signup form needs a users table, so they add one. A bookmark feature needs a bookmarks table, so they add one. Each table is designed in isolation, and the overall structure grows organically. This works for a while, but eventually the schema becomes tangled and hard to change.

The better approach is to design the entire schema before writing much application code. Map out the entities (users, books, authors, tags), their relationships (an author has many books, a book has many tags), and their constraints (email must be unique, title cannot be null). Then you translate that plan into migrations and apply them in order. This course teaches that process.

Exercises

Exercise 1: Think about an app you use daily (Twitter, Spotify, Netflix). What tables would you need? What columns? Do not write SQL yet. Just list the entities and relationships.

Exercise 2: Look at the bad schema above. What happens if you need to add a second author to a book? How would you do it with comma-separated author names? What problems does that create?

What is the main problem with storing author data directly in the books table?

In the next lesson, we will start actually building this schema. We will add authors and books tables alongside our existing users table and introduce them through a new migration file. That is the most common relationship pattern in any database: one-to-many.

Access Required

Why schema design matters

A bad schema

A better schema

Design first

Exercises