Primary keys

In the last lesson, we locked down our data with constraints. NOT NULL, UNIQUE, CHECK, DEFAULT. They all work together to keep bad values out. But every single one of them assumes something we have not really examined: that each row has a unique identity we can point to. That is what a primary key is. No two rows can have the same primary key value. It is how you say “get user 42” or “delete book abc-123.”

Picking the right kind of primary key matters more than you might expect. Different choices have different tradeoffs, and they can affect security, performance, and how easily you can move data between systems. Let’s go through the options.

Auto-increment integers

CREATE TABLE users (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL
);

INSERT INTO users (name) VALUES ('Alice');  -- id = 1
INSERT INTO users (name) VALUES ('Bob');    -- id = 2
INSERT INTO users (name) VALUES ('Carol');  -- id = 3

AUTOINCREMENT generates the next integer automatically. You never set the id yourself.

What is good about this: The IDs are small (4 bytes), fast to compare, easy to read while debugging, and sorted by creation order. You can tell at a glance that user 3 was created after user 1.

What is not so good: The IDs are sequential and predictable. User 1, user 2, user 3. An attacker can guess valid IDs. If your API exposes these in URLs (/users/1, /users/2, /users/3), someone can iterate through them to access other users’ data. Sequential IDs also reveal how many records you have. User 1000 means you probably have around 1000 users.

UUIDs

CREATE TABLE users (
  id TEXT PRIMARY KEY,
  name TEXT NOT NULL
);

// In your application code
const id = crypto.randomUUID(); // "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
db.prepare("INSERT INTO users (id, name) VALUES (?, ?)").run(id, "Alice");

A UUID (Universally Unique Identifier) is a 128-bit random string. The chances of two UUIDs colliding are astronomically small.

What is good about this: The IDs are unguessable, so they are safe to expose in URLs. They can be generated on the client without needing a database round-trip. And they are unique across databases, which matters if you ever need to merge data from multiple sources.

What is not so good: They are larger (36 bytes as text), harder to read while debugging, randomly ordered (not sorted by creation time), and slightly slower to index than integers.

Which to use

Use integers for internal tables that are never exposed in URLs (session records, log entries), for development and prototyping, and for tables where creation order is important.

Use UUIDs for resources exposed in URLs (/users/abc-123), for tables referenced by external systems, and for anything where guessable IDs would be a security risk.

[!NOTE] This is why the REST API Design and Web Security courses use UUIDs for API resources. Guessable IDs enable IDOR (Insecure Direct Object Reference) attacks, where an attacker changes the ID in a URL to access someone else’s data.

Natural keys vs surrogate keys

A natural key is a real-world identifier: an email address, an ISBN, a country code.

A surrogate key is an artificial identifier: an auto-increment integer or a UUID.

-- Natural key: email IS the primary key
CREATE TABLE users (
  email TEXT PRIMARY KEY,
  name TEXT NOT NULL
);

-- Surrogate key: id is artificial, email is just a column
CREATE TABLE users (
  id TEXT PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,
  name TEXT NOT NULL
);

Use surrogate keys in almost all cases. Natural keys change (a user changes their email), have format issues (emails are case-insensitive), and create problems in foreign keys (every table that references users would store the full email string instead of a compact ID).

Natural keys are acceptable for: junction tables (composite keys from two foreign keys), lookup tables with stable codes (country_code TEXT PRIMARY KEY), and tables where the natural identifier truly never changes.

Composite primary keys

Some tables use two or more columns as the primary key together:

CREATE TABLE book_tags (
  book_id TEXT NOT NULL,
  tag_id TEXT NOT NULL,
  PRIMARY KEY (book_id, tag_id),
  FOREIGN KEY (book_id) REFERENCES books(id),
  FOREIGN KEY (tag_id) REFERENCES tags(id)
);

The combination of book_id and tag_id is unique. A book can have many tags, and a tag can belong to many books, but the same book-tag pair cannot exist twice. This is the junction table pattern, which we will cover in the Many-to-Many lesson.

Exercises

Exercise 1: Create a table with INTEGER PRIMARY KEY AUTOINCREMENT. Insert 3 rows without specifying the ID. Verify the IDs are 1, 2, 3.

Exercise 2: Create a table with TEXT PRIMARY KEY. Insert rows with UUIDs generated in code. Try inserting two rows with the same UUID. What happens?

Exercise 3: Try using email as a primary key. Insert a user. Now the user changes their email. What do you need to update? Think about what happens to every foreign key that references this user.

Why do the courses in this series use UUIDs instead of auto-increment integers for API-exposed resources?

You now know how to create tables, pick column types, enforce rules with constraints, and choose a primary key that fits the job. That is a complete schema. But there is a small problem with how we have been building it. We have been typing CREATE TABLE directly into the shell. That works for learning, but it does not survive once another person or another machine enters the picture. Your teammate cannot replay your commands. Your production server does not know what you typed last Tuesday. In the next section, we will turn these schema changes into versioned migration files that anyone can replay.