SQLite as a baseline
We have built three storage strategies from scratch. Now we need something to compare them against. If we do not measure a real database in the same setup, our numbers exist in a vacuum. “We did 72,000 requests per second” is meaningless unless we know what a real database does on the same machine.
So in this lesson we wire up SQLite into the same Hectoday server, with the same routes, and run the same benchmark. If a database is ever going to look slow compared to flat files, this is where we should see it, because SQLite is the lightest-weight database in common use.
Why SQLite is the honest baseline
A lot of people reach for Postgres or MySQL as the reference for “a real database.” Those are fine databases. They also come with a lot of overhead that has nothing to do with the database itself: a separate process, a network socket, a connection pool, authentication, query protocol parsing. If we benchmarked Postgres here, we would partly be measuring the cost of IPC, not the cost of indexed lookup.
SQLite is different. It runs inside your process. No separate server. No network. Your code calls a function, SQLite reads its own file, your code gets a result. The only layers between you and your data are the B-tree lookup and a syscall or two. That makes it about as close to an apples-to-apples comparison with our own flat-file code as we can get.
Node ships with node:sqlite built in as of Node 22.5. You import it, hand it a filename, and you have a database. No npm install, no driver to pick, nothing to configure.
The implementation
// src/store.ts
import { DatabaseSync } from "node:sqlite";
import { randomUUID } from "node:crypto";

const db = new DatabaseSync("users.db");
db.exec("PRAGMA journal_mode = WAL");
db.exec(`
  CREATE TABLE IF NOT EXISTS users (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL,
    created_at TEXT NOT NULL
  )
`);

export interface User {
  id: string;
  name: string;
  email: string;
  created_at: string;
}

const findStmt = db.prepare("SELECT id, name, email, created_at FROM users WHERE id = ?");
const insertStmt = db.prepare(
  "INSERT INTO users (id, name, email, created_at) VALUES (?, ?, ?, ?)",
);

export function findUser(id: string): User | null {
  const row = findStmt.get(id) as User | undefined;
  return row ?? null;
}

export function createUser(name: string, email: string): User {
  const user: User = {
    id: randomUUID(),
    name,
    email,
    created_at: new Date().toISOString(),
  };
  insertStmt.run(user.id, user.name, user.email, user.created_at);
  return user;
}

Let us walk through the interesting parts. Most of this should feel familiar by now; the new things are the SQLite-specific calls.
new DatabaseSync("users.db") opens (or creates) a SQLite database file at users.db. The Sync in the name means operations block and return results directly, rather than returning promises. SQLite is fast enough that sync is usually fine for a web server, and it keeps our handler code simple.
db.exec("PRAGMA journal_mode = WAL") is the important setup line. WAL stands for “write-ahead log.” Without this line, every write locks the whole database against reads, so under load your readers would block whenever a writer is in the middle of a transaction. WAL mode separates writes into a side-log file, which lets readers continue against the main database while writes happen. For a read-heavy server, this is nearly free throughput.
The second db.exec creates the users table if it does not exist. TEXT PRIMARY KEY on id does two things at once. It enforces uniqueness, and it tells SQLite to build a B-tree index on id. That index is what makes WHERE id = ? fast. Without it, SQLite would scan the whole table on every lookup, which is exactly what our linear scan approach already does, badly.
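You can ask SQLite directly whether it intends to use that index. EXPLAIN QUERY PLAN describes the strategy without executing the query; here is a sketch (the exact output text varies by SQLite version):

// explain-lookup.ts — a sketch: inspect the query plan for our lookup
import { DatabaseSync } from "node:sqlite";

const db = new DatabaseSync("users.db");

const plan = db
  .prepare("EXPLAIN QUERY PLAN SELECT id FROM users WHERE id = ?")
  .all("anything");

for (const row of plan) console.log(row);
// The id lookup should report a SEARCH using the primary-key index, something like
//   "SEARCH users USING INDEX sqlite_autoindex_users_1 (id=?)"
// while a lookup on an unindexed column would report "SCAN users" instead.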
Then we create two prepared statements.
A prepared statement is a parsed, analyzed, plan-ready SQL query that you can run repeatedly with different parameters. When you call db.prepare(sql), SQLite parses the SQL, figures out which indexes to use, and hands you back a reusable object. Calling findStmt.get(id) later just binds the parameter and runs the plan. No re-parsing.
This matters a lot for throughput. Preparing on every request would add real overhead. The idiomatic pattern is to prepare your statements once at module load, hold onto the objects, and call .get() or .run() on them from your handlers.
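A rough way to feel that overhead is to time both patterns side by side. This sketch (the filename and iteration count are arbitrary) runs the same lookup many times, once with a reused statement and once re-preparing on every call:

// prepare-cost.ts — a rough sketch comparing prepare-once vs prepare-per-call
import { DatabaseSync } from "node:sqlite";

const db = new DatabaseSync("users.db");
const N = 100_000;

// Pattern A: prepare once, reuse the statement object on every call
const stmt = db.prepare("SELECT id FROM users WHERE id = ?");
let start = performance.now();
for (let i = 0; i < N; i++) stmt.get("no-such-id");
console.log("prepared once:    ", Math.round(performance.now() - start), "ms");

// Pattern B: re-parse and re-plan the SQL on every call
start = performance.now();
for (let i = 0; i < N; i++) {
  db.prepare("SELECT id FROM users WHERE id = ?").get("no-such-id");
}
console.log("prepared per call:", Math.round(performance.now() - start), "ms");

The absolute numbers depend on the machine, but pattern B pays the parse-and-plan cost a hundred thousand times where pattern A pays it once.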
findUser calls findStmt.get(id), which returns either the matching row or undefined. We cast it to User for TypeScript’s benefit, since SQLite’s row objects are unknown by default, and we coalesce undefined to null so the return type matches what the handler expects.
createUser builds the record the same way it did in the earlier approaches, then calls insertStmt.run(...) with the four positional parameters matching the four ? placeholders. .run() is for statements that do not return rows; it returns some metadata we are ignoring here.
The ? placeholders are how we pass data safely. Never interpolate user input into a SQL string. The placeholders let SQLite handle escaping, and they make SQL injection essentially impossible for this kind of query.
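To make that concrete, here is a sketch of what goes wrong with interpolation. The input below is the classic injection payload; with a placeholder it is just an id that matches nothing:

// injection-demo.ts — a sketch of interpolation vs placeholders
import { DatabaseSync } from "node:sqlite";

const db = new DatabaseSync("users.db");
const attackerInput = "x' OR '1'='1";

// Interpolated: the input becomes part of the SQL text and rewrites the WHERE
// clause, so this returns every row in the table.
const unsafe = db.prepare(`SELECT id FROM users WHERE id = '${attackerInput}'`).all();

// Bound: the input is passed as a value; it can only match (or not match) an id.
const safe = db.prepare("SELECT id FROM users WHERE id = ?").all(attackerInput);

console.log(unsafe.length, safe.length); // every row vs 0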
The handler does not change
const getUser = route.get("/users/:id", {
  request: {
    params: z.object({ id: z.string() }),
  },
  resolve: ({ input }) => {
    if (!input.ok) {
      return Response.json(input.issues, { status: 400 });
    }
    const user = findUser(input.params.id);
    if (!user) {
      return Response.json({ error: "not found" }, { status: 404 });
    }
    return Response.json(user);
  },
});

Identical to the in-memory map version. findUser is synchronous, the return type is still User | null, and the rest of the handler has no idea what is underneath. This is the whole point of the discipline we set up in the first lesson. Three completely different storage backends, one unchanged handler.
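If you want to make that contract explicit, it fits in a few lines. This is only a sketch, and nothing in the earlier lessons requires it; the filename is hypothetical:

// store-contract.ts — a sketch of the interface every backend has satisfied
import type { User } from "./store";

export interface UserStore {
  findUser(id: string): User | null;
  createUser(name: string, email: string): User;
}

Swapping backends means swapping the module that implements these two functions; the handler never changes.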
Seeding from JSONL
Our seed script writes JSONL, but SQLite wants its own file format. We need a one-time import script that reads users.jsonl and bulk-loads it into users.db.
// import-jsonl.ts
import { DatabaseSync } from "node:sqlite";
import { readFileSync } from "node:fs";

const db = new DatabaseSync("users.db");
db.exec("PRAGMA journal_mode = WAL");
db.exec(`
  CREATE TABLE IF NOT EXISTS users (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL,
    created_at TEXT NOT NULL
  )
`);

const insert = db.prepare(
  "INSERT OR IGNORE INTO users (id, name, email, created_at) VALUES (?, ?, ?, ?)",
);

interface Row {
  id: string;
  name: string;
  email: string;
  created_at: string;
}

const lines = readFileSync(process.argv[2] ?? "users.jsonl", "utf8").split("\n");
const rows: Row[] = lines.filter((l) => l.trim()).map((l) => JSON.parse(l) as Row);

db.exec("BEGIN");
for (const r of rows) insert.run(r.id, r.name, r.email, r.created_at);
db.exec("COMMIT");

console.log(`imported ${rows.length} users`);

Two things worth calling out.
INSERT OR IGNORE makes the import idempotent. If we run it twice on the same file, the second run does not error on duplicate primary keys; it just skips them. That saves an embarrassing class of bug during development.
The BEGIN / COMMIT pair around the loop is the performance unlock. Without it, SQLite would fsync to disk after every single insert, and a million-row import would take minutes. With a single transaction wrapping everything, we get one fsync at the end and the whole import finishes in seconds. That is a roughly one-thousand-times speedup for not doing much.
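If you want to see the effect for yourself, a throwaway sketch like this makes the gap obvious even at a few thousand rows (the filename, table, and row count are arbitrary):

// batch-vs-autocommit.ts — a rough sketch of per-insert commits vs one big transaction
import { DatabaseSync } from "node:sqlite";

const db = new DatabaseSync("bench.db");
db.exec("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, v TEXT)");
const insert = db.prepare("INSERT INTO t (v) VALUES (?)");
const N = 5_000;

// Autocommit: every insert is its own implicit transaction with its own sync to disk
let start = performance.now();
for (let i = 0; i < N; i++) insert.run(`row-${i}`);
console.log("autocommit:        ", Math.round(performance.now() - start), "ms");

// One explicit transaction around the whole batch: one sync at COMMIT
start = performance.now();
db.exec("BEGIN");
for (let i = 0; i < N; i++) insert.run(`row-${i}`);
db.exec("COMMIT");
console.log("single transaction:", Math.round(performance.now() - start), "ms");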
node:sqlite does not ship a db.transaction(fn) helper the way some other SQLite clients do, so we issue the BEGIN and COMMIT manually. Section 2 goes deep into why this matters for throughput and what happens when you skip it.
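Nothing stops us from writing that helper ourselves, though. A sketch (not part of node:sqlite; the name withTransaction is our own) that also rolls back if anything throws mid-batch:

// A small helper of our own: BEGIN, run the function, COMMIT, or ROLLBACK on error
import { DatabaseSync } from "node:sqlite";

export function withTransaction<T>(db: DatabaseSync, fn: () => T): T {
  db.exec("BEGIN");
  try {
    const result = fn();
    db.exec("COMMIT");
    return result;
  } catch (err) {
    db.exec("ROLLBACK");
    throw err;
  }
}

// Usage in the import script would look like:
//   withTransaction(db, () => {
//     for (const r of rows) insert.run(r.id, r.name, r.email, r.created_at);
//   });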
Run it:
node seed.ts 1000000 users.jsonl
node import-jsonl.ts users.jsonl

Now users.db contains the same million users as users.jsonl, and we are ready to benchmark.
The numbers
Measured on Node 24, Apple Silicon Mac mini.
| Records | Requests/sec | Avg latency |
|---|---|---|
| 10k | 52,258 | 0.94ms |
| 100k | 47,436 | 1.01ms |
| 1M | 50,307 | 1.02ms |
Like the in-memory map and binary search, SQLite scales essentially flat. Throughput barely changes as we grow from ten thousand to a million records. That is the B-tree paying off. A B-tree is a balanced tree structure where every lookup takes O(log n) steps. With a high branching factor, even a million-row B-tree is only a handful of levels deep. The difference in work between 10k and 1M is small enough to get lost in noise.
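To put a rough number on “a handful of levels”: the depth grows with the logarithm of the row count, base the fanout. A back-of-the-envelope sketch (the fanout of 200 keys per page is an assumed illustrative figure, not SQLite’s actual number):

// btree-depth.ts — back-of-the-envelope B-tree depth at different table sizes
const fanout = 200; // assumed keys per interior page, purely illustrative

for (const rows of [10_000, 100_000, 1_000_000, 100_000_000]) {
  const depth = Math.ceil(Math.log(rows) / Math.log(fanout));
  console.log(`${rows.toLocaleString()} rows -> ~${depth} levels`);
}
// ~2 levels at 10k, ~3 at 1M, ~4 at 100M: growing the table 100x adds roughly one level of work.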
Now the comparison, all four approaches at 1M records.
| Approach | Req/s @ 1M | Avg latency |
|---|---|---|
| Linear scan | 5 | 1.06s |
| Binary search on disk | 26,448 | 1.93ms |
| SQLite | 50,307 | 1.02ms |
| In-memory map | 72,074 | 711µs |
Two things in this table are worth sitting with.
SQLite is faster than our hand-rolled binary search. About twice as fast, at every scale. That is not a typo. Our binary search opens two file descriptors, seeks, reads, and closes them on every request. Those syscalls are cheap individually, but they add up when you are doing tens of thousands of requests a second. SQLite, by contrast, keeps its database handle open, caches prepared statement plans, and goes through its own tight page buffer pool. We could close the gap by holding file descriptors open and optimizing the lookup path, but at that point we would be rewriting a tiny slice of SQLite. Someone already did that work, and did it better.
The in-memory map is only about 1.4x faster than SQLite. That is closer than it feels. SQLite is doing a real B-tree walk, reading disk pages (cached or not), and decoding them on every request. The map is a pointer dereference. A 1.4x gap for that much added functionality is remarkably small.
What you actually buy with SQLite
If you only ever look at raw req/s, SQLite looks like a bad deal compared to the Map. But that is only true if your entire data access story is SELECT WHERE id = ?.
The moment you want any of these:
- A second query: SELECT * FROM users WHERE email = ?
- A range scan: SELECT * FROM users WHERE created_at > ? ORDER BY created_at DESC LIMIT 50
- A join: SELECT u.name, COUNT(o.id) FROM users u LEFT JOIN orders o ON o.user_id = u.id GROUP BY u.id
- A transaction across two tables: INSERT INTO orders ...; UPDATE inventory SET quantity = quantity - 1 WHERE ...
- An aggregate: SELECT COUNT(*) FROM users WHERE created_at > ?
…you are asking for more than a single-key lookup. With flat files, you would have to build each of these yourself. A second hash map keyed on email. A sorted array of creation dates. Hand-rolled coordination for two-file writes. Every one of those is a small project that goes subtly wrong every time it is rewritten.
SQLite gives you all of them for free. In exchange, you are paying under a 1.5x throughput cost on the trivial case. For almost any real application, that is nowhere near the bottleneck. The hours you would have spent maintaining hand-rolled indexes almost always are.
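To give a taste of that, the first item on the list above is one index and one prepared statement away. A sketch that extends the store.ts listing (the index name idx_users_email is our own choice):

// Added to src/store.ts — look users up by email with the same indexed behaviour as id
db.exec("CREATE INDEX IF NOT EXISTS idx_users_email ON users (email)");

const findByEmailStmt = db.prepare(
  "SELECT id, name, email, created_at FROM users WHERE email = ?",
);

export function findUserByEmail(email: string): User | null {
  const row = findByEmailStmt.get(email) as User | undefined;
  return row ?? null;
}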
When SQLite is the right answer
Most of the time, genuinely.
If you need any of those SQL features, and your dataset fits on one machine, SQLite is the right answer. It runs in-process, it persists to a single file, it has full transactional semantics, and it served around 50,000 requests per second on the consumer hardware we benchmarked. In an upcoming lesson we will translate that number into product terms and figure out how many users it actually supports.
Where SQLite stops being the right answer is when you need multiple machines writing concurrently, or when your dataset exceeds what one machine can hold. That is when you reach for Postgres, MySQL, or something distributed. But you will be surprised how far a single SQLite file can take you.
In the next lesson we set up wrk and run all four servers under load so you can reproduce these numbers yourself.