hectoday

The setup

In the intro we promised to build the same small API four different ways. Before we can do that, we need the “same” part. Same routes. Same data. Same server scaffold. The only thing that should change between lessons is the code that actually reads and writes the data.

This lesson is all plumbing. We set up the project, decide on a folder layout, write the HTTP server one small piece at a time, and build a script that generates fake users so we have something to benchmark against later. After this, the next four lessons can focus entirely on the interesting part: the storage code itself.

The data

We are going to pretend we are building a small app with three kinds of records: users, products, and orders. Each one lives in its own file on disk. The file format is called JSONL (JSON Lines), also known as newline-delimited JSON. It looks like this:

{"id":"a3f1...","name":"Alice Chen","email":"alice@example.com","created_at":"2026-04-15T..."}
{"id":"b7d2...","name":"Bob Torres","email":"bob@example.com","created_at":"2026-04-15T..."}

Every line is one complete JSON document, one record per line. If you have seen a regular JSON file that looks like [{...}, {...}, {...}], JSONL is almost the same thing with three differences that matter a lot.

First, there is no outer array. The file does not start with [ and end with ]. That means you can append to the file by writing one more line at the end. With a regular JSON array, appending a record requires you to read the whole file, parse it, push a new item onto the array, re-serialize it, and write the whole thing back. JSONL skips all of that.
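
To make the difference concrete, here is a minimal sketch of both append paths side by side. The Row type, the helper names, and the temp-directory file names are all made up for this illustration; nothing here is part of the course project.

```typescript
import { appendFileSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical record shape, used only for this illustration.
type Row = { id: string; name: string };

// JSONL: adding a record is a single append of one line.
function appendJsonl(path: string, row: Row): void {
  appendFileSync(path, JSON.stringify(row) + "\n");
}

// JSON array: adding a record means read, parse, push, re-serialize, rewrite.
function appendJsonArray(path: string, row: Row): void {
  const rows = JSON.parse(readFileSync(path, "utf8")) as Row[];
  rows.push(row);
  writeFileSync(path, JSON.stringify(rows));
}

// Demo in a throwaway temp directory so nothing in the project is touched.
const dir = mkdtempSync(join(tmpdir(), "jsonl-demo-"));
const jsonlPath = join(dir, "rows.jsonl");
const arrayPath = join(dir, "rows.json");

writeFileSync(jsonlPath, "");
writeFileSync(arrayPath, "[]");

for (const row of [{ id: "1", name: "Alice" }, { id: "2", name: "Bob" }]) {
  appendJsonl(jsonlPath, row);
  appendJsonArray(arrayPath, row);
}

// Both files now hold two records, but only one of them was rewritten each time.
const jsonlLineCount = readFileSync(jsonlPath, "utf8").trimEnd().split("\n").length;
const arrayLength = (JSON.parse(readFileSync(arrayPath, "utf8")) as Row[]).length;
console.log({ jsonlLineCount, arrayLength });
```

The JSONL version does the same amount of work no matter how big the file already is. The array version gets slower with every record, because it re-reads and re-writes everything that came before.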

Second, you can read it line by line without loading the whole thing. If you have a million records and you only need one, you can stream the file and stop the moment you find it. This is going to matter for our first storage approach.

Third, each line is independently valid JSON. If one line somewhere in the middle gets corrupted, the rest of the file is still readable. With a single giant JSON blob, one bad byte can break the whole thing.
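
The second and third points can be sketched together: read the file in chunks, hand out one line at a time, skip any line that fails to parse, and stop as soon as the record turns up. This is only an illustration under stated assumptions. readLines and findById are hypothetical helpers, the 64 KB chunk size is arbitrary, and it is not the implementation the later lessons build.

```typescript
import { closeSync, mkdtempSync, openSync, readSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { StringDecoder } from "node:string_decoder";

type JsonRecord = Record<string, unknown>;

// Yield one line at a time, reading the file in fixed-size chunks
// instead of loading all of it into memory.
function* readLines(path: string, chunkSize = 64 * 1024): Generator<string> {
  const fd = openSync(path, "r");
  const buf = Buffer.alloc(chunkSize);
  const decoder = new StringDecoder("utf8"); // handles characters split across chunks
  let pending = "";
  try {
    let n: number;
    while ((n = readSync(fd, buf, 0, chunkSize, null)) > 0) {
      pending += decoder.write(buf.subarray(0, n));
      let nl: number;
      while ((nl = pending.indexOf("\n")) !== -1) {
        yield pending.slice(0, nl);
        pending = pending.slice(nl + 1);
      }
    }
    pending += decoder.end();
    if (pending.length > 0) yield pending; // last line without a trailing newline
  } finally {
    closeSync(fd); // also runs when the consumer stops early
  }
}

// Scan for a record by id, tolerating corrupted lines along the way.
function findById(path: string, id: string): JsonRecord | null {
  for (const line of readLines(path)) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // corrupted line: skip it and keep scanning
    }
    if (typeof parsed === "object" && parsed !== null && (parsed as JsonRecord).id === id) {
      return parsed as JsonRecord; // returning here ends the read early
    }
  }
  return null;
}

// Demo: three lines, the middle one deliberately truncated.
const demoPath = join(mkdtempSync(join(tmpdir(), "jsonl-read-")), "demo.jsonl");
writeFileSync(demoPath, '{"id":"a","name":"Alice"}\n{"id":"b","na\n{"id":"c","name":"Carol"}\n');

const found = findById(demoPath, "c");
const broken = findById(demoPath, "b");
console.log(found, broken);
```

The truncated middle line costs us that one record and nothing else. The record after it is still found.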

For the rest of the course we are going to focus on users.jsonl, because the read pattern is the same for all three entities. Whatever works for users works for products and orders.

The routes

Two endpoints:

  • POST /users to create a user
  • GET /users/:id to fetch a user by id

We will benchmark the GET path first. That is where the storage strategies produce the most visibly different numbers. Writes have their own story involving durability, batching, and contention, and section 2 of the course picks that up in detail.

Step 1: Create the project

Make a new directory, cd into it, and initialize a Node project.

mkdir do-you-need-a-database
cd do-you-need-a-database
npm init -y
npm pkg set type=module
npm install @hectoday/http srvx 'zod@^3.25'
npm install -D @types/node

Three runtime dependencies and one dev dependency. @hectoday/http is the web framework. srvx is a tiny universal server adapter that lets a Hectoday fetch handler run on Node, Bun, or Deno with no code changes. zod is a schema validation library that Hectoday uses internally to validate incoming requests. We will see all three in action in a moment.

The 'zod@^3.25' pin is deliberate. Zod 4 is out, but Hectoday’s peer dependency asks for zod@^3.25.0, so installing the latest zod causes an ERESOLVE conflict. The 3.25 line has the modern Zod v4 API surface available under the zod/v4 subpath, which is what Hectoday itself uses internally. Once Hectoday updates its peer to allow zod 4, you can drop the pin.

@types/node provides the TypeScript type definitions for Node’s built-in modules. Node itself does not need them (it just strips types at load time), but your editor’s TypeScript server does. Without this package, you will see red squiggles under every import ... from "node:fs" line. Install it once and forget about it.

"type": "module" in package.json lets us use ES module syntax (import/export) directly. Recent Node releases (22.18 and later on the 22.x line, 23.6 and later otherwise) run .ts files natively by stripping types at load time, so we do not need a bundler or a TypeScript loader. node src/server.ts just works. On older 22.x releases the same feature sits behind the --experimental-strip-types flag.

Step 2: Decide on a folder layout

Before we write any code, pin down where things go. The convention we will use for the rest of the course:

do-you-need-a-database/
├── package.json
├── src/
│   ├── server.ts   ← HTTP server (stays the same every lesson)
│   └── store.ts    ← the storage code (changes every lesson)
└── seed.ts         ← one-off script to generate fake users

Why put the server in src/? Two reasons. One, it keeps application code separate from utility scripts like seed.ts that we only run occasionally. Two, when we start running multiple one-off scripts later (a build-index.ts for binary search, an import-jsonl.ts for SQLite), they live at the root alongside seed.ts and the src/ folder stays quiet and focused.

Create the directory now.

mkdir src

Step 3: Add run scripts to package.json

Open package.json and add a scripts block. We will use these a lot.

{
  "scripts": {
    "dev": "node --watch src/server.ts",
    "start": "node src/server.ts"
  }
}

dev runs the server with Node’s built-in file watcher, so the process restarts any time a file it imported changes. That is what you will use while iterating on store.ts. Run it with npm run dev.

start runs the server once, no watcher. That is what you would run under a process manager in production. Run it with npm start.

We will invoke one-off utilities like seed.ts directly with node seed.ts instead of wiring up more scripts. Passing command-line arguments through npm run requires a -- separator before the arguments, which is easy to forget, so node script.ts arg1 arg2 is the habit to build.

Step 4: Start the server, no routes yet

Open src/server.ts and write the smallest useful version: an app with no routes, handed to srvx’s serve.

// src/server.ts
import { serve } from "srvx";
import { setup } from "@hectoday/http";

const app = setup({ routes: [] });

serve({ port: 8081, fetch: app.fetch });

Three lines of real code. setup({ routes: [] }) gives us back an app object with a fetch function. fetch is a plain function that takes a web-standard Request and returns a Response. serve from srvx is what actually binds a port, listens for TCP connections, and calls our fetch function for every request that comes in. Behind the scenes it wraps Node’s built-in http module, but we never see that.
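
Because fetch is just a function from Request to Response, you can call one directly without binding a port. The sketch below uses a hand-written handler as a stand-in for app.fetch. The /health path is a made-up example, and this is not how Hectoday builds its handler internally.

```typescript
// A hand-written fetch handler: Request in, Response out.
// It stands in for app.fetch purely for illustration.
function handler(request: Request): Response {
  const { pathname } = new URL(request.url);
  if (request.method === "GET" && pathname === "/health") {
    return new Response("ok", { status: 200 });
  }
  return new Response("not found", { status: 404 });
}

// No port, no sockets: construct a Request and call the function.
const healthy = handler(new Request("http://localhost:8081/health"));
const missing = handler(new Request("http://localhost:8081/anything"));

console.log(healthy.status, missing.status); // 200 404
```

This is the property srvx relies on: it only has to turn each incoming connection into a Request, call the function, and write the Response back out.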

Run it.

npm run dev

You should see a line that says the server is listening on port 8081. Hit it with curl in another terminal:

curl -i http://localhost:8081/anything

You get a 404 because we have not registered any routes yet. That is fine. The server is alive.

Step 5: Stub out src/store.ts

Before we add routes that import from ./store.ts, we need ./store.ts to exist. We do not have a real storage implementation yet (that is the next lesson), but we can write a stub that satisfies the contract. Create src/store.ts:

// src/store.ts
// Placeholder store for the setup lesson. The next lesson replaces this
// with a real linear-scan implementation.

export interface User {
  id: string;
  name: string;
  email: string;
  created_at: string;
}

export function findUser(_id: string): User | null {
  return null;
}

export function createUser(_name: string, _email: string): User {
  throw new Error("store not implemented yet, see the linear-scan lesson");
}

The User interface describes the shape of a user record. findUser returns null for every id, as if the database were empty. createUser throws, because pretending to create a user when we have nowhere to put it would be misleading.

With this file in place, the imports we add to src/server.ts in a moment will resolve, and the server will actually start.

Step 6: Add the GET route

Now we add GET /users/:id. Update src/server.ts:

// src/server.ts
import { serve } from "srvx";
import { route, setup } from "@hectoday/http";
import { z } from "zod/v4";
import { findUser } from "./store.ts";

const getUser = route.get("/users/:id", {
  request: {
    params: z.object({ id: z.string() }),
  },
  resolve: ({ input }) => {
    if (!input.ok) {
      return Response.json(input.issues, { status: 400 });
    }

    const user = findUser(input.params.id);
    if (!user) {
      return Response.json({ error: "not found" }, { status: 404 });
    }

    return Response.json(user);
  },
});

const app = setup({ routes: [getUser] });

serve({ port: 8081, fetch: app.fetch });

Let us walk through what is new, one piece at a time.

We import route from Hectoday. This is how we declare an endpoint. We also import z from Zod (we use zod/v4, the modern Zod API surface) to describe request shapes. And we import findUser from the stub ./store.ts we just created.

const getUser = route.get("/users/:id", { ... });

This says “declare a GET endpoint at path /users/:id.” The :id part is a URL parameter. When a request comes in for /users/abc-123, Hectoday pulls abc-123 out and makes it available as input.params.id.
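
If you are curious what pulling a parameter out of a path involves, here is a minimal sketch that compares the pattern and the path one segment at a time. matchPath is a hypothetical helper written for this illustration. Hectoday's actual router may work quite differently.

```typescript
// Match a pathname against a pattern like "/users/:id".
// Returns the extracted params, or null when the path does not match.
function matchPath(pattern: string, pathname: string): Record<string, string> | null {
  const patternParts = pattern.split("/");
  const pathParts = pathname.split("/");
  if (patternParts.length !== pathParts.length) return null;

  const params: Record<string, string> = {};
  for (let i = 0; i < patternParts.length; i++) {
    const expected = patternParts[i];
    const actual = pathParts[i];
    if (expected.startsWith(":")) {
      if (actual === "") return null; // a parameter must not be empty
      try {
        params[expected.slice(1)] = decodeURIComponent(actual);
      } catch {
        return null; // malformed percent-encoding
      }
    } else if (expected !== actual) {
      return null; // a literal segment differs
    }
  }
  return params;
}

console.log(matchPath("/users/:id", "/users/abc-123")); // { id: 'abc-123' }
console.log(matchPath("/users/:id", "/products/1")); // null
```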

request: {
  params: z.object({ id: z.string() }),
},

This is the schema. It tells Hectoday what shape we expect the request in. Here we only care about params, and we expect a single string called id. If a request arrives that does not match, Hectoday will still run our handler, but input.ok will be false and input.issues will contain the validation errors.

resolve: ({ input }) => {
  if (!input.ok) {
    return Response.json(input.issues, { status: 400 });
  }
  ...
}

resolve is the function that actually runs for each request. Hectoday calls it with a context object that has an input property (we destructure it directly). input is a discriminated union: on the happy path, input.ok is true and you can safely read input.params.id. On the unhappy path, input.ok is false and you get a list of issues to return.

If you have used TypeScript’s discriminated unions before, this is exactly that pattern. The if (!input.ok) check narrows the type for the compiler, so your editor will let you reach into params only inside the success branch.

Hectoday always gives you input.ok, even on routes without much validation. Treat the check as a safe default habit, the same way you would always handle an error case before reading a value.
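
Here is the narrowing in isolation, as a small sketch. The Issue and Input types below are hypothetical and only mirror the shape described above; the real Hectoday types may differ in naming and detail.

```typescript
// Hypothetical types mirroring the input shape described in this lesson.
type Issue = { path: string; message: string };
type Input =
  | { ok: true; params: { id: string } }
  | { ok: false; issues: Issue[] };

function describe(input: Input): string {
  if (!input.ok) {
    // Inside this branch the compiler knows we have the failure variant:
    // input.issues exists, input.params does not.
    return `rejected: ${input.issues.map((issue) => issue.message).join(", ")}`;
  }
  // Past the early return, input is narrowed to the success variant.
  return `looking up ${input.params.id}`;
}

console.log(describe({ ok: true, params: { id: "abc" } }));
console.log(describe({ ok: false, issues: [{ path: "id", message: "required" }] }));
```

Try moving the input.params.id access above the if check in your editor. The compiler rejects it, because params does not exist on the failure variant.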

const user = findUser(input.params.id);
if (!user) {
  return Response.json({ error: "not found" }, { status: 404 });
}
return Response.json(user);

The happy path is short. We look up the user by id. If we cannot find them, return a 404. Otherwise wrap the user in Response.json(...) and return it. Response here is the standard Web Response object. There is no res.json() helper or ctx.send() magic. Hectoday is built on Web Standards, so every handler returns a real Response, the same kind you would get from fetch(url) in a browser.
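
Response.json is itself part of the standard Response API, not a framework helper. As a rough sketch, it is shorthand for serializing the value yourself and setting the content type. The user object here is a made-up placeholder.

```typescript
// Hypothetical record, used only for this illustration.
const sampleUser = { id: "abc-123", name: "Alice" };

// The standard static helper.
const viaHelper = Response.json(sampleUser, { status: 200 });

// Roughly the same thing spelled out by hand.
const byHand = new Response(JSON.stringify(sampleUser), {
  status: 200,
  headers: { "content-type": "application/json" },
});

console.log(viaHelper.status, viaHelper.headers.get("content-type"));
console.log(byHand.status, byHand.headers.get("content-type"));
```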

Finally we pass [getUser] into setup({ routes: [...] }). That is the list of endpoints the app knows about. Add more routes to this list as you add them.

Step 7: Add the POST route

Same idea, but now we also validate a JSON body. Update src/server.ts again:

// src/server.ts
import { serve } from "srvx";
import { route, setup } from "@hectoday/http";
import { z } from "zod/v4";
import { findUser, createUser } from "./store.ts";

const getUser = route.get("/users/:id", {
  request: {
    params: z.object({ id: z.string() }),
  },
  resolve: ({ input }) => {
    if (!input.ok) {
      return Response.json(input.issues, { status: 400 });
    }

    const user = findUser(input.params.id);
    if (!user) {
      return Response.json({ error: "not found" }, { status: 404 });
    }

    return Response.json(user);
  },
});

const postUser = route.post("/users", {
  request: {
    body: z.object({
      name: z.string().min(1),
      email: z.email(),
    }),
  },
  resolve: ({ input }) => {
    if (!input.ok) {
      return Response.json(input.issues, { status: 400 });
    }

    const user = createUser(input.body.name, input.body.email);
    return Response.json(user, { status: 201 });
  },
});

const app = setup({ routes: [getUser, postUser] });

serve({ port: 8081, fetch: app.fetch });

Two things to notice about postUser.

First, the schema lives under body instead of params. Hectoday will automatically parse the request as JSON, run Zod against the body, and hand us the typed result as input.body on the success branch. z.string().min(1) says “a non-empty string.” z.email() is Zod’s built-in email check.

Second, we return { status: 201 }, which is the HTTP “Created” status. The convention for create endpoints is to use 201 rather than 200 to tell the client “a new resource was made.” Hectoday does not enforce this, but it is a good habit.

And we update the route list at the bottom to include both routes: [getUser, postUser].
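
To see what the body schema buys us, here is a hand-rolled sketch of the same two rules with the same ok/issues outcome. Everything in it is an assumption made for illustration: validateNewUser and statusFor are hypothetical names, and the email pattern is a crude stand-in that is much looser than a real email check such as Zod's.

```typescript
type Issue = { path: string; message: string };
type NewUser = { name: string; email: string };
type Result = { ok: true; body: NewUser } | { ok: false; issues: Issue[] };

// Check an already-parsed JSON body against the two rules from the schema.
function validateNewUser(raw: unknown): Result {
  const issues: Issue[] = [];
  const obj = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
  const name = obj.name;
  const email = obj.email;

  if (typeof name !== "string" || name.length < 1) {
    issues.push({ path: "name", message: "must be a non-empty string" });
  }
  // Crude stand-in for an email check: something@something.tld, no spaces.
  if (typeof email !== "string" || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
    issues.push({ path: "email", message: "must look like an email address" });
  }

  if (issues.length > 0) return { ok: false, issues };
  return { ok: true, body: { name: name as string, email: email as string } };
}

// 201 Created for a valid body, 400 Bad Request otherwise.
function statusFor(result: Result): number {
  return result.ok ? 201 : 400;
}

console.log(statusFor(validateNewUser({}))); // 400
console.log(statusFor(validateNewUser({ name: "Alice", email: "alice@example.com" }))); // 201
```

Writing this by hand for every route gets old quickly, which is the argument for declaring the schema once and letting the framework run it.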

That is the whole server. From here on, every lesson in section 1 uses this exact file unchanged. Everything interesting happens in src/store.ts.

Step 8: Try it

Run the server.

npm run dev

In another terminal, try the routes. None of them will do anything useful yet because our store is all stubs, but we can confirm the wiring.

# Validation failure (empty JSON object, missing name and email): 400
curl -i -X POST http://localhost:8081/users \
  -H 'Content-Type: application/json' \
  -d '{}'

You should get a 400 response with a JSON body describing the validation issues. The handler never called createUser, because input.ok was false. Good, validation works.

# Unknown user: 404
curl -i http://localhost:8081/users/abc

This one returns a 404. The request made it through routing and validation, our handler called findUser("abc") on the stub store, which returned null, and the handler turned that into a 404.

# Valid body: 500 (stub createUser throws on purpose)
curl -i -X POST http://localhost:8081/users \
  -H 'Content-Type: application/json' \
  -d '{"name":"Alice","email":"alice@example.com"}'

This one fails with a 500, because our stub createUser throws. That is our cue to write a real store in the next lesson.

Stop the server with Ctrl+C before moving on.


Step 9: A seed script for later

We need data to benchmark against. Writing ten thousand users by hand is not going to happen, so here is a small script that generates them for us. Put it at the root, not in src/. It is a one-off utility, not application code.

// seed.ts
import { randomUUID } from "node:crypto";
import { writeFileSync } from "node:fs";

const count = Number(process.argv[2] ?? 10_000);
const out = process.argv[3] ?? "users.jsonl";

const lines: string[] = [];
const ids: string[] = [];

for (let i = 0; i < count; i++) {
  const id = randomUUID();
  ids.push(id);
  lines.push(
    JSON.stringify({
      id,
      name: `User ${i}`,
      email: `user${i}@example.com`,
      created_at: new Date().toISOString(),
    }),
  );
}

writeFileSync(out, lines.join("\n") + "\n");
writeFileSync("ids.json", JSON.stringify(ids));
console.log(`wrote ${count} users to ${out}`);

Quick walk-through. At the top we pull in randomUUID from node:crypto (it produces strings like a3f1b7d2-c8e4-4f3a-9d1b-...) and writeFileSync from node:fs so we can save files. The two process.argv reads look at command-line arguments: the first optional arg is how many users to generate, and the second is the output filename. If you do not pass anything, you get 10,000 users written to users.jsonl.

The loop does two things on each iteration. It pushes one line of JSON into lines, and it pushes the generated id into a separate ids array. We keep the ids around because when we run our benchmark later, the load generator needs a list of real ids to request. Hitting the same URL over and over would exercise a single record at a single position in the file, with every cache along the way already warm, and give us misleadingly fast numbers. So we pick a random id from this list for every request.
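
Picking a random id per request can be sketched in a few lines. pickRandom is a hypothetical helper, and the three ids below are placeholders; in the real project the list would come from parsing ids.json. How the actual load generator consumes the list is not shown here.

```typescript
// Hypothetical helper: pick a uniformly random element from a non-empty list.
function pickRandom<T>(items: readonly T[]): T {
  if (items.length === 0) throw new Error("cannot pick from an empty list");
  return items[Math.floor(Math.random() * items.length)];
}

// Placeholder ids standing in for the contents of ids.json.
const sampleIds = ["id-one", "id-two", "id-three"];

// Build a handful of request URLs, each for a randomly chosen id.
const urls = Array.from(
  { length: 5 },
  () => `http://localhost:8081/users/${pickRandom(sampleIds)}`,
);

console.log(urls);
```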

At the end, we write the JSONL to the output file (users.jsonl unless you passed a different name) and the id list to ids.json. The console.log is so we know it ran.
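
Since the benchmark depends on the two files agreeing, a quick sanity check can be worth having. The sketch below is an optional extra, not part of the course code. seedFilesAgree is a hypothetical helper, and the demo runs against a tiny sample written to a temp directory in the same format, so it does not need the real files.

```typescript
import { randomUUID } from "node:crypto";
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Returns true when every JSONL line parses and its id matches the id
// at the same position in the ids file.
function seedFilesAgree(usersPath: string, idsPath: string): boolean {
  const lines = readFileSync(usersPath, "utf8").split("\n").filter((line) => line !== "");
  const ids = JSON.parse(readFileSync(idsPath, "utf8")) as string[];
  if (lines.length !== ids.length) return false;
  return lines.every((line, i) => (JSON.parse(line) as { id: string }).id === ids[i]);
}

// Write a three-user sample in the same shape seed.ts produces.
const sampleDir = mkdtempSync(join(tmpdir(), "seed-check-"));
const usersPath = join(sampleDir, "users.jsonl");
const idsPath = join(sampleDir, "ids.json");

const generatedIds = [randomUUID(), randomUUID(), randomUUID()];
const generatedLines = generatedIds.map((id, i) =>
  JSON.stringify({
    id,
    name: `User ${i}`,
    email: `user${i}@example.com`,
    created_at: new Date().toISOString(),
  }),
);

writeFileSync(usersPath, generatedLines.join("\n") + "\n");
writeFileSync(idsPath, JSON.stringify(generatedIds));

const agree = seedFilesAgree(usersPath, idsPath);
console.log(agree); // true
```

To run it against real output, call seedFilesAgree("users.jsonl", "ids.json") from the project root after seeding.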

Run it:

node seed.ts 10000 users.jsonl

We will regenerate at 100,000 and 1,000,000 records when we start benchmarking. For now, 10,000 is enough to get a server running.

We have the data, we have the routes, we have the framework. In the next lesson we write the simplest possible storage strategy. Open the file on every request, scan it line by line, return the matching user. It will work. It will also fall apart spectacularly as the file grows, and that is the point.


© 2026 hectoday. All rights reserved.