Approach 1: Linear Scan

We have a server scaffold and a stub src/store.ts. Now it is time to write the real thing, and for our very first attempt we are going to do the most embarrassingly simple thing we can think of. No indexes, no in-memory cache, no cleverness at all. Just open the file and look for the user.

You might already be shaking your head at this. Good. The point of starting here is to get a feel for when this actually falls apart, not when you think it will. That turns out to be much later than most people expect.

The idea in one paragraph

A request comes in for user abc-123. We open users.jsonl, read it one line at a time, parse each line as JSON, and check if the id matches. If it does, we return the user and stop reading. If we reach the end of the file without a match, we return null. That is it.

No in-memory map. No loaded index. Nothing to keep in sync with the file on disk. The file is the state.

Step 1: Set up the imports

Open src/store.ts (it currently contains the stub from the last lesson) and replace it with the imports we need. We will fill in the functions in the next steps.

// src/store.ts
import { createReadStream, appendFileSync, existsSync } from "node:fs";
import { createInterface } from "node:readline";
import { randomUUID } from "node:crypto";

const USERS_FILE = "users.jsonl";

export interface User {
  id: string;
  name: string;
  email: string;
  created_at: string;
}

Quick walkthrough of what we are pulling in.

From node:fs (Node’s built-in filesystem module) we grab three helpers. createReadStream opens a file as a stream we can read chunks from, without loading the whole file into memory. appendFileSync adds bytes to the end of a file without rewriting the existing contents. existsSync checks whether a file is there yet, which we need for the very first run before any users have been created.

From node:readline we grab createInterface. It takes a stream and gives us something we can iterate line by line. Without it, we would have to buffer chunks ourselves and split on newlines, which is a small amount of plumbing that is easy to get slightly wrong.
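To see what createInterface is saving us, here is a sketch of the manual version — a hypothetical helper, not code we will use — that buffers incoming chunks and splits on newlines ourselves:

async function* splitLines(chunks: AsyncIterable<string>): AsyncGenerator<string> {
  let buffered = "";
  for await (const chunk of chunks) {
    buffered += chunk; // a chunk can end mid-line, so carry the remainder forward
    let idx: number;
    while ((idx = buffered.indexOf("\n")) !== -1) {
      yield buffered.slice(0, idx);
      buffered = buffered.slice(idx + 1);
    }
  }
  if (buffered) yield buffered; // final line with no trailing newline
}

Note what this sketch does not handle: \r\n endings, for one. Those little gaps are exactly why we let readline do it.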

From node:crypto we grab randomUUID, which generates a fresh unique id string like a3f1b7d2-c8e4-4f3a-9d1b-... for every new user we create.

USERS_FILE is just a constant for the filename, so we do not have a string literal floating around in three places. User is a TypeScript interface describing the shape of a record. It is compile-time only. It does not validate anything at runtime; it just tells the compiler what to expect.
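If you ever did want a runtime check, it would have to be ordinary code. A hypothetical guard might look like this; our store skips it and trusts the file via the as User cast you will see in findUser:

function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.name === "string" &&
    typeof v.email === "string" &&
    typeof v.created_at === "string"
  );
}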

Step 2: Write findUser

Now the lookup function. Add this below the User interface.

export async function findUser(id: string): Promise<User | null> {
  if (!existsSync(USERS_FILE)) return null;

  const stream = createReadStream(USERS_FILE, { encoding: "utf8" });
  const rl = createInterface({ input: stream, crlfDelay: Infinity });

  for await (const line of rl) {
    if (!line.trim()) continue;
    const user = JSON.parse(line) as User;
    if (user.id === id) {
      stream.destroy();
      return user;
    }
  }

  return null;
}

Let us go through this carefully.

The function signature

export async function findUser(id: string): Promise<User | null>;

async is there because reading a stream is asynchronous. The first chunk of bytes might not be available the moment we ask for it, so the function has to wait. We take an id as input, and we return either a User or null, wrapped in a Promise because of the async.
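From the caller's side, that just means every lookup needs an await. A hypothetical snippet, not part of our files:

const user = await findUser("abc-123");
if (user === null) {
  console.log("no such user");
} else {
  console.log(user.email);
}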

The bail-out

if (!existsSync(USERS_FILE)) return null;

The very first time the server starts, before any users have been created, users.jsonl does not exist yet. Without this check, createReadStream would throw and our whole request would 500. So we exit early and say “no user found.”

Opening the stream

const stream = createReadStream(USERS_FILE, { encoding: "utf8" });
const rl = createInterface({ input: stream, crlfDelay: Infinity });

createReadStream gives us a raw byte stream. The { encoding: "utf8" } option tells Node to hand us strings instead of raw Buffer objects. Without it, we would get Buffer objects and have to call .toString() ourselves.

createInterface wraps the raw stream and gives us something we can iterate line by line. crlfDelay: Infinity is a detail that handles Windows-style line endings (\r\n) correctly. We are unlikely to hit Windows line endings on our own file, but it costs nothing to add.

The main loop

for await (const line of rl) {
  if (!line.trim()) continue;
  const user = JSON.parse(line) as User;
  if (user.id === id) {
    stream.destroy();
    return user;
  }
}

for await is the async version of a normal for loop. On every iteration it waits for the next line from the stream and assigns it to line. If the line is empty or just whitespace, we skip it with continue. Otherwise we parse it as JSON, cast it to our User type, and check if the id matches.

When we find a match, we do something a little subtle. We call stream.destroy() before returning. Why? Because otherwise the stream would keep reading the rest of the file even though we are done with it. destroy() tells Node to stop and clean up. On a 1,000 record file this is invisible. On a million records it is the difference between returning in 10ms and chewing through 500,000 more lines for nothing.

The fall-through

return null;

If the loop finishes without finding a match, we fall through and return null. The handler will turn that into a 404.

Step 3: Write createUser

Writes are easier than reads here. There is no in-memory state to update. Add this at the bottom of src/store.ts.

export function createUser(name: string, email: string): User {
  const user: User = {
    id: randomUUID(),
    name,
    email,
    created_at: new Date().toISOString(),
  };
  // One JSON object per line: serialize and append, never rewrite.
  appendFileSync(USERS_FILE, JSON.stringify(user) + "\n");
  return user;
}

Three things happen.

First, we build the user record in memory. randomUUID() gives us a fresh id. new Date().toISOString() gives us a string like 2026-04-21T09:17:32.123Z for the created-at timestamp.

Second, we serialize the record to JSON and append it as one line to users.jsonl. appendFileSync does exactly what it sounds like: open the file in append mode, write the bytes at the end, close it. No overwrite, no risk to existing lines.

Third, we return the record. The handler will wrap it in a Response.json(...) and send it back to the client.

Notice we do not update any in-memory state. There is none. The file is the only source of truth, and every read starts from scratch by opening and scanning it again.

Step 4: Update the handler to await

findUser used to be synchronous in the stub store. We just made it async, which means the version of getUser we wrote in the setup lesson is now broken: without an await, findUser hands back a Promise, which is always truthy, so the not-found branch would never run. Go update src/server.ts to await the result.

const getUser = route.get("/users/:id", {
  request: {
    params: z.object({ id: z.string() }),
  },
  resolve: async ({ input }) => {
    if (!input.ok) {
      return Response.json(input.issues, { status: 400 });
    }

    const user = await findUser(input.params.id);
    if (!user) {
      return Response.json({ error: "not found" }, { status: 404 });
    }

    return Response.json(user);
  },
});

Two tiny changes. We made the resolve arrow function async, and we added await in front of findUser. Hectoday handlers can be either synchronous or asynchronous; the framework awaits the result for you if you return a Promise<Response>. That is it for the server change.

Step 5: Try it at a thousand users

Seed a small file and start the server.

node seed.ts 1000 users.jsonl
npm run dev

In another terminal, grab an id and hit the endpoint.

ID=$(jq -r '.[0]' ids.json)
curl http://localhost:8081/users/$ID

You should get a JSON user record back. Try with a fake id too.

curl -i http://localhost:8081/users/nope

That one returns a 404. The handler called findUser("nope"), scanned every line of the 1,000-record file, found nothing, returned null, and the handler turned that into a 404.

Create a new user with a POST.

curl -X POST http://localhost:8081/users \
  -H 'Content-Type: application/json' \
  -d '{"name":"New User","email":"[email protected]"}'

You should get the new user back with status 201. Open users.jsonl in your editor and scroll to the bottom. You should see the new line there. The file is now the canonical source of truth for the record you just created.
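You can also check from the terminal. We already use jq in this lesson, so this prints the record on the last line (a quick sanity check, not part of the lesson's files):

tail -n 1 users.jsonl | jq .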

Step 6: What is happening on every request

Let us make sure the shape of this is clear. For every GET /users/:id, here is the work:

  1. Open users.jsonl for reading.
  2. Start streaming it, one line at a time.
  3. Parse each line as JSON.
  4. Compare that line’s id to the one we are looking for.
  5. If we find a match, stop. If we do not, keep going until the end of the file.

This is O(n), which is computer science shorthand for “the cost grows in direct proportion to the size of the input.” If you have 10,000 records and the user is near the end, you parse 10,000 JSON objects to find them. If the user does not exist, you parse every line. On average, across a random read pattern, you end up parsing half the file.

The OS page cache helps a bit. If the file is small enough to stay in RAM (the kernel keeps recently-read file pages in memory automatically), you skip the actual disk read. But you still pay the per-line decoding and parsing cost on every request. That cost is doing real work even with a warm cache.
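If you want to put a rough number on that per-line cost, here is a small sketch — a hypothetical parse-cost.ts, with machine-dependent results — that parses our record shape in a tight loop:

// parse-cost.ts: estimate the cost of JSON.parse on one User-shaped line.
const sample = JSON.stringify({
  id: "a3f1b7d2-c8e4-4f3a-9d1b-000000000000",
  name: "Test User",
  email: "test@example.com",
  created_at: new Date().toISOString(),
});

const N = 500_000; // the average scan length on a 1M-record file
const start = performance.now();
for (let i = 0; i < N; i++) JSON.parse(sample);
const elapsed = performance.now() - start;
console.log(`${N} parses in ${elapsed.toFixed(0)}ms`);

Whatever number you get, a warm cache does not make it go away: every request pays it again.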

Why this is not as bad as you think, yet

There is something honest about this approach. There are no indexes to keep in sync. No in-memory state to rebuild on restart. No cache invalidation bugs. If the process dies halfway through a write, the file is either intact or has one truncated line at the end. The if (!line.trim()) continue check skips a trailing blank line, but note that a genuinely half-written JSON line would make JSON.parse throw rather than be skipped. When we restart the server, there is nothing to reload. The file already contains everything we know.
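If you wanted to harden the scan against that half-written line, a small variation on findUser's loop would skip anything that fails to parse. A sketch, not what we wrote above:

for await (const line of rl) {
  if (!line.trim()) continue;
  let user: User;
  try {
    user = JSON.parse(line) as User;
  } catch {
    continue; // a partially written trailing line is not valid JSON; skip it
  }
  if (user.id === id) {
    stream.destroy();
    return user;
  }
}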

For a script, for an internal tool, for a command-line program that processes a small file, this is often the right answer. You should not reach for a database to look up entries in a 1,000 line config file.

Step 7: Watch it fall over at scale

Time to see the limit. Regenerate the data at a million records.

node seed.ts 1000000 users.jsonl

The server should auto-restart thanks to npm run dev. Now hit the last user in the file.

curl http://localhost:8081/users/$(jq -r '.[999999]' ids.json)

Measure it. That single request takes around a second on most machines. Then imagine fifty users hitting that endpoint at the same instant. Fifty seconds of CPU work spread across fifty connections. The page freezes for everyone.

The numbers

Here is what we actually measured. Apple Silicon Mac mini, Node 24.
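The load came from wrk, the benchmarking tool covered earlier in this course. An invocation shaped like this is typical — the thread and connection counts here are illustrative, not necessarily the exact ones behind the table:

wrk -t4 -c50 -d10s http://localhost:8081/users/$ID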

Records    Requests/sec    Avg latency
10k        474             101ms
100k       49              914ms
1M         5               1.06s

At 10,000 records, the server handles 474 requests per second under load. That is not nothing. For a lot of apps, that number would be fine forever.

Add a zero to the dataset and the throughput collapses. At 100,000 records we are down to 49 req/s. At a million, we are at 5. Average latency at that size is over a full second. For users of your app, that means the page is frozen while the spinner spins.

The drop is almost perfectly linear. Ten times the data, ten times slower. That is exactly what you would expect from an O(n) algorithm.

What do you think would happen if you ran the 1M test a second time, immediately after the first? It would be a little faster, because the OS now has the whole file in its page cache. But the parsing cost is still there, line by line. The caching only saves the disk I/O, not the work of reading and decoding 500,000 JSON objects on every request.

At 100,000 records, the linear-scan server averages close to a second per request. Where is most of that time spent? Not on disk I/O: with a warm page cache the bytes come straight out of RAM. The time goes into splitting lines and parsing JSON, and, under concurrent load, into each request waiting its turn behind the others on Node's single thread.

When linear scan is the right answer

Despite the collapse at scale, there are real cases where this approach is correct.

If the file is small (a few thousand records) and read traffic is low (a handful of requests per minute, not per second), linear scan is fine. The simplicity is worth more than the throughput you are giving up. There is nothing to maintain, nothing to load at startup, nothing to reason about beyond “open file, scan, close.”

Internal admin tools, configuration files, lookup tables, scripts that run once a day. All of these can use linear scan happily forever. This is, by the way, exactly how a lot of CLI tools and small services work in production.

The trap is keeping this approach past its scale point because “it works on my laptop with 100 records.” It does work on your laptop with 100 records. It will not work in production with 100,000.

In the next lesson we make one small change. We load the file into memory once at startup and keep it in a Map. The throughput jumps by roughly four orders of magnitude, and the numbers become genuinely hard to believe.
