A database is just files
Before you reach for Postgres on the first day of a new project, it is worth asking: do you actually need a database yet? Most of us answer “yes” by reflex, because that is what every tutorial does. In this course we are going to answer that question with numbers instead of reflex. We will build the same small API four different ways, put them under real load, and see what each approach actually gives you. By the end you will know where the line is, for your own projects, between “a file on disk is fine” and “you need a real database.”
Let us start with a claim that sounds wrong the first time you hear it.
A database is just files.
SQLite is a single file on disk. Postgres is a directory full of files with a process sitting in front of them. Every database you have ever used is, under the hood, reading and writing bytes to the filesystem. It is using the same open(), read(), and write() calls your code uses when you do readFileSync("users.json").
So the question is not whether to use files. You are always using files. The question is whether to use a database’s files or your own. For a surprising number of applications, especially early-stage ones, the answer might be: your own.
The problem most tutorials skip over
Here is the situation we keep ending up in. You are building a small app. Maybe a bookmarking tool, maybe an internal dashboard, maybe the first version of a SaaS. You have a list of users, a list of products, a list of whatever. You need to look things up by id. That is it. Nothing fancy.
Every tutorial at this point tells you to install Postgres, write a schema, run migrations, set up a connection pool, and wire up an ORM. Now you have a database server to run in development. A docker-compose file. A migration tool. A connection string in an env variable. A new failure mode if the database is not running when your app starts.
But your actual requirement was “I need to find a user by id.” A Map<string, User> can do that. A file full of JSON can do that. You may have just taken on a mountain of operational complexity to solve a problem that was already solved by the standard library.
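To make that concrete, here is roughly what "a file full of JSON plus a `Map`" looks like as a complete storage layer. The file name and record shape are illustrative, not the course's actual scaffold:

```typescript
// A minimal "your own files" user store: load a JSON file once,
// index it in a Map, look users up by id. Names are illustrative.
import { writeFileSync, readFileSync, rmSync } from "node:fs";

type User = { id: string; name: string };

// Pretend this file already exists on disk.
const seed: User[] = [
  { id: "u1", name: "Ada" },
  { id: "u2", name: "Grace" },
];
writeFileSync("users.json", JSON.stringify(seed));

// The entire "database layer": one readFileSync and one Map.
const loaded = JSON.parse(readFileSync("users.json", "utf8")) as User[];
const byId = new Map<string, User>(loaded.map((u): [string, User] => [u.id, u]));

console.log(byId.get("u2")?.name); // "Grace"
rmSync("users.json");
```

That is the whole thing: no server process, no connection string, no migration tool, and nothing that can be "down" when your app starts.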
The goal of this course is to show you, with measurements, when that tradeoff is actually worth making.
What we will build
The course is in two sections.
Section 1 is about reads. We will build the same GET /users/:id endpoint four times. Same routes, same data, same @hectoday/http scaffold. The only thing that changes is the storage layer underneath:
- Linear scan. Open the file on every request and read it line by line until we find the one we want.
- In-memory map. Load the file once at startup, put every record into a Map, and serve lookups from memory.
- Binary search on disk. Sort the data, build a small index file next to it, and do O(log n) lookups straight from disk without ever loading everything into RAM.
- SQLite. A real database with a B-tree index, living in a single file.
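The third approach is the least familiar of the four, so here is the core idea in miniature. This sketch uses a fixed-width record layout (32-byte id plus 32-byte name, NUL-padded) chosen for illustration only; the course builds its own format. The point is that each probe is a single `readSync` at a computed offset, so a lookup touches O(log n) records without loading the file:

```typescript
// Binary search over sorted fixed-width records, read straight from disk.
// The 64-byte record layout is an illustrative assumption.
import { writeFileSync, openSync, readSync, closeSync, statSync, rmSync } from "node:fs";

const REC = 64; // bytes per record: 32-byte id + 32-byte name, NUL-padded
const pad = (s: string) => s.padEnd(32, "\0");

// Build a small data file; records must be sorted by id.
const rows: [string, string][] = [["u1", "Ada"], ["u2", "Grace"], ["u3", "Linus"]];
writeFileSync("users.dat", rows.map(([id, name]) => pad(id) + pad(name)).join(""));

function find(path: string, id: string): string | undefined {
  const fd = openSync(path, "r");
  const buf = Buffer.alloc(REC);
  let lo = 0;
  let hi = statSync(path).size / REC - 1; // record count from file size
  try {
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      readSync(fd, buf, 0, REC, mid * REC); // one disk read per probe
      const midId = buf.toString("utf8", 0, 32).replace(/\0+$/, "");
      if (midId === id) return buf.toString("utf8", 32, 64).replace(/\0+$/, "");
      if (midId < id) lo = mid + 1;
      else hi = mid - 1;
    }
    return undefined;
  } finally {
    closeSync(fd);
  }
}

const name = find("users.dat", "u2");
console.log(name); // "Grace"
rmSync("users.dat");
```

This is, in spirit, what a B-tree index does for a database: it turns "where is this id on disk?" into a handful of reads instead of a full scan.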
After each one, we will hammer the server with wrk to measure real throughput and latency. Then we will translate those numbers into something you can actually reason about, like how many daily active users each approach can support.
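The translation from throughput to users is back-of-envelope arithmetic. All three inputs below are hypothetical placeholders, not measurements from the course:

```typescript
// Back-of-envelope: how many daily active users a given throughput
// supports. Every input here is a hypothetical placeholder.
const reqPerSec = 300;        // measured server throughput (placeholder)
const reqPerUserPerDay = 50;  // how chatty one user is (assumption)
const peakFactor = 10;        // peak traffic vs. the daily average (assumption)

const reqPerDay = reqPerSec * 86_400; // total capacity per day
const dau = Math.floor(reqPerDay / (reqPerUserPerDay * peakFactor));
console.log(dau); // 51840
```

The exact inputs will vary by app; the useful habit is doing this division at all, because it turns an abstract "300 req/s" into "tens of thousands of daily users," which is a number you can compare against your actual ambitions.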
Section 2 is about writes. Same scaffold, but we shift the focus. How durable is appendFileSync really? (Less than you think.) How much does fsync cost? (About a thousand times what you think.) What happens to a sorted index when you append new records? (It breaks.) What happens when two processes write to the same file at the same time? (Nothing good.) By the end of section 2 you will understand why ACID transactions are the one thing flat files genuinely cannot give you.
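As a preview of the durability question, here is the distinction section 2 measures: `appendFileSync` returns once the kernel's page cache has the bytes, while `fsync` forces them onto stable storage. The file name is a scratch placeholder; the fsync-per-write pattern shown is one option among several the course compares:

```typescript
// The durability gap: appendFileSync hands data to the OS page cache;
// only fsync forces it to the disk. "log.txt" is a scratch file.
import { appendFileSync, openSync, fsyncSync, closeSync, readFileSync, rmSync } from "node:fs";

const path = "log.txt";
rmSync(path, { force: true });

// Fast path: returns as soon as the kernel has the bytes, not the disk.
// A power loss right after this line can still lose the write.
appendFileSync(path, "order 1 accepted\n");

// Durable path: same append, then fsync to flush to stable storage.
const fd = openSync(path, "a");
appendFileSync(fd, "order 2 accepted\n");
fsyncSync(fd); // this call is where the large per-write cost lives
closeSync(fd);

const lines = readFileSync(path, "utf8").trim().split("\n");
console.log(lines.length); // 2
rmSync(path);
```

Both writes look identical to your program afterward; the difference only shows up when the machine loses power between the write and the flush, which is exactly why it is so easy to ship the fast path by accident.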
What you will walk away with
A working mental model of the cost of each approach. Loading every user into a Map sounds wasteful until you see what the numbers actually look like. Linear scan sounds pointless until you see that it still handles a few hundred requests per second on a 10,000 record file, which is more than most production apps ever actually need. SQLite sounds heavy until you see that it gives you SQL queries, joins, and transactions for a small constant-factor cost compared to the raw Map.
These are not opinions. They are measurements you can reproduce on your own laptop.
A note on the numbers
Every benchmark table in this course was measured on the same machine (Apple Silicon Mac mini, Node 24) using the Hectoday server you are about to build. The benchmarking lesson walks you through running the tests yourself. The shape of the results should reproduce cleanly on any modern laptop. The absolute numbers will move by a factor of a few depending on your CPU, your SSD, and whatever else your machine is doing while you run the tests. That is fine. The comparisons between approaches are what matter.
Why @hectoday/http
We will use @hectoday/http as the framework throughout. Three reasons.
First, it is built on Web Standards. Every handler returns a Response. Every request is a Request. The framework gets out of the way so you can focus on the storage logic, which is what this course is actually about.
Second, the app.fetch entry point is just a function. That means we can hand it straight to a tiny server adapter like srvx and be up and running in three lines, and we can also call app.fetch directly in tests without binding to a port.
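The pattern itself is framework-independent: a handler is just an async function from a Web-standard `Request` to a `Response`. The stand-in below illustrates the idea with plain globals (available in Node 18+); its route logic is illustrative and is not @hectoday/http's API:

```typescript
// The fetch-handler pattern: an async function from Request to Response.
// No framework, no port. This handler is an illustrative stand-in,
// not @hectoday/http itself.
const handler = async (req: Request): Promise<Response> => {
  const match = new URL(req.url).pathname.match(/^\/users\/(\w+)$/);
  if (!match) return new Response("not found", { status: 404 });
  return Response.json({ id: match[1], name: "Ada" }); // stubbed lookup
};

// "Test" the server by calling the function directly; no port is bound.
const res = await handler(new Request("http://localhost/users/u1"));
console.log(res.status); // 200
```

Because the handler is a plain function, the same code path serves production traffic behind an adapter and runs in a unit test with no network in between.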
Third, the setup({ routes }) API is small enough that you can hold the whole framework in your head. That frees up attention for the actual topic: storage.
What you need to follow along
- Node.js 22 or later (Node 24 is what these benchmarks were run on)
- wrk for benchmarking (brew install wrk on macOS)
- A terminal
That is it. No Postgres, no Docker, no cloud account, no bundler. Everything runs locally and writes to a single directory you can rm -rf when you are done.
How to read this course
Each lesson builds on the one before it. The Hectoday server scaffold we set up in the next lesson is the same one we use for all four storage approaches. Only the storage code changes between lessons.
If you already know @hectoday/http, the next lesson will move quickly. If you do not, it is a gentle introduction to the parts of the API we will use throughout: route.get, route.post, Zod request schemas, and setup({ routes }).
In the next lesson we set up the project, define the two endpoints we will use for the rest of section 1, and write a small script to generate fake users so we have something to benchmark against.