Do You Need a Database?

Writes and Durability
The write path

Welcome to section 2. We spent section 1 benchmarking reads. Every approach had a createUser function that called appendFileSync, but we did not pay much attention to how that function actually works. We just trusted that if the function returned without throwing, the data was safe on disk.

It turns out that is not quite true. There is a surprisingly wide gap between “the function returned” and “the bytes are durably on disk,” and understanding that gap is the foundation of everything else in section 2. Before we can benchmark write performance or talk about transactions or explain why databases exist, we need to know what appendFileSync is really doing.

What appendFileSync actually does

When you call appendFileSync("users.jsonl", line) in Node, here is the sequence of events:

  1. The runtime makes a write(2) syscall, handing your bytes to the kernel.
  2. The kernel copies those bytes into something called the page cache, which is a region of RAM that the operating system uses to buffer file I/O.
  3. The syscall returns. Your code moves on to the next line.
  4. At some later moment (milliseconds, seconds, sometimes longer) the kernel decides to flush “dirty” pages of the cache to the actual storage device.
  5. The storage device receives the bytes and writes them into its own internal cache. Most modern SSDs have one.
  6. Eventually, the SSD commits those bytes to flash.
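The first three steps are all that appendFileSync itself waits for. Here is a rough sketch of the equivalent low-level calls (illustrative, not Node's actual implementation; the helper name appendNoDurability is made up):

```typescript
import { openSync, writeSync, closeSync } from "node:fs";

// Roughly what appendFileSync(path, line) boils down to: open in append
// mode, hand the bytes to the kernel, close. Nothing here waits for the
// bytes to reach the physical disk -- that is steps 4 through 6 above.
function appendNoDurability(path: string, line: string): void {
  const fd = openSync(path, "a"); // "a" = append mode
  try {
    writeSync(fd, line); // write(2): bytes land in the kernel page cache
  } finally {
    closeSync(fd); // close(2) does NOT imply a flush to disk
  }
}
```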

Pay attention to step 3. appendFileSync returns as soon as the kernel accepts the bytes. Not as soon as they are on disk. If the machine loses power between step 3 and step 6, those bytes are gone. Your “synchronous” write has silently disappeared.

This surprises people the first time they hear it. The function is literally named appendFileSync. But the "Sync" refers to "synchronous from the JavaScript perspective": the function blocks until the syscall returns. It does not mean "synchronously written to durable storage." Those are different things, and the name does not help.

What do you think would happen if you called appendFileSync and then immediately pulled the power cord on your laptop? Any bytes that had only reached step 3 would be lost. Bytes that had made it to step 5 or beyond would usually survive, depending on whether the SSD has power-loss protection. In practice, on modern hardware, most of this data does make it to disk within a second or two. But “most” and “always” are not the same word.

What fsync does

To force your bytes through the kernel and into durable storage, you have to call fsync(2). In Node this is fsyncSync(fd), available through node:fs.

Here is the version of createUser that actually persists durably:

import { openSync, writeSync, fsyncSync, closeSync } from "node:fs";
import { randomUUID } from "node:crypto";

export function createUser(name: string, email: string): User {
  const user: User = {
    id: randomUUID(),
    name,
    email,
    created_at: new Date().toISOString(),
  };

  const line = JSON.stringify(user) + "\n";
  const fd = openSync(USERS_FILE, "a");
  try {
    writeSync(fd, line);
    fsyncSync(fd); // <-- this is the durability barrier
  } finally {
    closeSync(fd);
  }

  return user;
}

Let us walk through what changed.

We open the file with openSync(USERS_FILE, "a") and get back a file descriptor fd. The "a" flag means “append mode.” Any bytes we write will land at the current end of the file.

writeSync(fd, line) pushes the bytes into the kernel. Exactly like appendFileSync would, except we still have the file descriptor around.

Now the new part. fsyncSync(fd) is a system call that says “do not return until every byte associated with this file descriptor has been confirmed written to durable storage.” The kernel flushes its dirty pages, the SSD confirms they reached non-volatile memory, and only then does fsyncSync return.

Finally we closeSync(fd) in a finally block so the file descriptor gets released even if something threw.

Now for the bad news. fsyncSync is slow. Really slow. A consumer SSD typically takes 1 to 10 milliseconds per fsync. A spinning-rust hard drive is much worse (tens of milliseconds). An enterprise NVMe with a battery-backed cache can be closer to 100 microseconds. Compare that to an unsynced write, which is usually a few microseconds at most.

That is a difference of roughly three orders of magnitude. A thousand times slower. Every single write now pays that cost. This is the entire reason databases have any complexity around writes.
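You can feel that gap on your own machine. Here is a hypothetical micro-benchmark (timeAppends is a made-up helper; the absolute numbers depend entirely on your hardware, so treat the ratio as the point):

```typescript
import { openSync, writeSync, fsyncSync, closeSync, unlinkSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Mean milliseconds per append, with or without an fsync after every write.
function timeAppends(n: number, withFsync: boolean): number {
  const path = join(tmpdir(), `fsync-bench-${withFsync}.jsonl`);
  const fd = openSync(path, "a");
  const start = process.hrtime.bigint();
  for (let i = 0; i < n; i++) {
    writeSync(fd, `{"i":${i}}\n`);
    if (withFsync) fsyncSync(fd); // the durability barrier, paid per write
  }
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  closeSync(fd);
  unlinkSync(path);
  return elapsedMs / n;
}

console.log(`no fsync: ${timeAppends(500, false).toFixed(4)} ms/write`);
console.log(`fsync:    ${timeAppends(500, true).toFixed(4)} ms/write`);
```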

What you actually risk without fsync

Without fsync, the data the OS has accepted but not yet flushed lives in the kernel’s page cache. Here is what the page cache does and does not survive.

The page cache survives:

  • Your process crashing.
  • Your process being killed by kill -9.
  • The Node runtime panicking.
  • An exception bubbling up through your handler.

In all of those cases, the OS is still running. The kernel still has the bytes. They will eventually be flushed to disk on the kernel’s own schedule.

The page cache does not survive:

  • The kernel crashing.
  • A power loss.
  • Someone yanking the machine from the wall.
  • Some kinds of hypervisor failures.

Modern operating systems are very stable. Modern hardware fails rarely. So in practice, even without fsync, you usually do not lose data. The important word there is “usually.”

For a personal blog, “usually” is fine. You lose the last few comments, you move on. For a payment system, “usually” is a lawsuit. For most things in between, the honest question is: what would you do if the last few seconds of your writes disappeared? If the answer is “recover them from an upstream system” or “the user will just retry,” you have a lot of flexibility. If the answer is “we cannot recover them and a customer just got charged,” you need fsync.

The hierarchy of durability

Every production system lives somewhere on this table.

| Operation | Survives | Performance |
| --- | --- | --- |
| In-memory only | Process restart? No. | Nanoseconds |
| write() to file, no fsync | Process crash? Yes. Power loss? No. | Microseconds |
| write() + fsync() | Power loss? Yes (mostly). | 1–10 ms |
| write() + fsync() + replicated to 2nd machine | Single-machine failure? Yes. | 5–50 ms |
| Geographically replicated | Datacenter loss? Yes. | 50–500 ms |

Each step down the table costs roughly 10x to 100x more in latency. Each one survives a more catastrophic failure than the one above.

When database engineers talk about “durability tradeoffs,” this is the table they are talking about. Postgres, by default, calls fsync on every transaction commit. SQLite in WAL mode batches and fsyncs the WAL file. Redis has a setting called appendfsync everysec that fsyncs once per second, accepting up to one second of data loss in exchange for roughly 1000x more throughput than per-write fsync.
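For reference, Redis's everysec policy is a pair of real redis.conf directives. A minimal fragment (not a full config):

```conf
# redis.conf: append-only-file (AOF) durability policy
appendonly yes        # log every write command to the AOF
appendfsync everysec  # fsync the AOF once per second: up to ~1s of loss
# appendfsync always  # fsync on every write: most durable, slowest
# appendfsync no      # never fsync explicitly; the kernel decides when
```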

There is no universally correct answer. There is only “what does my product need to survive.” And different parts of a single product often need different answers. A logging pipeline can tolerate a few seconds of loss. A payment ledger cannot tolerate one.

What our four implementations are doing right now

Let us audit the section 1 code against this.

Linear scan and in-memory map. Both call appendFileSync(USERS_FILE, line). This goes through the kernel’s write(2) but does not call fsync. A power loss in the seconds after a write would silently lose those records. The map version has the data in memory too, but a process crash also takes the memory with it. The file is at least more durable than the map.

Binary search. Same story for data file appends. Its index gets rebuilt periodically, so it has its own separate durability story. The index can always be reconstructed from the data file, so as long as the data file is intact, you are fine.

SQLite. We explicitly turned on WAL mode with PRAGMA journal_mode = WAL. WAL mode fsyncs the WAL file periodically and on transaction commit, so SQLite gives you durability against power loss out of the box. That durability is exactly what you are paying for in the throughput numbers.

This means our flat-file implementations are, technically, a little bit faster than they should be. They are skipping work that SQLite is actually doing. When we benchmark writes in the next lesson, we are going to measure both: flat files with and without fsync, so you can see the real cost of the durability guarantee.

A server uses appendFileSync to write a log line for every request. The machine loses power. After it reboots, the last few seconds of log lines are missing from the file, even though the function returned successfully for those requests. Why?

Why this matters for benchmarks

When you read “SQLite handles 50,000 reads per second” from our section 1 benchmarks, you might assume it can also handle 50,000 writes per second. It cannot, and the reason is fsync.

SQLite is doing real durability work on every commit. Durability has a hard floor set by your storage device. A typical NVMe SSD can do somewhere between 10,000 and 50,000 fsyncs per second. A consumer SATA SSD is closer to 1,000 to 5,000. A spinning disk is in the low hundreds. Whatever your storage hardware can do is the absolute upper bound on durable writes per second, no matter what software is running on top.

This is also why batching matters so much for write throughput. If you can fsync once per thousand writes instead of once per write, you have effectively raised your write ceiling by a factor of a thousand. That is exactly what database transactions buy you when you insert in bulk, and it is the single biggest performance lever in the next lesson.
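The same batching idea applies mechanically to our JSONL file. A sketch (appendBatch is a hypothetical helper, not code from section 1): write every line in the batch, then pay for a single fsync that covers all of them.

```typescript
import { openSync, writeSync, fsyncSync, closeSync } from "node:fs";

// One fsync per batch instead of one per write. With a batch of 1,000,
// the fsync cost per individual write drops by a factor of 1,000.
export function appendBatch(path: string, lines: string[]): void {
  const fd = openSync(path, "a");
  try {
    for (const line of lines) writeSync(fd, line); // all hit the page cache
    fsyncSync(fd); // a single durability barrier covers the whole batch
  } finally {
    closeSync(fd);
  }
}
```

The caveat is that the whole batch shares one durability point: lose power before the fsync and every write in the batch is lost together. That is essentially the deal a database transaction offers.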

In the next lesson we measure all of this. JSONL appends without fsync. JSONL appends with fsync. SQLite inserts one at a time. SQLite inserts batched inside a transaction. The numbers span about four orders of magnitude, and they explain most of what makes a database different from a file.


© 2026 hectoday. All rights reserved.