Approach 2: in-memory map
Our linear scan works, but it has a clear problem. Every request re-reads and re-parses half the file on average. At a million records that is half a million JSON objects per request, and the numbers showed us what that costs.
Here is the question that fixes it. Why are we doing all this work on every request? We know the file. We read it just a moment ago. Why not read it once, keep the whole thing in memory, and look up users from there?
That is this lesson. The change is almost laughably small. The throughput goes up by roughly four orders of magnitude at a million records.
The idea
When the server starts, we load the entire file into memory and put every record into a Map. A Map in JavaScript is a hash table, which means looking up a value by its key is effectively constant time. It does not care if the map has ten entries or ten million.
From that moment on, findUser(id) is just users.get(id). One operation.
Writes get a small complication. When we create a new user, we have to update two things: the file on disk (so the data survives a restart) and the map in memory (so the next read sees the new user). The file stays the source of truth. The map is a derived index we rebuild on every startup.
Where we are starting from
In the last lesson we seeded users.jsonl (with 1,000 records, and then possibly 1,000,000 when we stress-tested). We also wrote a linear-scan src/store.ts and updated the handler in src/server.ts to await its async findUser.
All of that state carries over. The users.jsonl file is still on disk, still valid JSONL. We are not going to re-seed it for the functional part of this lesson; the new store just reads whatever is already there.
What is going to change:
- src/store.ts gets replaced (it is short; we will rewrite the whole file).
- src/server.ts loses the async/await we added for the linear scan, because the new findUser is synchronous again.
- Nothing else.
If your dev server from the previous lesson is still running under npm run dev, leave it. Node’s --watch flag will restart the process every time we save src/store.ts, so you can just keep the server tab open and watch it reload after each step.
Step 1: Clear out the old store and write the imports
Open src/store.ts. It currently holds the linear-scan implementation (async findUser, the createReadStream/readline loop, and a simple appendFileSync for createUser). Delete the entire file contents and start fresh.
// src/store.ts
import { readFileSync, appendFileSync, existsSync } from "node:fs";
import { randomUUID } from "node:crypto";
const USERS_FILE = "users.jsonl";
export interface User {
id: string;
name: string;
email: string;
created_at: string;
}

Two things changed from the linear-scan imports.
readFileSync is new. It reads the entire file into a single string in one blocking call. That is exactly what we want at startup: block until the file is fully in memory, then continue.
createReadStream and createInterface are gone, because we are not reading line by line any more. That also means we can drop the node:readline import entirely.
existsSync, appendFileSync, and randomUUID are the same helpers we used in the linear scan version. USERS_FILE and the User interface are unchanged.
Save the file. The dev server will restart, then immediately crash with a missing-export error, because we have not defined findUser or createUser yet. That is fine. We will fix it in the next step.
Step 2: Write the loader function
Add this below the User interface.
function loadUsers(): Map<string, User> {
const map = new Map<string, User>();
if (!existsSync(USERS_FILE)) return map;
const contents = readFileSync(USERS_FILE, "utf8");
for (const line of contents.split("\n")) {
if (!line.trim()) continue;
const user = JSON.parse(line) as User;
map.set(user.id, user);
}
return map;
}

Walk through it.
new Map<string, User>() creates an empty hash table. The <string, User> part tells TypeScript that keys are strings (user ids) and values are full User objects.
If the file does not exist yet (first run on a brand-new machine, no seed), we return the empty map. The server will still start, and findUser will return null for every id until something writes.
Otherwise we read the whole file into memory with readFileSync. The "utf8" option gives us a string instead of a Buffer.
Then we split on \n and walk every line. Empty lines get skipped with continue. Every real line gets parsed as JSON and dropped into the map, keyed by the user’s id.
Return the map. The next step holds onto it.
This function does not run yet. It is just defined. The next step is what actually calls it.
Step 3: Call loadUsers at module load
Here is the key move. Add this line directly after the loadUsers function.
const users = loadUsers();

That one line is the whole trick. Node runs it exactly once, when it first imports src/store.ts. Since src/store.ts is imported by src/server.ts, and src/server.ts is the entry point, the very first thing that happens when you run npm run dev is: Node loads src/store.ts, which runs loadUsers(), which reads and parses the entire users.jsonl into a Map. Only after that does serve() start accepting requests.
From then on, users is a live Map sitting in memory for the lifetime of the process. Every subsequent findUser call is just a hash-table lookup on this users map.
Top-level code in an ES module runs once, synchronously, in import order. That is why we can use readFileSync here without worrying about timing: by the time serve() starts accepting requests, the map is already fully populated.
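If you want to see this run-once behavior in isolation, here is a standalone sketch. It writes a throwaway module to disk (the demo-once.mjs file name is made up for the demo, not part of the store) and imports it twice; the second import is served from the module cache, so the top-level code never runs a second time.

```typescript
// Sketch: an ES module's top-level code runs exactly once, however many
// times it is imported. Requires an ESM context (for top-level await).
import { writeFileSync, unlinkSync } from "node:fs";
import { pathToFileURL } from "node:url";

const MOD = "demo-once.mjs"; // hypothetical throwaway module
writeFileSync(MOD, `export const loadedAt = process.hrtime.bigint();\n`);

const url = pathToFileURL(MOD).href;
const a = await import(url);
const b = await import(url); // cached: top-level code does not run again

console.log(a.loadedAt === b.loadedAt); // true: one module instance, loaded once
unlinkSync(MOD);
```

This is exactly why const users = loadUsers(); is safe: the map is built during the one and only evaluation of src/store.ts, before any request handler can touch it.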
Step 4: Write findUser
The lookup function is now a one-liner. Add this below the const users = loadUsers(); line.
export function findUser(id: string): User | null {
return users.get(id) ?? null;
}

users.get(id) returns the User if it is in the map, or undefined if not. We use the nullish coalescing operator (?? null) to turn undefined into null, so the return type lines up with what the handler expects.
This function is synchronous again. No async, no Promise. The async version we needed for linear scan is gone because there is no stream to wait on.
If your dev server is watching, it will restart as soon as you save. findUser now compiles, but createUser is still missing, so you will get an unresolved export error. Keep going.
Step 5: Write createUser
Writes need a little more care than the linear-scan version. We have to update both the file and the in-memory map. Add this at the bottom of src/store.ts.
export function createUser(name: string, email: string): User {
const user: User = {
id: randomUUID(),
name,
email,
created_at: new Date().toISOString(),
};
appendFileSync(USERS_FILE, JSON.stringify(user) + "\n");
users.set(user.id, user);
return user;
}

Three things happen.
We build the user record in memory. Same as linear scan.
We append the JSON line to the file. Same as linear scan.
Then we add the user to the map with users.set(user.id, user). This is the new line. If we forget it, the file will have the new user but the map will not, and the very next findUser call will return null for a user that clearly exists on disk. The map would drift from the file, silently, until the next process restart fixed it.
The order matters. We write to the file first, and only update the map if the write succeeded. If the file write throws, we do not want to end up with a user in memory that was never persisted. (There is a real subtlety here about what “successfully written” actually means at the OS level. Section 2 digs into this. For now, the mental model “the file is more durable than the map” is the right one.)
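A minimal sketch makes the ordering concrete. Here appendToDisk is a hypothetical stand-in for the file append, forced to fail, so we can watch the map stay consistent with disk:

```typescript
interface User { id: string; name: string }

// Hypothetical stand-in for appendFileSync that simulates a failed write.
function appendToDisk(_line: string): void {
  throw new Error("disk full");
}

const users = new Map<string, User>();
const user: User = { id: "u1", name: "Ada" };

try {
  appendToDisk(JSON.stringify(user) + "\n"); // 1. persist first...
  users.set(user.id, user);                  // 2. ...then update the map
} catch {
  // The throw happened before users.set ran, so memory never drifted
  // ahead of disk. Swap the two lines and the map would hold a user
  // that no restart could ever recover.
}

console.log(users.has("u1")); // false
```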
At this point src/store.ts is complete. The dev server should reload cleanly and accept requests.
Step 6: Simplify the handler in src/server.ts
Open src/server.ts. The getUser handler currently looks like this (from the linear-scan lesson):
const getUser = route.get("/users/:id", {
request: {
params: z.object({ id: z.string() }),
},
resolve: async ({ input }) => {
if (!input.ok) {
return Response.json(input.issues, { status: 400 });
}
const user = await findUser(input.params.id);
if (!user) {
return Response.json({ error: "not found" }, { status: 404 });
}
return Response.json(user);
},
});

Our new findUser is synchronous, so we can drop the async and the await. Two small edits:
- Change resolve: async ({ input }) => { to resolve: ({ input }) => {.
- Change const user = await findUser(...) to const user = findUser(...).
The final handler:
const getUser = route.get("/users/:id", {
request: {
params: z.object({ id: z.string() }),
},
resolve: ({ input }) => {
if (!input.ok) {
return Response.json(input.issues, { status: 400 });
}
const user = findUser(input.params.id);
if (!user) {
return Response.json({ error: "not found" }, { status: 404 });
}
return Response.json(user);
},
});

This is the same handler we had right after the setup lesson. No promises anywhere. Only findUser’s internals changed.
postUser in the same file already used createUser synchronously, so it does not need to change at all.
Step 7: Try it
If the server is not already running, start it.
npm run dev

You do not need to re-seed. The users.jsonl file from the last lesson is still there, and the new store loads from it just fine.
Watch the console. After npm run dev starts, there is a brief pause before the “server is listening” line appears. That pause is loadUsers() reading and parsing the file into the map. The bigger the file, the longer the pause.
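If you want to put a number on that pause, here is a standalone sketch that seeds a throwaway file and times the same read-split-parse-set loop that loadUsers runs. The file name and record count are made up for the demo; adjust N to taste.

```typescript
import { writeFileSync, readFileSync, unlinkSync } from "node:fs";

const FILE = "demo-users.jsonl"; // throwaway file, not your real users.jsonl
const N = 100_000;

// Seed N small records.
const lines: string[] = [];
for (let i = 0; i < N; i++) {
  lines.push(JSON.stringify({ id: `id-${i}`, name: `user ${i}`, email: `u${i}@example.com` }));
}
writeFileSync(FILE, lines.join("\n") + "\n");

// Time the same load pattern as loadUsers().
const start = process.hrtime.bigint();
const map = new Map<string, unknown>();
for (const line of readFileSync(FILE, "utf8").split("\n")) {
  if (!line.trim()) continue;
  const rec = JSON.parse(line) as { id: string };
  map.set(rec.id, rec);
}
const ms = Number(process.hrtime.bigint() - start) / 1e6;

console.log(`loaded ${map.size} records in ${ms.toFixed(1)}ms`);
unlinkSync(FILE);
```

Run it a few times at different N and you will see the startup cost grow roughly linearly with the file size, which is exactly the pause you see before the server starts listening.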
Once the server is up, hit a user.
curl http://localhost:8081/users/$(jq -r '.[0]' ids.json)

It returns instantly. If the file is still at 1M records from the linear-scan stress test, the same lookup that took around a second under linear scan is now sub-millisecond. We paid a startup cost, and in exchange every request afterwards is free.
Create a user and fetch it back to confirm the map-and-file write path works.
RESULT=$(curl -s -X POST http://localhost:8081/users \
-H 'Content-Type: application/json' \
-d '{"name":"New","email":"[email protected]"}')
NEW_ID=$(echo "$RESULT" | jq -r '.id')
curl http://localhost:8081/users/$NEW_ID

The second curl should return the user you just created. The map has it because createUser called users.set. The file has it because createUser called appendFileSync. If you stopped the server right now and started it again, loadUsers would read the appended line and the new user would still be findable.
The numbers
Same machine, same server, same routes. The only thing we changed is store.ts.
| Records | Requests/sec | Avg latency |
|---|---|---|
| 10k | 66,573 | 741µs |
| 100k | 65,466 | 733µs |
| 1M | 72,074 | 711µs |
Take a moment with this table. Throughput is essentially flat from 10,000 records to 1,000,000. The swing between the three rows is within run-to-run noise on this hardware. Map at one million looks the same as map at ten thousand.
Compare to linear scan, which went from 474 req/s at 10k down to 5 req/s at 1M. Our new version is roughly fourteen thousand times faster at a million records, and it still does not break a sweat. Latency is sub-millisecond at every scale.
What is going on? Map.get is O(1). It hashes the id, jumps directly to a slot, and returns the value. It does not scan. It does not care how many entries the map has. Ten items or ten million, the cost is basically the same: one hash, one pointer read.
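You can see the flatness directly with a crude micro-benchmark. This is a standalone sketch; absolute numbers vary by machine and run, but the two timings land in the same ballpark even though the maps differ in size by 100x.

```typescript
// Time a fixed number of Map.get calls against maps of very different sizes.
function timeLookups(size: number, lookups = 200_000): number {
  const m = new Map<string, number>();
  for (let i = 0; i < size; i++) m.set(`id-${i}`, i);

  const start = process.hrtime.bigint();
  let hits = 0;
  for (let i = 0; i < lookups; i++) {
    if (m.get(`id-${i % size}`) !== undefined) hits++;
  }
  if (hits !== lookups) throw new Error("unexpected miss");
  return Number(process.hrtime.bigint() - start) / 1e6;
}

console.log(`10k entries: ${timeLookups(10_000).toFixed(1)}ms`);
console.log(`1M entries:  ${timeLookups(1_000_000).toFixed(1)}ms`);
```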
All the work we used to do on every request now happens exactly once, at startup. We pay a few hundred milliseconds to load a million users into memory, and then every request afterwards is effectively free.
In the in-memory map approach, what happens when the process crashes and restarts?
The map is simply gone, and that is fine: on the next startup, loadUsers rebuilds it from users.jsonl, which holds every write we ever made. The file is the source of truth; the map is disposable.
What it costs
So what is the catch?
The main one is RAM. Every record lives in memory now. For our 1M user file with small records, that is a few hundred megabytes, which is trivial on any modern server. For ten million records, or for records with large fields (think articles with full text), you are looking at several gigabytes, and it stops being trivial. At that point you need a way to page data in and out of memory automatically, which is one of the main things a real database gives you.
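To sanity-check the “few hundred megabytes” figure on your own machine, here is a rough standalone estimate. Heap numbers depend on the runtime and GC timing, so treat the result as a ballpark, not a precise measurement.

```typescript
const N = 100_000;
const before = process.memoryUsage().heapUsed;

// Build records shaped like our User interface.
const m = new Map<string, object>();
for (let i = 0; i < N; i++) {
  m.set(`id-${i}`, {
    id: `00000000-0000-0000-0000-${String(i).padStart(12, "0")}`,
    name: `user ${i}`,
    email: `user${i}@example.com`,
    created_at: "2024-01-01T00:00:00.000Z",
  });
}

const after = process.memoryUsage().heapUsed;
const perRecord = (after - before) / N;
console.log(`~${Math.round(perRecord)} bytes per record in memory`);
console.log(`~${Math.round((perRecord * 1_000_000) / 1024 / 1024)} MB for 1M records`);
```

Scale the per-record number by your own record shape: a user with a full article body attached costs orders of magnitude more than one with four short strings.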
The other cost is startup time. Loading and parsing a million JSON lines takes a few hundred milliseconds. That is fine for a server that runs for weeks at a time. If you are doing rolling deploys every few minutes or autoscaling aggressively, those startup costs add up. A ten million record file might take several seconds before the first request can be served.
A word on concurrency
The JavaScript runtime in Node is single-threaded. There is no race condition between concurrent reads on a Map, because there is no actual concurrency at the JavaScript level. One request reads the map, returns, then the next request starts. Everything is serialized through a single event loop.
For an HTTP server doing Map.get lookups and JSON serialization, single-threaded is more than enough. The throughput numbers we saw above come from a single Node process pegging a single CPU core. If you ever needed to go further, you would run multiple Node processes behind a load balancer. That introduces a new problem: two processes mean two in-memory maps that can drift apart. We will come back to that headache in section 2.
When this is the right answer
Whenever your dataset fits in RAM and you only need lookups by primary key, the in-memory map is genuinely the right answer.
That covers a surprising portion of real applications:
- A bookmarks app where each user has a few thousand bookmarks.
- An analytics dashboard cached from a daily ETL.
- A feature-flag service with a few thousand flags.
- An auth service with under a million active sessions.
- The early stage of almost any SaaS.
You give up SQL queries, joins, and the ability to share state across multiple processes. In return you get sub-millisecond reads at any scale that fits in memory, and a backing store that is just a file you can cat, grep, and version-control if you want.
Where it breaks
Three cases:
- The dataset stops fitting in RAM. You need to start paging. That is a database.
- You need to query by something other than id. The only fast lookup today is users.get(id). If you need “find all users with email LIKE '%@example.com'” or “users created in the last 7 days,” you would have to scan the whole map every time. You could maintain secondary maps like emailToUser or dateToUsers, but then you are building a query engine.
- You need multiple writers. Two server processes, each with its own in-memory map, will diverge the moment one of them writes. The file would have both writes, but each process’s in-memory copy would be stale. We will dig into this in section 2.
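For a taste of what a secondary index costs, here is a sketch of the email case. The emailToUser map and addUser helper are hypothetical names for illustration, not part of this lesson’s store:

```typescript
interface User { id: string; name: string; email: string; created_at: string }

const users = new Map<string, User>();       // primary index: by id
const emailToUser = new Map<string, User>(); // hypothetical secondary index: by email

function addUser(user: User): void {
  users.set(user.id, user);
  emailToUser.set(user.email, user); // every write now has to keep both maps in sync
}

addUser({ id: "u1", name: "Ada", email: "[email protected]", created_at: "2024-01-01T00:00:00Z" });

console.log(emailToUser.get("[email protected]")?.id); // "u1"
```

Each new query shape means another map to populate at startup, another line in every write path, and another structure that can drift. Maintaining those indexes automatically is a large part of what a real database does for you.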
For everything that is not on that list: load the file, build the map, serve at around 70,000 req/s with sub-millisecond latency, and move on.
In the next lesson we look at a middle ground. What if your data does not fit in RAM, but you still want fast lookups by id? That is what binary search on disk gives you, and it turns out to be surprisingly clever.