Caching from First Principles
- What caching actually is
- The two requirements
- Your first cache
- How the cache function works
- Cache keys
- TTL: when cached data expires
- Cache misses are slower than no cache
- Cache invalidation
- HTTP caching: let the client do the work
- ETags: "has it changed?"
- Two layers working together
- When to use Redis instead of a Map
- Bounded caches: LRU eviction
- When not to cache
- Summary
A guide for developers who have never implemented caching. Every example uses @hectoday/http, but the ideas apply everywhere.
What caching actually is
Your server does work. Sometimes the same work, for the same input, over and over. Caching means saving the result of expensive work so you can skip doing it again.
Without cache:
Request → compute → Response (50ms)
Request → compute → Response (50ms)
Request → compute → Response (50ms)
With cache:
Request → compute → store → Response (55ms)
Request → read store → Response (2ms)
Request → read store → Response (2ms)
The first request is slightly slower (you compute and store). Every subsequent request is dramatically faster (you skip the computation entirely).
That's it. Everything else is details about where to store things, when to throw them away, and how to avoid serving stale data.
The two requirements
Caching is only useful when both of these are true:
- The original operation is expensive (database query, API call, heavy computation)
- The same result will be needed again soon
If the operation is cheap, a cache adds overhead for no benefit. If the result is never reused (every request is unique), the cache never helps.
Your first cache
Start with a handler that hits the database on every request:
route.get("/users/:id", {
request: { params: z.object({ id: z.string().uuid() }) },
resolve: async (c) => {
if (!c.input.ok) {
return Response.json({ error: c.input.issues }, { status: 400 });
}
const user = await db.users.get(c.input.params.id);
if (!user) {
return Response.json({ error: "Not found" }, { status: 404 });
}
return Response.json(user);
},
});
If this endpoint gets 1,000 requests per second for the same user, that's 1,000 identical database queries. Add a cache:
const cache = new Map<string, { data: unknown; expiresAt: number }>();
async function cached<T>(key: string, ttlMs: number, fn: () => Promise<T>): Promise<T> {
const entry = cache.get(key);
if (entry && entry.expiresAt > Date.now()) return entry.data as T;
const result = await fn();
cache.set(key, { data: result, expiresAt: Date.now() + ttlMs });
return result;
}
Now use it:
route.get("/users/:id", {
request: { params: z.object({ id: z.string().uuid() }) },
resolve: async (c) => {
if (!c.input.ok) {
return Response.json({ error: c.input.issues }, { status: 400 });
}
const user = await cached(`user:${c.input.params.id}`, 60_000, () =>
db.users.get(c.input.params.id),
);
if (!user) {
return Response.json({ error: "Not found" }, { status: 404 });
}
return Response.json(user);
},
});
1,000 requests per second, but only the first one hits the database (strictly: concurrent requests that arrive before the first result is stored will each miss and compute). Once the cache is warm, the rest return the cached result in microseconds.
How the cache function works
Walk through cached() line by line:
async function cached<T>(key: string, ttlMs: number, fn: () => Promise<T>): Promise<T> {
It takes a key (a unique string for this piece of data), a TTL (time to live in milliseconds), and a function that computes the actual value.
const entry = cache.get(key);
if (entry && entry.expiresAt > Date.now()) return entry.data as T;
Look up the key. If it exists and hasn't expired, return it immediately. This is a cache hit. No computation happens.
const result = await fn();
cache.set(key, { data: result, expiresAt: Date.now() + ttlMs });
return result;
If it doesn't exist or has expired, this is a cache miss. Run the expensive function, store the result with an expiration time, and return it.
Check, compute, store, return. Every cache you'll ever write follows this pattern.
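Here is the pattern in action, reproduced so the sketch is self-contained, with a call counter to show that the second lookup never runs the expensive function (the slowDouble helper is invented for illustration):

```typescript
// Check, compute, store, return: the whole pattern.
const cache = new Map<string, { data: unknown; expiresAt: number }>();

async function cached<T>(key: string, ttlMs: number, fn: () => Promise<T>): Promise<T> {
  const entry = cache.get(key);
  if (entry && entry.expiresAt > Date.now()) return entry.data as T; // check (hit)
  const result = await fn();                                         // compute (miss)
  cache.set(key, { data: result, expiresAt: Date.now() + ttlMs });   // store
  return result;                                                     // return
}

// Stand-in for expensive work; counts how often it actually runs.
let calls = 0;
async function slowDouble(n: number): Promise<number> {
  calls++;
  return n * 2;
}

const a = await cached("double:21", 60_000, () => slowDouble(21)); // miss: computes
const b = await cached("double:21", 60_000, () => slowDouble(21)); // hit: skips slowDouble
```

After both calls, `calls` is still 1: the second request paid only the Map lookup.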
Cache keys
The key uniquely identifies the cached data. Get it wrong and you serve the wrong data to the wrong request.
// One user
cached(`user:${id}`, ...)
// A list with pagination
cached(`users:page=${page}&limit=${limit}`, ...)
// Something that depends on the caller
cached(`dashboard:${userId}`, ...)
The rule: if two requests should return different data, they need different keys. If they should return the same data, they need the same key.
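One way to make that rule harder to violate is a key builder that sorts its parameters, so ?page=1&limit=20 and ?limit=20&page=1 produce the same key. The cacheKey helper below is an invented sketch, not part of any library:

```typescript
// Build a deterministic cache key from a prefix and a bag of parameters.
// Sorting the entries means argument order can't split one logical key in two.
function cacheKey(prefix: string, params: Record<string, string | number>): string {
  const parts = Object.entries(params)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([k, v]) => `${k}=${v}`);
  return `${prefix}:${parts.join("&")}`;
}

const k1 = cacheKey("users", { page: 1, limit: 20 });
const k2 = cacheKey("users", { limit: 20, page: 1 });
// Both are "users:limit=20&page=1": one logical request, one cache entry.
```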
TTL: when cached data expires
TTL (time to live) controls how long cached data is considered valid.
// Cache for 60 seconds
cached("stats", 60_000, () => computeStats());
// Cache for 5 minutes
cached(`user:${id}`, 300_000, () => db.users.get(id));
// Cache for 1 hour
cached("config", 3_600_000, () => loadConfig());Short TTL means data is fresh but you compute more often. Long TTL means less computation but data can be stale. There's no universal right answer. It depends on how fast your data changes and how much staleness your users tolerate.
Cache misses are slower than no cache
This is the most important thing to understand. When the cache doesn't have the data (a miss), your request is slower than if you had no cache at all:
No cache: compute → 50ms
Cache hit: check cache → 2ms
Cache miss: check cache + compute + store → 55ms
A miss does everything the uncached path does, plus the overhead of checking and writing the cache. This means caching is only helpful when most requests are hits.
If your cache hit rate is 10%, you've made 90% of requests slower, and with the timings above the average latency barely breaks even with no cache at all. Any lower and it's a net loss, and you've paid the complexity either way.
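The arithmetic, as a sketch using the timings above (2ms hit, 55ms miss, 50ms uncached), to find the hit rate where the cache breaks even:

```typescript
// Average latency with a cache: hits are fast, misses pay full cost plus overhead.
const HIT_MS = 2;
const MISS_MS = 55;
const UNCACHED_MS = 50;

function avgLatency(hitRate: number): number {
  return hitRate * HIT_MS + (1 - hitRate) * MISS_MS;
}

// Break-even: hitRate * 2 + (1 - hitRate) * 55 = 50, so hitRate = 5/53, about 9.4%
const breakEven = (MISS_MS - UNCACHED_MS) / (MISS_MS - HIT_MS);

const atTenPercent = avgLatency(0.1);  // about 49.7ms, barely better than no cache
const atNinety = avgLatency(0.9);      // about 7.3ms, a real win
```

The lesson generalizes: the smaller the gap between hit cost and miss overhead, the higher the hit rate you need before the cache pays for itself.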
Cache invalidation
The hardest part of caching. When data changes, the cache still has the old version.
// User updates their name
route.put("/users/:id", {
request: {
params: z.object({ id: z.string().uuid() }),
body: z.object({ name: z.string(), email: z.string().email() }),
},
resolve: async (c) => {
if (!c.input.ok) {
return Response.json({ error: c.input.issues }, { status: 400 });
}
const user = await db.users.update(c.input.params.id, c.input.body);
// Delete the cached version so the next read gets fresh data
cache.delete(`user:${c.input.params.id}`);
return Response.json(user);
},
});
When data changes, delete the cache entry. Don't try to update it. Updating means you're replicating your mutation logic in two places (the database write and the cache update). Any drift between them creates bugs.
Delete the entry. The next read recomputes it from the database. Simple, correct, boring.
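One mutation often affects several entries: the user itself, plus any cached lists that contain it. A prefix sweep is one simple sketch of handling that. The invalidatePrefix helper is invented here, and scanning every key is only reasonable for an in-memory Map (Redis would want SCAN or explicit key sets):

```typescript
const cache = new Map<string, { data: unknown; expiresAt: number }>();

// Delete every entry whose key starts with the given prefix.
// O(n) over all keys, which is fine for a Map of reasonable size.
function invalidatePrefix(prefix: string): number {
  let deleted = 0;
  for (const key of cache.keys()) {
    if (key.startsWith(prefix)) {
      cache.delete(key); // deleting during Map iteration is safe in JS
      deleted++;
    }
  }
  return deleted;
}

cache.set("users:page=1", { data: [], expiresAt: Date.now() + 60_000 });
cache.set("users:page=2", { data: [], expiresAt: Date.now() + 60_000 });
cache.set("user:123", { data: {}, expiresAt: Date.now() + 60_000 });

const removed = invalidatePrefix("users:"); // removes both list pages, leaves "user:123"
```

Note that this only works if your key naming is consistent, which is another reason to build keys with a shared helper rather than ad-hoc string concatenation.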
HTTP caching: let the client do the work
Everything above is server-side caching. But there's a whole other layer: telling the client (browser, CDN, reverse proxy) to cache the response itself.
This happens through the Cache-Control header:
return Response.json(user, {
headers: {
"cache-control": "public, max-age=60",
},
});
This tells the client: "this response is cacheable by anyone, and it's valid for 60 seconds." For the next 60 seconds, the client doesn't even make a request to your server. Zero load.
Common values:
// Anyone can cache for 60 seconds
"cache-control": "public, max-age=60"
// Only the user's browser can cache (not shared CDNs)
"cache-control": "private, max-age=60"
// Don't cache at all
"cache-control": "no-store"
// Cache it, but always check with the server before using
"cache-control": "public, max-age=0, must-revalidate"
// Serve stale while fetching fresh in the background
"cache-control": "public, max-age=60, stale-while-revalidate=30"Set default cache headers globally with onResponse:
const app = setup({
routes: [...],
onResponse: ({ request, response }) => {
const headers = new Headers(response.headers);
// Default: don't cache. Handlers opt in by setting their own header.
if (!headers.has("cache-control")) {
headers.set("cache-control", "no-store");
}
return new Response(response.body, { status: response.status, headers });
},
});
ETags: "has it changed?"
Instead of guessing with TTL, ETags let the client ask: "I have version X. Has the data changed?"
An ETag is a fingerprint of the response body. If the body hasn't changed, the ETag is the same, and you can respond with 304 Not Modified and no body.
async function jsonWithETag(body: unknown, request: Request): Promise<Response> {
const json = JSON.stringify(body);
const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(json));
const etag = `"${Array.from(new Uint8Array(digest))
.map((b) => b.toString(16).padStart(2, "0"))
.join("")}"`;
// Client sent their version. If it matches, nothing changed.
if (request.headers.get("if-none-match") === etag) {
return new Response(null, { status: 304 });
}
return new Response(json, {
headers: {
"content-type": "application/json",
etag: etag,
"cache-control": "public, max-age=0, must-revalidate",
},
});
}
The flow:
- Client requests /users/123
- Server responds with the data and ETag: "abc123"
- Client caches the response locally
- Client requests /users/123 again with If-None-Match: "abc123"
- Server checks: same ETag? Respond 304. Different? Send new data.
The server still runs the query (to compute the ETag), but you save bandwidth by not sending the body when nothing changed. Combine with server-side caching to also skip the query.
Two layers working together
The most effective setup combines server-side and client-side caching:
route.get("/users/:id", {
request: { params: z.object({ id: z.string().uuid() }) },
resolve: async (c) => {
if (!c.input.ok) {
return Response.json({ error: c.input.issues }, { status: 400 });
}
// Layer 1: server-side cache (skip the database)
const user = await cached(`user:${c.input.params.id}`, 60_000, () =>
db.users.get(c.input.params.id),
);
if (!user) {
return Response.json({ error: "Not found" }, { status: 404 });
}
// Layer 2: HTTP cache (skip the network)
return Response.json(user, {
headers: {
"cache-control": "public, max-age=30, stale-while-revalidate=30",
},
});
},
});
For the first 30 seconds, the client doesn't contact the server at all. After that, the client revalidates, but the server responds from its in-memory cache instead of hitting the database. The database is only queried once per minute at most.
When to use Redis instead of a Map
The in-memory Map works for a single server instance. Its limits:
- If you restart the server, the cache is gone (cold start)
- If you run two server instances, each has its own cache (no shared state)
- If cached data grows too large, your server runs out of memory (no eviction)
Redis solves all three. It's a separate process that persists across restarts, is shared across instances, and has built-in eviction policies.
Swap the storage, keep the same cached() signature:
import { Redis } from "ioredis";
const redis = new Redis();
async function cached<T>(key: string, ttlMs: number, fn: () => Promise<T>): Promise<T> {
const hit = await redis.get(key);
if (hit) return JSON.parse(hit) as T;
const result = await fn();
await redis.set(key, JSON.stringify(result), "PX", ttlMs);
return result;
}
Your handlers don't change. Only the storage backend does.
Use a Map for: single instance, small data, development, side projects.
Use Redis for: multiple instances, production, data that must survive restarts.
Bounded caches: LRU eviction
An unbounded Map cache grows forever. If you cache 10 million unique keys, you hold 10 million entries in memory. Eventually your server crashes.
LRU (Least Recently Used) eviction keeps the cache bounded. When it's full, it removes the entry that hasn't been accessed in the longest time:
class LRUCache<T> {
private map = new Map<string, { data: T; expiresAt: number }>();
constructor(private maxSize: number) {}
get(key: string): T | null {
const entry = this.map.get(key);
if (!entry) return null;
if (entry.expiresAt < Date.now()) {
this.map.delete(key);
return null;
}
// Move to end (most recently used)
this.map.delete(key);
this.map.set(key, entry);
return entry.data;
}
set(key: string, data: T, ttlMs: number): void {
if (this.map.has(key)) this.map.delete(key);
if (this.map.size >= this.maxSize) {
const oldest = this.map.keys().next().value;
// oldest is always defined here (size >= 1), but the guard satisfies strict null checks
if (oldest !== undefined) this.map.delete(oldest);
}
this.map.set(key, { data, expiresAt: Date.now() + ttlMs });
}
}
This works because JavaScript's Map preserves insertion order. Deleting and re-inserting on access moves the entry to the end. The first key is always the least recently used.
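The insertion-order trick in isolation, as a quick sketch: delete-then-set moves a key to the end of a Map's iteration order, leaving the least recently used key first.

```typescript
const m = new Map<string, number>([["a", 1], ["b", 2], ["c", 3]]);

// Touch "a": delete and re-insert moves it to the end of iteration order.
const v = m.get("a")!;
m.delete("a");
m.set("a", v);

const order = [...m.keys()]; // "b" is now first, i.e. least recently used
```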
Use it the same way:
const store = new LRUCache<unknown>(10_000);
async function cached<T>(key: string, ttlMs: number, fn: () => Promise<T>): Promise<T> {
const hit = store.get(key) as T | null;
// Caveat: a value cached as null looks like a miss here and will be recomputed
if (hit !== null) return hit;
const result = await fn();
store.set(key, result, ttlMs);
return result;
}
Memory stays bounded at 10,000 entries. Popular data stays. Unpopular data gets evicted.
When not to cache
Caching is harmful when:
- The operation is already fast. Caching a 1ms lookup to save 1ms adds complexity for no gain.
- Data changes constantly. If the cached value is stale within seconds, the TTL is so short the hit rate is near zero.
- Every request is unique. If the key space is enormous and rarely repeats (search queries, UUIDs), most requests are misses.
- Correctness matters more than speed. Financial transactions, inventory counts, anything where stale data causes real harm.
A cache miss is slower than no cache. If your hit rate is low, you're making the average request slower.
Summary
| Concept | What it means |
|---|---|
| Cache hit | Data found in cache, skip computation |
| Cache miss | Data not found, compute and store |
| TTL | How long before cached data expires |
| Invalidation | Deleting stale cache entries when data changes |
| Cache-Control | HTTP header telling clients how to cache |
| ETag | Fingerprint of a response for conditional requests |
| LRU | Eviction policy: remove least recently used when full |
| Hit rate | Percentage of requests served from cache |
The pattern is always the same: check, compute, store, return. Everything else is choosing where to store, how long to keep it, and when to throw it away.