Rate Limiting from First Principles
- What rate limiting is
- Why you need it
- The simplest rate limiter
- Using it in a handler
- What to limit by
- By IP address
- By authenticated user
- By API key
- Rate limiting algorithms
- Fixed window
- Sliding window
- Sliding window counter
- Token bucket
- Response headers
- Different limits for different endpoints
- Global rate limiting
- Redis for multiple instances
- Rate limiting vs throttling vs backpressure
- When not to rate limit
- Testing rate limits
- Summary
A guide for developers who know they should limit requests but don't know how it works under the hood. Every example uses @hectoday/http, but the ideas apply to any server.
What rate limiting is
Your API can handle a certain number of requests per second. Beyond that, responses slow down, the database chokes, and eventually the server crashes. Rate limiting is the practice of rejecting excess requests before they cause damage.
It's a bouncer at the door. The venue has a capacity. Once it's full, new people wait or go home. Without a bouncer, the venue gets dangerously overcrowded.
Under the limit:
Request → Handler → 200 OK
Over the limit:
Request → Rate limiter → 429 Too Many Requests
The handler never runs. The expensive work never happens. The server stays healthy.
Why you need it
Three reasons:
Protection from abuse. A single client sending 10,000 requests per second can take down your API for everyone. Rate limiting caps the damage one client can do.
Fair usage. Without limits, one aggressive client consumes all your resources. Other clients get slow responses or timeouts. Limits ensure everyone gets a fair share.
Cost control. Every request costs compute, bandwidth, and database queries. If your API is public, unlimited access means unlimited cost.
The simplest rate limiter
Count requests per client. If the count exceeds the limit within a time window, reject.
const requests = new Map<string, { count: number; resetAt: number }>();
function rateLimit(request: Request, limit = 100, windowMs = 60_000): true | Response {
const ip = request.headers.get("x-forwarded-for") ?? "unknown";
const now = Date.now();
const entry = requests.get(ip);
if (!entry || now > entry.resetAt) {
requests.set(ip, { count: 1, resetAt: now + windowMs });
return true;
}
entry.count++;
if (entry.count > limit) {
return Response.json(
{ error: "Too many requests" },
{
status: 429,
headers: {
"retry-after": String(Math.ceil((entry.resetAt - now) / 1000)),
},
},
);
}
return true;
}
This is a fixed window counter. It allows 100 requests per 60-second window per IP address. When the window expires, the count resets.
Using it in a handler
Same pattern as auth. A function that returns true or a Response. Check with instanceof:
route.post("/users", {
request: { body: CreateUser },
resolve: async (c) => {
const limited = rateLimit(c.request);
if (limited instanceof Response) return limited;
const caller = authenticate(c.request);
if (caller instanceof Response) return caller;
if (!c.input.ok) {
return Response.json({ error: c.input.issues }, { status: 400 });
}
const user = await db.users.create(c.input.body);
return Response.json(user, { status: 201 });
},
});
Two lines. The rate limit check goes before auth because you don't want to spend time verifying tokens for clients that are already over the limit.
What to limit by
By IP address
The default. Every unique IP gets its own counter:
const ip = request.headers.get("x-forwarded-for") ?? "unknown";
Works for public APIs. The weakness: many users share an IP behind a corporate proxy or NAT. One user's abuse punishes everyone on that network.
By authenticated user
After authentication, limit by user ID instead of IP:
function rateLimitUser(userId: string, limit = 100, windowMs = 60_000): true | Response {
const now = Date.now();
const entry = requests.get(userId);
if (!entry || now > entry.resetAt) {
requests.set(userId, { count: 1, resetAt: now + windowMs });
return true;
}
entry.count++;
if (entry.count > limit) {
return Response.json(
{ error: "Too many requests" },
{
status: 429,
headers: {
"retry-after": String(Math.ceil((entry.resetAt - now) / 1000)),
},
},
);
}
return true;
}
resolve: async (c) => {
const caller = authenticate(c.request);
if (caller instanceof Response) return caller;
const limited = rateLimitUser(caller.id, 50, 60_000);
if (limited instanceof Response) return limited;
// ...
};
More accurate than IP. Each user gets their own quota regardless of shared network.
By API key
For machine-to-machine APIs, limit by API key:
const key = request.headers.get("x-api-key") ?? "unknown";
const limited = rateLimitUser(key, 1000, 60_000);
Note that the key-based limiter takes a string, not a Request, so reuse rateLimitUser here. Different keys can have different limits based on their plan.
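Per-plan limits can be as simple as a lookup table. A hypothetical sketch (the plan names and quotas below are made-up examples, not part of @hectoday/http):

```typescript
// Hypothetical plan tiers; real values would come from your billing system
const PLAN_LIMITS: Record<string, number> = {
  free: 100,
  pro: 1_000,
  enterprise: 10_000,
};

// Unknown or missing plans fall back to the free tier
function limitForPlan(plan: string | undefined): number {
  return PLAN_LIMITS[plan ?? "free"] ?? PLAN_LIMITS.free;
}
```

Look up the key's plan when you validate the key, then pass the result as the limit argument.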
Rate limiting algorithms
The simple counter above is a fixed window. It works, but it has a flaw. There are better algorithms.
Fixed window
The one we started with. Divide time into fixed intervals. Count requests in the current interval.
Window 1 (0:00-1:00): 100 requests allowed
Window 2 (1:00-2:00): 100 requests allowed
The flaw: A client can send 100 requests at 0:59 and 100 more at 1:01. That's 200 requests in 2 seconds, even though the limit is 100 per minute. The window boundary creates a burst.
When to use: Simple cases where occasional bursts are acceptable.
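The burst is easy to reproduce. A self-contained sketch (hypothetical helper, using calendar-aligned windows so the boundary is visible; an injected timestamp stands in for Date.now()):

```typescript
// Count requests per calendar-aligned window: key + floor(now / windowMs)
const windowCounts = new Map<string, number>();

function fixedWindowAllows(key: string, now: number, limit = 100, windowMs = 60_000): boolean {
  const windowKey = `${key}:${Math.floor(now / windowMs)}`;
  const count = (windowCounts.get(windowKey) ?? 0) + 1;
  windowCounts.set(windowKey, count);
  return count <= limit;
}

let allowed = 0;
// 100 requests at 0:59 land in window 0...
for (let i = 0; i < 100; i++) if (fixedWindowAllows("client", 59_000)) allowed++;
// ...and 100 more at 1:01 land in window 1, so all 200 get through
for (let i = 0; i < 100; i++) if (fixedWindowAllows("client", 61_000)) allowed++;
// allowed is now 200: double the per-minute limit, in two seconds
```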
Sliding window
Instead of fixed boundaries, look at the last N seconds from right now. At any given moment, count requests in the trailing window.
const requests = new Map<string, number[]>();
function rateLimitSliding(key: string, limit: number, windowMs: number): true | Response {
const now = Date.now();
const timestamps = requests.get(key) ?? [];
// Remove timestamps outside the window
const recent = timestamps.filter((t) => now - t < windowMs);
if (recent.length >= limit) {
return Response.json({ error: "Too many requests" }, { status: 429 });
}
recent.push(now);
requests.set(key, recent);
return true;
}
No boundary bursts. At any point in time, there are at most limit requests in the last windowMs milliseconds.
The tradeoff: Memory. You store every request timestamp. For high-traffic APIs, this adds up. The fixed window stores one number per client. The sliding window stores an array.
When to use: When burst prevention matters and memory isn't a concern.
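Replaying the same 0:59/1:01 scenario against a sliding log shows the difference. A self-contained sketch (hypothetical helper with an injected timestamp):

```typescript
// Store every request timestamp per key; count only those in the trailing window
const log = new Map<string, number[]>();

function slidingAllows(key: string, now: number, limit = 100, windowMs = 60_000): boolean {
  const recent = (log.get(key) ?? []).filter((t) => now - t < windowMs);
  if (recent.length >= limit) {
    log.set(key, recent);
    return false;
  }
  recent.push(now);
  log.set(key, recent);
  return true;
}

let first = 0;
for (let i = 0; i < 100; i++) if (slidingAllows("client", 59_000)) first++;
let second = 0;
for (let i = 0; i < 100; i++) if (slidingAllows("client", 61_000)) second++;
// first is 100, second is 0: the 0:59 requests are still inside the
// trailing 60 seconds at 1:01, so the boundary burst is blocked
```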
Sliding window counter
A compromise. Keep counters for the current and previous window. Estimate the sliding count using a weighted average:
const counters = new Map<string, number>();
function rateLimitSlidingCounter(key: string, limit: number, windowMs: number): true | Response {
const now = Date.now();
const currentWindow = Math.floor(now / windowMs);
const windowProgress = (now % windowMs) / windowMs;
const currentKey = `${key}:${currentWindow}`;
const previousKey = `${key}:${currentWindow - 1}`;
const currentCount = counters.get(currentKey) ?? 0;
const previousCount = counters.get(previousKey) ?? 0;
// Weighted estimate: full current window + fraction of previous
const estimate = currentCount + previousCount * (1 - windowProgress);
if (estimate >= limit) {
return Response.json({ error: "Too many requests" }, { status: 429 });
}
counters.set(currentKey, currentCount + 1);
return true;
}
If you're 70% through the current window, you count 100% of the current window's requests and 30% of the previous window's. This smooths out the boundary spike without storing every timestamp.
When to use: The best default for most APIs. Low memory, no boundary bursts, simple to implement.
Token bucket
Instead of counting requests, imagine a bucket that holds tokens. Each request takes a token. Tokens refill at a steady rate. If the bucket is empty, the request is rejected.
const buckets = new Map<string, { tokens: number; lastRefill: number }>();
function rateLimitTokenBucket(
key: string,
capacity: number,
refillRate: number, // tokens per second
): true | Response {
const now = Date.now();
const bucket = buckets.get(key) ?? { tokens: capacity, lastRefill: now };
// Refill tokens based on elapsed time
const elapsed = (now - bucket.lastRefill) / 1000;
bucket.tokens = Math.min(capacity, bucket.tokens + elapsed * refillRate);
bucket.lastRefill = now;
if (bucket.tokens < 1) {
return Response.json({ error: "Too many requests" }, { status: 429 });
}
bucket.tokens -= 1;
buckets.set(key, bucket);
return true;
}
The token bucket allows bursts up to the bucket capacity, then enforces the steady rate. A bucket with capacity 10 and refill rate 1/second allows a burst of 10 requests, then 1 per second after that.
When to use: When you want to allow short bursts but enforce a sustained rate. Good for APIs where clients legitimately send batches of requests.
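The burst-then-steady behavior can be verified with an injectable clock. A self-contained sketch (tryTake and its parameters are hypothetical, not the @hectoday/http API):

```typescript
interface Bucket {
  tokens: number;
  lastRefill: number;
}

// Refill based on elapsed time, then take one token if available
function tryTake(bucket: Bucket, capacity: number, refillRate: number, now: number): boolean {
  const elapsed = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(capacity, bucket.tokens + elapsed * refillRate);
  bucket.lastRefill = now;
  if (bucket.tokens < 1) return false;
  bucket.tokens -= 1;
  return true;
}

const bucket: Bucket = { tokens: 3, lastRefill: 0 };
// Capacity 3, refill 1 token/second: a burst of 3 at t=0 succeeds, the 4th fails
const burst = [
  tryTake(bucket, 3, 1, 0),
  tryTake(bucket, 3, 1, 0),
  tryTake(bucket, 3, 1, 0),
  tryTake(bucket, 3, 1, 0),
];
// One second later, exactly one token has refilled
const later = tryTake(bucket, 3, 1, 1000);
```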
Response headers
Tell the client about their rate limit status:
function rateLimit(
request: Request,
limit = 100,
windowMs = 60_000,
): Response | { remaining: number } {
const ip = request.headers.get("x-forwarded-for") ?? "unknown";
const now = Date.now();
const entry = requests.get(ip);
if (!entry || now > entry.resetAt) {
requests.set(ip, { count: 1, resetAt: now + windowMs });
return { remaining: limit - 1 };
}
entry.count++;
if (entry.count > limit) {
return Response.json(
{ error: "Too many requests" },
{
status: 429,
headers: {
"retry-after": String(Math.ceil((entry.resetAt - now) / 1000)),
"x-ratelimit-limit": String(limit),
"x-ratelimit-remaining": "0",
"x-ratelimit-reset": String(Math.ceil(entry.resetAt / 1000)),
},
},
);
}
return { remaining: limit - entry.count };
}
Add the headers to successful responses too:
resolve: async (c) => {
const result = rateLimit(c.request);
if (result instanceof Response) return result;
// ... handler logic ...
return Response.json(data, {
headers: {
"x-ratelimit-limit": "100",
"x-ratelimit-remaining": String(result.remaining),
},
});
};
Standard headers:
| Header | Meaning |
|---|---|
| Retry-After | Seconds until the client can retry (on 429) |
| X-RateLimit-Limit | Total requests allowed per window |
| X-RateLimit-Remaining | Requests left in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
Clients use these to self-throttle instead of blindly retrying.
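On the client side, honoring these headers can look like this. A hypothetical helper (retryDelayMs and fetchWithBackoff are not part of @hectoday/http):

```typescript
// Parse retry-after (seconds) into a wait in milliseconds, with a fallback
function retryDelayMs(res: Response, fallbackMs = 1000): number {
  const header = res.headers.get("retry-after");
  const seconds = header === null ? NaN : Number(header);
  return Number.isFinite(seconds) ? seconds * 1000 : fallbackMs;
}

// Retry once after the server-suggested delay instead of hammering the API
async function fetchWithBackoff(url: string): Promise<Response> {
  const res = await fetch(url);
  if (res.status !== 429) return res;
  await new Promise((resolve) => setTimeout(resolve, retryDelayMs(res)));
  return fetch(url);
}
```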
Different limits for different endpoints
Not every endpoint is equally expensive. A search query hits the database hard. A health check returns a static response. Apply different limits:
resolve: async (c) => {
const limited = rateLimit(c.request, 10, 60_000); // 10 per minute
if (limited instanceof Response) return limited;
const results = await db.search(c.input.query.q);
return Response.json({ results });
};
resolve: async (c) => {
const limited = rateLimit(c.request, 1000, 60_000); // 1000 per minute
if (limited instanceof Response) return limited;
return Response.json(await db.users.get(c.input.params.id));
};
Pass the limit as a parameter. Each handler decides its own threshold.
Global rate limiting
Apply a baseline limit to all routes using onRequest:
const app = setup({
onRequest: ({ request }) => {
const limited = rateLimit(request, 1000, 60_000);
if (limited instanceof Response) throw limited;
return { startTime: Date.now() };
},
routes: [...],
onError: ({ error }) => {
if (error instanceof Response) return error;
return Response.json({ error: "Internal error" }, { status: 500 });
},
});
Since onRequest can't return a Response directly, throw it and catch it in onError. Individual handlers can still apply stricter per-route limits on top of the global one.
Redis for multiple instances
The in-memory Map works for a single server. If you run two instances, each has its own counter. A client can send 100 requests to instance A and 100 to instance B, hitting 200 total while each instance thinks the limit is fine.
Redis solves this. One shared counter across all instances:
async function rateLimit(key: string, limit: number, windowMs: number): Promise<true | Response> {
const current = await redis.incr(`ratelimit:${key}`);
if (current === 1) {
await redis.pexpire(`ratelimit:${key}`, windowMs);
}
if (current > limit) {
const ttl = await redis.pttl(`ratelimit:${key}`);
return Response.json(
{ error: "Too many requests" },
{
status: 429,
headers: { "retry-after": String(Math.ceil(ttl / 1000)) },
},
);
}
return true;
}
INCR is atomic. Two instances incrementing the same key at the same time both get the correct count. PEXPIRE sets the TTL on first request. Redis handles the expiration.
Same interface as the in-memory version. Same two-line check in the handler.
Rate limiting vs throttling vs backpressure
Three related but different concepts:
Rate limiting rejects excess requests immediately. The client gets 429 and decides what to do. The server does no work for rejected requests.
Throttling slows down excess requests instead of rejecting them. The server queues the request and processes it later. The client waits longer but eventually gets a response.
Backpressure is when a downstream service (database, queue) pushes back because it's overloaded. The server stops accepting new work until the downstream recovers.
Rate limiting is the simplest and most common. Start there. Add throttling only if your clients need guaranteed eventual processing.
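To make the throttling idea concrete, here is a hypothetical sketch that spaces requests for a key at least a fixed interval apart instead of rejecting them (reserveSlot, throttle, and the interval are assumptions, not a library API):

```typescript
// Last scheduled time per key
const lastScheduled = new Map<string, number>();

// How long this request must wait so requests for `key`
// are spaced at least minIntervalMs apart
function reserveSlot(key: string, minIntervalMs: number, now: number): number {
  const earliest = (lastScheduled.get(key) ?? -Infinity) + minIntervalMs;
  const scheduled = Math.max(now, earliest);
  lastScheduled.set(key, scheduled);
  return scheduled - now;
}

// Throttle: wait for your slot instead of receiving a 429
async function throttle(key: string, minIntervalMs: number): Promise<void> {
  const wait = reserveSlot(key, minIntervalMs, Date.now());
  if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
}
```

Note the cost: the server now holds the request in memory while it waits, which is exactly the work rate limiting avoids.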
When not to rate limit
Internal services. If service A calls service B and you control both, rate limiting adds latency for no benefit. Use backpressure and autoscaling instead.
Health checks. Your monitoring system pings /health frequently. Don't rate limit it or you'll get false alarms.
Webhooks you receive. If a third party sends you webhooks, rate limiting them means you miss events. Process them all and scale your worker instead.
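One way to apply these exemptions is a small allowlist checked before the limiter runs (the paths here are examples, not a fixed convention):

```typescript
// Paths that should never be rate limited, e.g. monitoring probes
const EXEMPT_PATHS = new Set(["/health", "/metrics"]);

function shouldRateLimit(request: Request): boolean {
  const path = new URL(request.url).pathname;
  return !EXEMPT_PATHS.has(path);
}
```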
Testing rate limits
describe("rate limiting", () => {
it("allows requests under the limit", async () => {
const res = await app.request("/users", {
headers: { authorization: "Bearer valid-token" },
});
expect(res.status).toBe(200);
});
it("rejects requests over the limit", async () => {
// Send requests up to the limit
for (let i = 0; i < 100; i++) {
await app.request("/users", {
headers: {
authorization: "Bearer valid-token",
"x-forwarded-for": "test-ip",
},
});
}
// The next one should be rejected
const res = await app.request("/users", {
headers: {
authorization: "Bearer valid-token",
"x-forwarded-for": "test-ip",
},
});
expect(res.status).toBe(429);
expect(res.headers.get("retry-after")).toBeDefined();
});
});
Reset the rate limiter between tests to avoid test pollution. Either clear the Map in a beforeEach or use a separate rate limiter instance per test.
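One way to do the reset, assuming the Map-based limiter from earlier exposes a clear function (the name resetRateLimiter is hypothetical):

```typescript
// The limiter's module-level state, as in the earlier examples
const requests = new Map<string, { count: number; resetAt: number }>();

// Reset hook so each test starts from a clean slate
function resetRateLimiter(): void {
  requests.clear();
}

// In the test file:
// beforeEach(() => resetRateLimiter());
```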
Summary
| Concept | What it means |
|---|---|
| Rate limiting | Rejecting excess requests to protect the server |
| 429 | Too Many Requests. The client should slow down. |
| Retry-After | Header telling the client when to try again |
| Fixed window | Count per interval. Simple. Has boundary burst problem. |
| Sliding window | Count in trailing window. No bursts. Uses more memory. |
| Sliding window counter | Weighted estimate. Best default for most APIs. |
| Token bucket | Allows bursts up to capacity, then enforces steady rate. |
| Per-IP | Default. Shared IPs are a weakness. |
| Per-user | More accurate. Requires authentication first. |
| Redis | Shared counters across multiple server instances. |
Rate limiting is a plain function. It returns true or a 429 Response. Check with instanceof. Two lines in the handler. Same pattern as auth, same pattern as everything else in Hectoday.