REST API Design with @hectoday/http


Rate limiting and quotas

Your API needs protection

So far, we’ve been building features: pagination, filtering, sorting, content negotiation. But we haven’t thought about what happens when a client misbehaves. What if a buggy script fires off thousands of requests per second? What if someone is scraping your entire database? What if a single user is consuming so many resources that everyone else’s requests slow down?

Without rate limiting, one bad actor can bring your API to its knees. Every production API needs limits.

Per-client rate limiting

The basic idea: track how many requests each client has made in a time window, and reject requests that exceed the limit.

First, we need to identify who the client is. The best identifier is an API key. If there’s no key, fall back to IP address.

// src/rate-limit.ts
// One entry per client: how many requests they've made in the current
// window, and when that window resets.
const counters = new Map<string, { count: number; resetAt: number }>();

export function checkRateLimit(
  clientId: string,
  limit: number,
  windowMs: number,
): { allowed: boolean; remaining: number; resetAt: number } {
  const now = Date.now();
  let entry = counters.get(clientId);

  // No entry yet, or the previous window expired: start a fresh one.
  if (!entry || now > entry.resetAt) {
    entry = { count: 0, resetAt: now + windowMs };
    counters.set(clientId, entry);
  }

  entry.count++;

  return {
    allowed: entry.count <= limit,
    remaining: Math.max(0, limit - entry.count),
    resetAt: entry.resetAt,
  };
}

Let’s walk through this. The counters map stores one entry per client. Each entry tracks the request count and when the window resets. Note that this is a fixed-window counter held in process memory: it resets when the server restarts and isn’t shared between instances, which is fine for a single-process demo.

When a request comes in, we look up the client’s entry. If it doesn’t exist, or the window has expired, we create a fresh one. Then we increment the count and check: is this client still under the limit?

The function returns three things. allowed tells us whether to let the request through. remaining tells the client how many requests they have left. resetAt tells them when the window resets.
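To see the window mechanics in action, here’s a self-contained sketch. It restates the same fixed-window logic inline so the snippet runs on its own; the client id and the limit of 2 are arbitrary demo values.

```typescript
// Same fixed-window logic as checkRateLimit, restated for a standalone demo.
const counters = new Map<string, { count: number; resetAt: number }>();

function checkRateLimit(clientId: string, limit: number, windowMs: number) {
  const now = Date.now();
  let entry = counters.get(clientId);
  if (!entry || now > entry.resetAt) {
    entry = { count: 0, resetAt: now + windowMs };
    counters.set(clientId, entry);
  }
  entry.count++;
  return {
    allowed: entry.count <= limit,
    remaining: Math.max(0, limit - entry.count),
    resetAt: entry.resetAt,
  };
}

// Three requests from the same client against a limit of 2 per minute:
const first = checkRateLimit("client-1", 2, 60_000);  // allowed, 1 left
const second = checkRateLimit("client-1", 2, 60_000); // allowed, 0 left
const third = checkRateLimit("client-1", 2, 60_000);  // rejected
console.log(first.allowed, second.allowed, third.allowed); // → true true false
```

The third request is still counted (count goes to 3) but not allowed, so remaining stays at 0 until the window resets.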

Rate limit headers

Here’s something that sets good APIs apart from mediocre ones: tell the client about their rate limit on every response, not just when they hit it.

function addRateLimitHeaders(
  res: Response,
  info: { remaining: number; resetAt: number; limit: number },
): Response {
  const headers = new Headers(res.headers);
  headers.set("X-RateLimit-Limit", String(info.limit));
  headers.set("X-RateLimit-Remaining", String(info.remaining));
  headers.set("X-RateLimit-Reset", String(Math.ceil(info.resetAt / 1000)));
  return new Response(res.body, { status: res.status, headers });
}

These three headers appear on every single response:

  • X-RateLimit-Limit: the maximum number of requests allowed in the window (for example, 100)
  • X-RateLimit-Remaining: how many requests the client has left (99, 98, 97…)
  • X-RateLimit-Reset: when the window resets, as a Unix timestamp

A well-behaved client watches X-RateLimit-Remaining decrease and slows down before hitting zero. A badly-behaved client ignores the headers and gets a 429 when it runs out.
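On the client side, that slowdown can be as simple as computing a pause from the last response’s headers. A minimal sketch — the thresholds here are illustrative choices, not part of any spec:

```typescript
// How long should a well-behaved client pause before its next request,
// given the X-RateLimit-* headers from the previous response?
function backoffMs(headers: Headers, now = Date.now()): number {
  const remaining = Number(headers.get("X-RateLimit-Remaining") ?? Infinity);
  const resetAt = Number(headers.get("X-RateLimit-Reset") ?? 0) * 1000;
  if (remaining > 5) return 0;       // plenty left: keep going
  if (remaining > 0) return 1000;    // getting close: slow down
  return Math.max(0, resetAt - now); // exhausted: wait for the reset
}

const h = new Headers({
  "X-RateLimit-Remaining": "0",
  "X-RateLimit-Reset": "1700000060", // Unix seconds
});
console.log(backoffMs(h, 1_700_000_050_000)); // → 10000
```

With zero requests remaining and the reset 10 seconds away, the client waits 10,000 ms instead of burning a request on a guaranteed 429.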

Applying rate limits globally

Use the onRequest lifecycle callback to check rate limits before any route handler runs:

const app = setup({
  onRequest: ({ request }) => {
    const apiKey = request.headers.get("x-api-key");
    const clientId = apiKey ?? request.headers.get("x-forwarded-for") ?? "anonymous";

    const rateInfo = checkRateLimit(clientId, 100, 60 * 1000); // 100 per minute

    if (!rateInfo.allowed) {
      throw apiError(429, "RATE_LIMITED", "Too many requests. Try again later.");
    }

    return { rateInfo, clientId };
  },
  onResponse: ({ request, response, locals }) => {
    if (locals?.rateInfo) {
      return addRateLimitHeaders(response, { ...locals.rateInfo, limit: 100 });
    }
    return response;
  },
  routes: [
    /* ... */
  ],
});

The onRequest callback runs before every route handler. It identifies the client, checks the rate limit, and either rejects the request by throwing a 429 error or passes the rate info forward. The onResponse callback then adds the rate limit headers to the response.

Quota tiers

Different clients get different limits. A free tier might allow 100 requests per minute. A paid tier might allow 1,000. An enterprise tier might allow 10,000.

// Map API keys to tiers. In a real service this lookup would come from a
// database or an auth service.
const API_KEYS = new Map<string, string>([
  ["sk-free-abc123", "free"],
  ["sk-pro-def456", "pro"],
  ["sk-ent-ghi789", "enterprise"],
]);

function getClientLimit(apiKey: string | null): number {
  if (!apiKey) return 20; // anonymous callers get the smallest quota
  const tier = API_KEYS.get(apiKey);
  if (!tier) return 20; // unrecognized key: same as anonymous

  // Requests per minute for each tier.
  const TIER_LIMITS: Record<string, number> = {
    free: 100,
    pro: 1000,
    enterprise: 10000,
  };

  return TIER_LIMITS[tier] ?? 100; // unknown tier: fall back to the free limit
}

This tiered model mirrors how APIs like Stripe and GitHub meter usage. Free users get a generous but limited quota. Paying customers get more. The rate limit headers on every response make the quota transparent.
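Wiring the tier lookup into the rate limit check means each request’s limit depends on who is asking. A simplified, self-contained sketch — a single counter with no window reset, just enough to show the lookup feeding the check:

```typescript
// Per-tier limits feeding a per-client counter. Keys are the same
// illustrative examples used above.
const TIER_LIMITS: Record<string, number> = { free: 100, pro: 1000, enterprise: 10000 };
const API_KEYS = new Map<string, string>([["sk-pro-def456", "pro"]]);
const counts = new Map<string, number>();

function allow(apiKey: string | null): boolean {
  const tier = apiKey ? API_KEYS.get(apiKey) : undefined;
  const limit = tier ? (TIER_LIMITS[tier] ?? 20) : 20; // unknown key or tier: 20
  const id = apiKey ?? "anonymous";
  const count = (counts.get(id) ?? 0) + 1;
  counts.set(id, count);
  return count <= limit;
}

// 25 anonymous requests: only the first 20 get through.
let anonymousAllowed = 0;
for (let i = 0; i < 25; i++) if (allow(null)) anonymousAllowed++;

// 25 requests with a pro key: all allowed (limit is 1000).
let proAllowed = 0;
for (let i = 0; i < 25; i++) if (allow("sk-pro-def456")) proAllowed++;

console.log(anonymousAllowed, proAllowed); // → 20 25
```

In the full app, you’d call getClientLimit inside onRequest and pass its result as the limit argument to checkRateLimit.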

The 429 response

When a client exceeds their limit, return 429 with a Retry-After header:

if (!rateInfo.allowed) {
  const retryAfter = Math.ceil((rateInfo.resetAt - Date.now()) / 1000);
  return new Response(
    JSON.stringify({
      error: { code: "RATE_LIMITED", message: "Too many requests" },
    }),
    {
      status: 429,
      headers: {
        "content-type": "application/json",
        "retry-after": String(retryAfter),
      },
    },
  );
}

Retry-After tells the client exactly how many seconds to wait before trying again. Good clients respect this. Aggressive clients that ignore it can be throttled further or blocked entirely.
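A client that respects Retry-After can be sketched as a small retry loop. The fetch function is injected so the logic runs without a real network; the URL and helper names here are illustrative, not part of @hectoday/http.

```typescript
// Retry loop that honors Retry-After on 429 responses.
async function fetchWithRetry(
  url: string,
  fetchFn: (url: string) => Promise<Response>,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchFn(url);
    if (res.status !== 429 || attempt >= maxRetries) return res;
    // Wait the number of seconds the server asked for (default 1s).
    const waitSec = Number(res.headers.get("retry-after") ?? "1");
    await new Promise((resolve) => setTimeout(resolve, waitSec * 1000));
  }
}

// Fake server: rate-limited on the first call, succeeds on the second.
let calls = 0;
const fakeFetch = async (_url: string) => {
  calls++;
  return calls === 1
    ? new Response("", { status: 429, headers: { "retry-after": "0" } })
    : new Response("ok", { status: 200 });
};

const res = await fetchWithRetry("https://api.example.com/books", fakeFetch);
console.log(res.status, calls); // → 200 2
```

The maxRetries cap matters: without it, a client stuck behind a hard block would loop forever.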

What’s next

We’ve covered the core patterns: CRUD, pagination, filtering, versioning, rate limiting. Now let’s look at some advanced patterns you’ll need in more complex APIs. First up: bulk operations, when a client needs to create, update, or delete many resources at once.

Exercises

Exercise 1: Add global rate limiting with the onRequest callback. Test by sending more requests than the limit.

Exercise 2: Add the X-RateLimit-* headers to every response. Verify they appear with curl -v.

Exercise 3: Implement tier-based limits. Create two API keys with different tiers and verify they get different limits.

Why should rate limit headers appear on every response, not just 429 responses?


© 2026 hectoday. All rights reserved.