Capstone: resilient e-commerce API
What we built
Over the course of these lessons, we built a complete error handling and resilience system for an e-commerce API. Let’s take a step back and look at everything together.
| Layer | What it does | Lesson |
|---|---|---|
| Error response helpers | Consistent error format, status codes baked in | Custom error classes |
| Error classes | Carry status code and error code for service-level errors | Custom error classes |
| Global error handler | Catches unexpected throws, logs structured JSON | A global error handler |
| Operational vs programmer | Return expected errors, catch unexpected ones | Operational vs programmer |
| Retries | Exponential backoff with jitter for transient failures | Retries |
| Timeouts | AbortController, never wait forever | Timeouts |
| Circuit breakers | Stop calling failing services, fail fast | Circuit breakers |
| Fallbacks | Queue, cache, defaults for non-critical failures | Fallbacks and degradation |
| Graceful shutdown | SIGTERM, finish in-flight, close connections | Graceful shutdown |
| Process handlers | uncaughtException, unhandledRejection | Uncaught exceptions |
| Health checks | Dependency health, degraded vs unhealthy | Health checks under failure |
Each layer handles a different kind of failure. Together, they form a system where no error goes unhandled, no failure goes unlogged, and no user gets a worse experience than necessary.
The error handling architecture
Here is how a request flows through the system:
Incoming request
|
|-- Route handler
| |-- Input validation --> return validationFailed() (400)
| |-- Resource lookup --> return notFound() (404)
| |
| |-- External service call
| | |-- Circuit breaker check
| | |-- Timeout wrapper (10s max)
| | |-- Retry with backoff (up to 3 retries)
| | |-- Fallback on failure (queue, cache, default)
| |
| |-- Response
|
|-- Global error handler (onError callback)
| |-- AppError --> specific status code + error code
| |-- Unknown error --> 500 + generic message + log stack
|
|-- Process handlers
| |-- uncaughtException --> log + shutdown
| |-- unhandledRejection --> log
|
|-- Health check
|-- healthy (200) --> all dependencies up
|-- degraded (200) --> non-critical services down
|-- unhealthy (503) --> critical dependency down

Every possible failure has a handler. Validation errors and missing resources are returned directly from the route. Service failures are handled by the resilience stack. Unexpected errors are caught by the global handler. Errors that escape everything are caught by the process handlers.
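For reference, the response helpers in errors.ts might look something like this. The exact body shapes and error codes below are assumptions, since this lesson only shows the helpers being called:

```typescript
// Sketch of the error response helpers (src/errors.ts) — shapes are assumed.
type FieldIssue = { field: string; message: string };

// 400: input failed schema validation
export function validationFailed(issues: FieldIssue[]): Response {
  return Response.json(
    { error: { code: "VALIDATION_FAILED", message: "Invalid request body", issues } },
    { status: 400 },
  );
}

// 404: the named resource does not exist
export function notFound(resource: string): Response {
  return Response.json(
    { error: { code: "NOT_FOUND", message: `${resource} not found` } },
    { status: 404 },
  );
}

// 409: the request conflicts with current state (e.g. insufficient stock)
export function conflict(message: string): Response {
  return Response.json({ error: { code: "CONFLICT", message } }, { status: 409 });
}
```

The point is that status codes and the response shape live in one place, so every route returns errors in the same format.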
The complete order flow
This is the heart of the application. Every pattern from the course appears in this single route. Compare this to the naive version from the project setup. The OrderBody schema is the same, but every failure is now handled:
// src/app.ts (updated)
import { setup, route } from "@hectoday/http";
import { z } from "zod/v4";
import db from "./db.js";
import { handleError } from "./error-handler.js";
import { notFound, validationFailed, conflict, serviceUnavailable } from "./errors.js";
import { chargeCard } from "./services/payment.js";
import { sendEmail } from "./services/email.js";
import { reserveStock } from "./services/inventory.js";
import { paymentCircuit } from "./circuits.js";
import { withRetry } from "./retry.js";
import { withTimeout } from "./timeout.js";
import { enqueue } from "./queue.js";
import { checkHealth } from "./health.js";
const OrderBody = z.object({
userId: z.string(),
productId: z.string(),
quantity: z.number().int().positive(),
paymentToken: z.string(),
});
export const app = setup({
onError: ({ error, request }) => handleError(error, request),
routes: [
route.get("/health", {
resolve: () => {
const health = checkHealth();
const statusCode = health.status === "unhealthy" ? 503 : 200;
return Response.json(health, { status: statusCode });
},
}),
route.get("/products", {
resolve: () => {
const products = db.prepare("SELECT * FROM products").all();
return Response.json(products);
},
}),
route.get("/products/:id", {
request: { params: z.object({ id: z.string() }) },
resolve: (c) => {
if (!c.input.ok)
return validationFailed(
c.input.issues.map((i) => ({ field: i.path.join("."), message: i.message })),
);
const product = db.prepare("SELECT * FROM products WHERE id = ?").get(c.input.params.id);
if (!product) return notFound("Product");
return Response.json(product);
},
}),
route.post("/orders", {
request: { body: OrderBody },
resolve: async (c) => {
// Validation: return error directly
if (!c.input.ok) {
return validationFailed(
c.input.issues.map((i) => ({ field: i.path.join("."), message: i.message })),
);
}
const { userId, productId, quantity, paymentToken } = c.input.body;
// Resource lookup: return 404 if missing
const product = db.prepare("SELECT * FROM products WHERE id = ?").get(productId) as any;
if (!product) return notFound("Product");
// Stock check: return 409 if insufficient
if (product.stock < quantity) {
return conflict(`${product.name} has only ${product.stock} in stock`);
}
// CRITICAL: create order in database (no fallback, must succeed)
const orderId = `ord_${crypto.randomUUID().slice(0, 8)}`;
const total = product.price * quantity;
db.prepare("INSERT INTO orders (id, user_id, status, total) VALUES (?, ?, ?, ?)").run(
orderId,
userId,
"pending",
total,
);
db.prepare(
"INSERT INTO order_items (order_id, product_id, quantity, price) VALUES (?, ?, ?, ?)",
).run(orderId, productId, quantity, product.price);
// IMPORTANT: deduct stock
db.prepare("UPDATE products SET stock = stock - ? WHERE id = ?").run(quantity, productId);
// IMPORTANT: charge payment (retry, timeout, circuit breaker, then queue)
try {
await paymentCircuit.call(() =>
withTimeout(
() =>
withRetry(() => chargeCard(total, paymentToken), {
maxRetries: 3,
baseDelayMs: 500,
}),
10_000,
"Payment",
),
);
db.prepare("UPDATE orders SET status = ? WHERE id = ?").run("paid", orderId);
} catch {
enqueue("charge_card", { orderId, amount: total, token: paymentToken });
db.prepare("UPDATE orders SET status = ? WHERE id = ?").run("payment_pending", orderId);
}
// NICE-TO-HAVE: send confirmation email (fire-and-forget)
sendEmail(userId, "Order Confirmed", `Your order ${orderId} has been placed.`).catch(() => {
enqueue("send_email", { to: userId, orderId });
});
const order = db.prepare("SELECT * FROM orders WHERE id = ?").get(orderId);
return Response.json(order, { status: 201 });
},
}),
],
});

Read through this carefully. The validation and lookup errors (validationFailed, notFound, conflict) are returned directly. No throwing, no try-catch for expected cases. The route checks the condition, returns the error response, and that is it.
The payment call is wrapped in three layers. The innermost layer is withRetry, which retries up to 3 times with exponential backoff. That is wrapped in withTimeout, which gives the whole retry sequence 10 seconds before giving up. That is wrapped in paymentCircuit.call, which checks whether the payment service is even available before trying. If all of that fails, the payment is queued for later. The try-catch here is appropriate because the payment service throwing is genuinely unpredictable.
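A minimal sketch of how withRetry and withTimeout can be implemented. The real versions come from earlier lessons; the signatures below are inferred from the call site above, so treat this as a simplified stand-in:

```typescript
// Sketch: reject if fn doesn't settle within ms (simplified stand-in for timeout.ts)
async function withTimeout<T>(fn: () => Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([fn(), timeout]);
  } finally {
    clearTimeout(timer!); // don't leave a stray timer running
  }
}

// Sketch: retry fn up to maxRetries times with exponential backoff and jitter
async function withRetry<T>(
  fn: () => Promise<T>,
  { maxRetries, baseDelayMs }: { maxRetries: number; baseDelayMs: number },
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // out of retries: propagate
      // full jitter: random delay up to baseDelayMs * 2^attempt
      const delay = Math.random() * baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Note the nesting order in the route: the timeout bounds the entire retry sequence, not each individual attempt.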
The email is fire-and-forget. If it fails, it is queued and the user still gets their order response.
The user always gets a response. The payment is eventually consistent. The app degrades gracefully when dependencies fail.
Project structure
src/
app.ts # Hectoday HTTP setup, routes, global error handler
server.ts # HTTP server, graceful shutdown
db.ts # Database schema, connection, seed data
errors.ts # Response helpers + AppError classes
error-handler.ts # Global error handler (handleError function)
circuits.ts # Circuit breaker instances (payment, email, inventory)
circuit-breaker.ts # CircuitBreaker class
retry.ts # withRetry function
timeout.ts # withTimeout function
queue.ts # Job queue (enqueue function)
health.ts # Health check with dependency checks
process-handlers.ts # uncaughtException, unhandledRejection
services/
payment.ts # Payment service (simulated, 20% failure)
email.ts # Email service (simulated, 10% failure)
inventory.ts # Inventory service (simulated, 5% failure)

The resilience stack
Each layer in the resilience stack protects against a specific failure mode:
Request arrives
|
|-- Circuit breaker (protects against: cascading failures)
| |-- Timeout (protects against: slow responses)
| |-- Retry (protects against: transient failures)
| |-- Fallback (protects against: prolonged outages)
| |-- Error handler (protects against: unhandled errors)
| |-- Process handler (protects against: crashes)
| |-- Docker restart (protects against: process death)

Remove any layer and a specific failure mode goes unhandled. No timeouts? Slow dependencies hold resources forever. No circuit breakers? Failed services get hammered with requests. No retries? Transient blips become user-visible errors. No fallbacks? Every failure blocks the user.
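As a reminder of the outermost layer, here is a compact sketch of the circuit breaker state machine. This is a simplified stand-in for the course's circuit-breaker.ts, with hypothetical threshold and cooldown defaults:

```typescript
// Sketch of a circuit breaker: closed -> open after N consecutive failures,
// half-open after a cooldown, closed again once a probe call succeeds.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5, // failures before opening
    private readonly cooldownMs = 30_000, // how long to stay open
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.failureThreshold) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        // Open: fail fast without touching the struggling dependency
        throw new Error("Circuit open: failing fast");
      }
      // Cooldown elapsed: half-open, let this one probe call through
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      this.openedAt = Date.now();
      throw err;
    }
  }
}
```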
Challenges
If you want to go further, here are some challenges that build on everything in this course.
Challenge 1: dead letter queue. When a queued job exceeds its max attempts, move it to a dead letter queue instead of retrying forever. Alert the team so they can investigate.
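One possible starting point for this challenge, assuming a simple in-memory queue. The Job shape and the queue arrays here are hypothetical, not the course's queue.ts:

```typescript
// Sketch of dead-letter handling: jobs that exhaust their retries are
// parked for investigation instead of being retried forever.
type Job = { type: string; payload: unknown; attempts: number; maxAttempts: number };

const queue: Job[] = [];
const deadLetterQueue: Job[] = [];

function handleFailedJob(job: Job): void {
  job.attempts++;
  if (job.attempts >= job.maxAttempts) {
    // Stop retrying: park the job where humans can investigate it
    deadLetterQueue.push(job);
    console.error(`ALERT: job ${job.type} moved to dead letter queue`);
  } else {
    queue.push(job); // not exhausted yet: retry later
  }
}
```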
Challenge 2: rate-aware retries. If the external service returns 429 (rate limited), read the Retry-After header and wait that long before retrying instead of using exponential backoff.
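A sketch of what rate-aware retrying could look like; fetchWithRateAwareRetry and its parameters are hypothetical names, not part of the course code:

```typescript
// Retry on 429, preferring the server's Retry-After hint (in seconds)
// over our own exponential backoff.
async function fetchWithRateAwareRetry(
  fetchFn: () => Promise<Response>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchFn();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const retryAfter = res.headers.get("Retry-After");
    const delayMs = retryAfter
      ? Number(retryAfter) * 1000 // server told us exactly how long to wait
      : baseDelayMs * 2 ** attempt; // no hint: fall back to exponential backoff
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```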
Challenge 3: distributed tracing. Generate a request ID at the start of each request. Pass it through every log entry, error, and external service call. This lets you trace a single request through the entire system.
Challenge 4: monitoring dashboard. Build a monitoring endpoint that shows: error rate over the last hour, circuit breaker states, queue depth, and average response time. Use the patterns from this course to expose operational health.
In the order flow, why is the payment queued on failure instead of returning an error to the user?
What is the most important layer in the resilience stack?