Capstone: resilient e-commerce API
What we built
Over the course of these lessons, we built a complete error handling and resilience system for an e-commerce API. Let’s take a step back and look at everything together.
| Layer | What it does | Lesson |
|---|---|---|
| Error response helpers | Consistent error format, status codes baked in | Custom error classes |
| Error classes | Carry status code and error code for service-level errors | Custom error classes |
| Global error handler | Catches unexpected throws, logs structured JSON | A global error handler |
| Operational vs programmer | Return expected errors, catch unexpected ones | Operational vs programmer |
| Retries | Exponential backoff with jitter for transient failures | Retries |
| Timeouts | AbortController, never wait forever | Timeouts |
| Circuit breakers | Stop calling failing services, fail fast | Circuit breakers |
| Fallbacks | Queue, cache, defaults for non-critical failures | Fallbacks and degradation |
| Graceful shutdown | SIGTERM, finish in-flight, close connections | Graceful shutdown |
| Process handlers | uncaughtException, unhandledRejection | Uncaught exceptions |
| Health checks | Dependency health, degraded vs unhealthy | Health checks under failure |
Each layer handles a different kind of failure. Together, they form a system where no error goes unhandled, no failure goes unlogged, and no user gets a worse experience than necessary.
The error handling architecture
Here is how a request flows through the system:
Incoming request
|
|-- Route handler
| |-- Input validation --> return validationFailed() (400)
| |-- Resource lookup --> return notFound() (404)
| |
| |-- External service call
| | |-- Circuit breaker check
| | |-- Timeout wrapper (10s max)
| | |-- Retry with backoff (up to 3 retries)
| | |-- Fallback on failure (queue, cache, default)
| |
| |-- Response
|
|-- Global error handler (onError callback)
| |-- AppError --> specific status code + error code
| |-- Unknown error --> 500 + generic message + log stack
|
|-- Process handlers
| |-- uncaughtException --> log + shutdown
| |-- unhandledRejection --> log
|
|-- Health check
|-- healthy (200) --> all dependencies up
|-- degraded (200) --> non-critical services down
|-- unhealthy (503) --> critical dependency down

Every possible failure has a handler. Validation errors and missing resources are returned directly from the route. Service failures are handled by the resilience stack. Unexpected errors are caught by the global handler. Errors that escape everything are caught by the process handlers.
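For reference, the response helpers in errors.ts might look something like this. The exact body shapes and error codes below are assumptions, since this lesson only shows the helpers being called:

```typescript
// Sketch of the error response helpers (src/errors.ts) — shapes are assumed.
type FieldIssue = { field: string; message: string };

// 400: input failed schema validation
export function validationFailed(issues: FieldIssue[]): Response {
  return Response.json(
    { error: { code: "VALIDATION_FAILED", message: "Invalid request body", issues } },
    { status: 400 },
  );
}

// 404: the named resource does not exist
export function notFound(resource: string): Response {
  return Response.json(
    { error: { code: "NOT_FOUND", message: `${resource} not found` } },
    { status: 404 },
  );
}

// 409: the request conflicts with current state (e.g. insufficient stock)
export function conflict(message: string): Response {
  return Response.json({ error: { code: "CONFLICT", message } }, { status: 409 });
}
```

The point is that status codes and the response shape live in one place, so every route returns errors in the same format.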
The complete order flow
This is the heart of the application. Every pattern from the course appears in this single route. Compare this to the naive version from the project setup. The OrderBody schema is the same, but every failure is now handled:
// src/app.ts (updated)
import { setup, route } from "@hectoday/http";
import { z } from "zod/v4";
import db from "./db.js";
import { handleError } from "./error-handler.js";
import { notFound, validationFailed, conflict, serviceUnavailable } from "./errors.js";
import { chargeCard } from "./services/payment.js";
import { sendEmail } from "./services/email.js";
import { reserveStock } from "./services/inventory.js";
import { paymentCircuit } from "./circuits.js";
import { withRetry } from "./retry.js";
import { withTimeout } from "./timeout.js";
import { enqueue } from "./queue.js";
import { checkHealth } from "./health.js";
const OrderBody = z.object({
userId: z.string(),
productId: z.string(),
quantity: z.number().int().positive(),
paymentToken: z.string(),
});
export const app = setup({
onError: ({ error, request }) => handleError(error, request),
routes: [
route.get("/health", {
resolve: () => {
const health = checkHealth();
const statusCode = health.status === "unhealthy" ? 503 : 200;
return Response.json(health, { status: statusCode });
},
}),
route.get("/products", {
resolve: () => {
const products = db.prepare("SELECT * FROM products").all();
return Response.json(products);
},
}),
route.get("/products/:id", {
request: { params: z.object({ id: z.string() }) },
resolve: (c) => {
if (!c.input.ok)
return validationFailed(
c.input.issues.map((i) => ({ field: i.path.join("."), message: i.message })),
);
const product = db.prepare("SELECT * FROM products WHERE id = ?").get(c.input.params.id);
if (!product) return notFound("Product");
return Response.json(product);
},
}),
route.post("/orders", {
request: { body: OrderBody },
resolve: async (c) => {
// Validation: return error directly
if (!c.input.ok) {
return validationFailed(
c.input.issues.map((i) => ({ field: i.path.join("."), message: i.message })),
);
}
const { userId, productId, quantity, paymentToken } = c.input.body;
// Resource lookup: return 404 if missing
const product = db.prepare("SELECT * FROM products WHERE id = ?").get(productId) as any;
if (!product) return notFound("Product");
// Stock check: return 409 if insufficient
if (product.stock < quantity) {
return conflict(`${product.name} has only ${product.stock} in stock`);
}
// CRITICAL: create order in database (no fallback, must succeed)
const orderId = `ord_${crypto.randomUUID().slice(0, 8)}`;
const total = product.price * quantity;
db.prepare("INSERT INTO orders (id, user_id, status, total) VALUES (?, ?, ?, ?)").run(
orderId,
userId,
"pending",
total,
);
db.prepare(
"INSERT INTO order_items (order_id, product_id, quantity, price) VALUES (?, ?, ?, ?)",
).run(orderId, productId, quantity, product.price);
// IMPORTANT: deduct stock
db.prepare("UPDATE products SET stock = stock - ? WHERE id = ?").run(quantity, productId);
// IMPORTANT: charge payment (retry, timeout, circuit breaker, then queue)
try {
await paymentCircuit.call(() =>
withTimeout(
() =>
withRetry(() => chargeCard(total, paymentToken), {
maxRetries: 3,
baseDelayMs: 500,
}),
10_000,
"Payment",
),
);
db.prepare("UPDATE orders SET status = ? WHERE id = ?").run("paid", orderId);
} catch {
enqueue("charge_card", { orderId, amount: total, token: paymentToken });
db.prepare("UPDATE orders SET status = ? WHERE id = ?").run("payment_pending", orderId);
}
// NICE-TO-HAVE: send confirmation email (fire-and-forget)
sendEmail(userId, "Order Confirmed", `Your order ${orderId} has been placed.`).catch(() => {
enqueue("send_email", { to: userId, orderId });
});
const order = db.prepare("SELECT * FROM orders WHERE id = ?").get(orderId);
return Response.json(order, { status: 201 });
},
}),
],
});

Read through this carefully. The validation and lookup errors (validationFailed, notFound, conflict) are returned directly. No throwing, no try-catch for expected cases. The route checks the condition, returns the error response, and that is it.
The payment call is wrapped in three layers. The innermost layer is withRetry, which retries up to 3 times with exponential backoff. That is wrapped in withTimeout, which gives the whole retry sequence 10 seconds before giving up. That is wrapped in paymentCircuit.call, which checks whether the payment service is even available before trying. If all of that fails, the payment is queued for later. The try-catch here is appropriate because the payment service throwing is genuinely unpredictable.
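A minimal sketch of how withRetry and withTimeout can be implemented. The real versions come from earlier lessons; the signatures below are inferred from the call site above, so treat this as a simplified stand-in:

```typescript
// Sketch: reject if fn doesn't settle within ms (simplified stand-in for timeout.ts)
async function withTimeout<T>(fn: () => Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([fn(), timeout]);
  } finally {
    clearTimeout(timer!); // don't leave a stray timer running
  }
}

// Sketch: retry fn up to maxRetries times with exponential backoff and jitter
async function withRetry<T>(
  fn: () => Promise<T>,
  { maxRetries, baseDelayMs }: { maxRetries: number; baseDelayMs: number },
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // out of retries: propagate
      // full jitter: random delay up to baseDelayMs * 2^attempt
      const delay = Math.random() * baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Note the nesting order in the route: the timeout bounds the entire retry sequence, not each individual attempt.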
The email is fire-and-forget. If it fails, it is queued and the user still gets their order response.
The user always gets a response. The payment is eventually consistent. The app degrades gracefully when dependencies fail.
Project structure
src/
app.ts # Hectoday HTTP setup, routes, global error handler
server.ts # HTTP server, graceful shutdown
db.ts # Database schema, connection, seed data
errors.ts # Response helpers + AppError classes
error-handler.ts # Global error handler (handleError function)
circuits.ts # Circuit breaker instances (payment, email, inventory)
circuit-breaker.ts # CircuitBreaker class
retry.ts # withRetry function
timeout.ts # withTimeout function
queue.ts # Job queue (enqueue function)
health.ts # Health check with dependency checks
process-handlers.ts # uncaughtException, unhandledRejection
services/
payment.ts # Payment service (simulated, 20% failure)
email.ts # Email service (simulated, 10% failure)
inventory.ts # Inventory service (simulated, 5% failure)

The resilience stack
Each layer in the resilience stack protects against a specific failure mode:
Request arrives
|
|-- Circuit breaker (protects against: cascading failures)
| |-- Timeout (protects against: slow responses)
| |-- Retry (protects against: transient failures)
| |-- Fallback (protects against: prolonged outages)
| |-- Error handler (protects against: unhandled errors)
| |-- Process handler (protects against: crashes)
| |-- Docker restart (protects against: process death)

Remove any layer and a specific failure mode goes unhandled. No timeouts? Slow dependencies hold resources forever. No circuit breakers? Failed services get hammered with requests. No retries? Transient blips become user-visible errors. No fallbacks? Every failure blocks the user.
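As a reminder of the outermost layer, here is a compact sketch of the circuit breaker state machine. This is a simplified stand-in for the course's circuit-breaker.ts, with hypothetical threshold and cooldown defaults:

```typescript
// Sketch of a circuit breaker: closed -> open after N consecutive failures,
// half-open after a cooldown, closed again once a probe call succeeds.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5, // failures before opening
    private readonly cooldownMs = 30_000, // how long to stay open
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.failureThreshold) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        // Open: fail fast without touching the struggling dependency
        throw new Error("Circuit open: failing fast");
      }
      // Cooldown elapsed: half-open, let this one probe call through
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      this.openedAt = Date.now();
      throw err;
    }
  }
}
```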
Challenges
If you want to go further, here are some challenges that build on everything in this course.
Challenge 1: dead letter queue. When a queued job exceeds its max attempts, move it to a dead letter queue instead of retrying forever. Alert the team so they can investigate.
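One possible starting point for this challenge, assuming a simple in-memory queue. The Job shape and the queue arrays here are hypothetical, not the course's queue.ts:

```typescript
// Sketch of dead-letter handling: jobs that exhaust their retries are
// parked for investigation instead of being retried forever.
type Job = { type: string; payload: unknown; attempts: number; maxAttempts: number };

const queue: Job[] = [];
const deadLetterQueue: Job[] = [];

function handleFailedJob(job: Job): void {
  job.attempts++;
  if (job.attempts >= job.maxAttempts) {
    // Stop retrying: park the job where humans can investigate it
    deadLetterQueue.push(job);
    console.error(`ALERT: job ${job.type} moved to dead letter queue`);
  } else {
    queue.push(job); // not exhausted yet: retry later
  }
}
```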
Challenge 2: rate-aware retries. If the external service returns 429 (rate limited), read the Retry-After header and wait that long before retrying instead of using exponential backoff.
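A sketch of what rate-aware retrying could look like; fetchWithRateAwareRetry and its parameters are hypothetical names, not part of the course code:

```typescript
// Retry on 429, preferring the server's Retry-After hint (in seconds)
// over our own exponential backoff.
async function fetchWithRateAwareRetry(
  fetchFn: () => Promise<Response>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchFn();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const retryAfter = res.headers.get("Retry-After");
    const delayMs = retryAfter
      ? Number(retryAfter) * 1000 // server told us exactly how long to wait
      : baseDelayMs * 2 ** attempt; // no hint: fall back to exponential backoff
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```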
Challenge 3: distributed tracing. Generate a request ID at the start of each request. Pass it through every log entry, error, and external service call. This lets you trace a single request through the entire system.
Challenge 4: monitoring dashboard. Build a monitoring endpoint that shows: error rate over the last hour, circuit breaker states, queue depth, and average response time. Use the patterns from this course to expose operational health.
In the order flow, why is the payment queued on failure instead of returning an error to the user?
What is the most important layer in the resilience stack?