Fallbacks and degradation
Not every failure needs to block the user
The payment service is down. Should the user see “Error: try again later”? Maybe. Or maybe we accept the order and process the payment when the service recovers.
The email service is down. Should the entire order fail because the confirmation email could not be sent? Definitely not. The order succeeded. The email is secondary.
This is the idea behind graceful degradation: the app continues working with reduced functionality instead of failing completely. Non-critical features fail silently or get deferred. Critical features fail with a clear message.
Classifying dependencies
The first step is figuring out which dependencies matter most. Not all dependencies are equal.
Critical means the operation cannot succeed without this dependency. You cannot create an order without the database. You cannot log in without the auth system. If a critical dependency fails, the operation fails. There is no way around it.
Important means the operation is better with this dependency, but can survive without it. Payment processing can be retried later. Inventory reservations can be checked after the fact. If an important dependency fails, you queue the work for later.
Nice-to-have means losing this dependency does not affect the user’s core action. Confirmation emails. Analytics tracking. Activity logging. If a nice-to-have dependency fails, you log it and move on.
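This three-tier split can be sketched in code. Everything below (the `dependencyTiers` map, the `onFailure` helper, and the tier assignments) is illustrative, not part of the app we have built so far:

```typescript
// Hypothetical classification of the order flow's dependencies.
type Tier = "critical" | "important" | "nice-to-have";

const dependencyTiers: Record<string, Tier> = {
  database: "critical",
  auth: "critical",
  payments: "important",
  inventory: "important",
  email: "nice-to-have",
  analytics: "nice-to-have",
};

// The tier decides what a failure means: fail the request,
// queue the work for later, or log and move on.
function onFailure(dep: string): "fail" | "queue" | "log" {
  switch (dependencyTiers[dep]) {
    case "critical":
      return "fail";
    case "important":
      return "queue";
    case "nice-to-have":
      return "log";
    default:
      // An unclassified dependency is treated as critical: fail safe.
      return "fail";
  }
}
```

Note the default branch: a dependency nobody has classified is treated as critical, so an oversight degrades loudly rather than silently.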
Pattern 1: queue for later
When an important operation fails, save it for retry instead of failing the whole request. First, we need a table to hold the jobs. Add this to your src/db.ts file, after the existing table definitions:
db.exec(`
CREATE TABLE IF NOT EXISTS job_queue (
id TEXT PRIMARY KEY,
type TEXT NOT NULL,
payload TEXT NOT NULL,
attempts INTEGER NOT NULL DEFAULT 0,
max_attempts INTEGER NOT NULL DEFAULT 5,
next_attempt_at TEXT NOT NULL DEFAULT (datetime('now')),
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
`);
Now create the queue module:
// src/queue.ts
import db from "./db.js";
export function enqueue(type: string, payload: any, maxAttempts: number = 5): void {
db.prepare(
"INSERT INTO job_queue (id, type, payload, attempts, max_attempts, next_attempt_at) VALUES (?, ?, ?, 0, ?, datetime('now'))",
).run(crypto.randomUUID(), type, JSON.stringify(payload), maxAttempts);
}
The enqueue function takes a job type (like "charge_card"), a payload with the data needed to perform the job, and a maximum number of attempts. It inserts a row into the job_queue table. A background worker (which you would build separately) picks up these jobs and processes them with exponential backoff.
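The backoff schedule such a worker might use can be sketched as a small pure function. The name `nextDelayMs` and the base/cap values are assumptions, not part of the app; tune them to your workload:

```typescript
// Sketch: exponential backoff for the queue worker. The delay doubles
// with each failed attempt and is capped so late retries stay bounded.
function nextDelayMs(attempts: number): number {
  const baseMs = 1_000; // first retry after 1 second (assumed)
  const capMs = 3_600_000; // never wait more than 1 hour (assumed)
  return Math.min(baseMs * 2 ** attempts, capMs);
}
```

After a failed attempt, the worker would increment `attempts` and write now plus `nextDelayMs(attempts)` into `next_attempt_at`. Adding random jitter on top of this schedule helps avoid retry stampedes when many jobs fail at once.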
Here is how it fits into the order flow:
route.post("/orders", {
request: { body: OrderBody },
resolve: async (c) => {
// ... validate input, look up product ...
const orderId = `ord_${crypto.randomUUID().slice(0, 8)}`;
const total = product.price * c.input.body.quantity;
db.prepare("INSERT INTO orders (id, user_id, status, total) VALUES (?, ?, ?, ?)").run(
orderId,
c.input.body.userId,
"pending",
total,
);
try {
await paymentCircuit.call(() => chargeCard(total, c.input.body.paymentToken));
db.prepare("UPDATE orders SET status = ? WHERE id = ?").run("paid", orderId);
} catch {
// Payment failed, queue it for later
enqueue("charge_card", { orderId, amount: total, token: c.input.body.paymentToken });
db.prepare("UPDATE orders SET status = ? WHERE id = ?").run("payment_pending", orderId);
}
const order = db.prepare("SELECT * FROM orders WHERE id = ?").get(orderId);
return Response.json(order, { status: 201 });
},
});
The user gets their order confirmation immediately. The order status is set to "payment_pending" instead of "paid". The payment is processed when the service recovers. The queue processor retries with backoff until the charge goes through.
Pattern 2: fire-and-forget with logging
For nice-to-have operations, make the attempt and move on:
// Send confirmation email, do not fail the order if it fails
try {
await emailCircuit.call(() =>
sendEmail(user.email, "Order Confirmed", `Your order ${order.id} is confirmed.`),
);
} catch (err) {
// Log and move on, the order still succeeded
console.log(
JSON.stringify({
level: "warn",
event: "email_failed",
orderId: order.id,
error: err instanceof Error ? err.message : String(err),
}),
);
// Optionally queue for retry
enqueue("send_email", { to: user.email, subject: "Order Confirmed", body: "..." });
}
The email failure is logged (so someone can investigate if it becomes a pattern) and optionally queued for retry. But the order response is not affected. The user does not even know the email failed.
Pattern 3: cached fallback
When a read operation fails, you can sometimes return cached or stale data instead of an error:
const productCache = new Map<string, { data: any; cachedAt: number }>();
async function getProductWithFallback(id: string): Promise<any> {
try {
const product = await fetchProductFromService(id);
// Cache the result
productCache.set(id, { data: product, cachedAt: Date.now() });
return product;
} catch {
// Service down, return cached data if available
const cached = productCache.get(id);
if (cached) {
console.log(`Returning cached product ${id} (cached ${Date.now() - cached.cachedAt}ms ago)`);
return cached.data;
}
// No cache, we have to fail
return null;
}
}
When the product service is up, we fetch fresh data and cache it. When the service is down, we return the cached version. The data might be a few minutes old, but stale data is better than no data for many use cases.
Is stale data always acceptable? No. A product price that is 5 minutes old is fine for browsing. A bank account balance that is 5 minutes old is not. Use your judgment based on the domain.
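One way to encode that judgment is to cap how stale a fallback may be. The helper below is a sketch: `usableFallback` and `MAX_STALE_MS` are illustrative names, and five minutes is an arbitrary default you would tune per domain:

```typescript
// Sketch: only serve a cached fallback if it is younger than a cutoff.
const MAX_STALE_MS = 5 * 60 * 1000; // 5 minutes (assumed; domain-specific)

function usableFallback<T>(
  entry: { data: T; cachedAt: number } | undefined,
  now: number = Date.now(),
): T | null {
  if (!entry) return null; // nothing cached, caller must fail
  // Serve the cached copy only while it is within the staleness budget.
  return now - entry.cachedAt <= MAX_STALE_MS ? entry.data : null;
}
```

In the catch branch of getProductWithFallback, you would pass the cache entry through this check instead of returning it unconditionally, so a product cached yesterday does not get served as if it were current.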
Pattern 4: default values
When a non-critical service fails, use a safe default:
async function getShippingEstimate(zip: string): Promise<string> {
try {
const estimate = await shippingService.getEstimate(zip);
return `${estimate.days} business days`;
} catch {
// Shipping estimate service is down, show a safe default
return "5-7 business days";
}
}
The default is conservative (a longer estimate) so the user is not disappointed. This is much better than failing the entire checkout because the shipping estimate could not be calculated. The user still gets to place their order.
Combining patterns
A real order flow uses multiple patterns together. Each dependency gets the strategy that matches its importance:
async function processOrder(userId: string, productId: string, quantity: number, token: string) {
const product = db.prepare("SELECT * FROM products WHERE id = ?").get(productId) as any;
const total = product.price * quantity;
const orderId = `ord_${crypto.randomUUID().slice(0, 8)}`;
// CRITICAL: create order in database (no fallback, must succeed)
db.prepare("INSERT INTO orders (id, user_id, status, total) VALUES (?, ?, ?, ?)").run(
orderId,
userId,
"pending",
total,
);
db.prepare(
"INSERT INTO order_items (order_id, product_id, quantity, price) VALUES (?, ?, ?, ?)",
).run(orderId, productId, quantity, product.price);
// IMPORTANT: charge payment (queue if service is down)
try {
await paymentCircuit.call(() => chargeCard(total, token));
db.prepare("UPDATE orders SET status = ? WHERE id = ?").run("paid", orderId);
} catch {
enqueue("charge_card", { orderId, amount: total, token });
db.prepare("UPDATE orders SET status = ? WHERE id = ?").run("payment_pending", orderId);
}
// NICE-TO-HAVE: send email (fire-and-forget)
const user = db.prepare("SELECT email FROM users WHERE id = ?").get(userId) as any;
sendEmail(user.email, "Order Confirmed", `Order ${orderId}`).catch(() => {
  enqueue("send_email", { to: user.email, orderId });
});
return db.prepare("SELECT * FROM orders WHERE id = ?").get(orderId);
}
The database write is critical. If it fails, the whole operation fails because there is no order without a database record. The payment is important but retriable, so we queue it on failure. The email is nice-to-have, so we fire-and-forget with a queue fallback.
This is how production e-commerce systems work. The user always gets a response. The payment is eventually consistent. Non-critical features degrade without affecting the core experience.
Exercises
Exercise 1: Add a job queue table. Implement enqueue. Queue a payment when the charge fails. Verify the job appears in the queue.
Exercise 2: Implement a queue processor that runs every 30 seconds, picks up pending jobs, and retries them.
Exercise 3: Classify the operations in your e-commerce API: which are critical, important, and nice-to-have?
We have covered the resilience patterns: retries, timeouts, circuit breakers, and fallbacks. But all of this assumes the server is running. What happens when the server itself needs to shut down? Next, we will look at the server lifecycle: graceful shutdown, uncaught exceptions, and health checks.
Why should a failed confirmation email not cause the order to fail?