

Health checks under failure

A health check that lies

Back in the project setup, we added a simple health endpoint:

route.get("/health", { resolve: () => Response.json({ status: "ok" }) });

This returns {"status":"ok"} no matter what. The database could be down. The payment service could be unreachable. The disk could be full. And our health check still says everything is fine.

A load balancer checks this endpoint to decide whether to send traffic to this instance. If it always returns 200, the load balancer keeps sending requests to a broken instance. Users see errors, and the load balancer has no idea anything is wrong.

A health check should reflect the actual health of the application and its dependencies.

Checking dependencies

Let’s build a real health check that tests each dependency.

First, the circuit breaker instances we created in the circuit breakers lesson need to live in a shared module so that both the routes and the health check can access them. Create src/circuits.ts:

Code along
// src/circuits.ts
import { CircuitBreaker } from "./circuit-breaker.js";

export const paymentCircuit = new CircuitBreaker("payment-service", { threshold: 5 });
export const emailCircuit = new CircuitBreaker("email-service", { threshold: 3 });
export const inventoryCircuit = new CircuitBreaker("inventory-service", { threshold: 10 });

Now create the health check:

Code along
// src/health.ts
import fs from "node:fs";
import db from "./db.js";
import { paymentCircuit, emailCircuit } from "./circuits.js";

interface HealthStatus {
  status: "healthy" | "degraded" | "unhealthy";
  checks: Record<string, { status: string; message?: string }>;
}

export function checkHealth(): HealthStatus {
  const checks: HealthStatus["checks"] = {};

  // Database
  try {
    db.prepare("SELECT 1").get();
    checks.database = { status: "up" };
  } catch (err) {
    checks.database = { status: "down", message: err instanceof Error ? err.message : "Unknown" };
  }

  // Disk space (simplified)
  try {
    const stats = fs.statfsSync("/");
    const freePercent = (stats.bfree / stats.blocks) * 100;
    checks.disk =
      freePercent > 10
        ? { status: "up" }
        : { status: "warning", message: `${freePercent.toFixed(1)}% free` };
  } catch {
    checks.disk = { status: "unknown" };
  }

  // External services (from circuit breakers)
  checks.payment =
    paymentCircuit.getState().state === "open"
      ? { status: "down", message: "Circuit open" }
      : { status: "up" };

  checks.email =
    emailCircuit.getState().state === "open"
      ? { status: "down", message: "Circuit open" }
      : { status: "up" };

  // Determine overall status
  const hasDown = Object.values(checks).some((c) => c.status === "down");
  const hasWarning = Object.values(checks).some((c) => c.status === "warning");

  let status: HealthStatus["status"];
  if (checks.database.status === "down") {
    status = "unhealthy"; // Database is critical, app cannot function
  } else if (hasDown || hasWarning) {
    status = "degraded"; // Some services down, but app still works
  } else {
    status = "healthy";
  }

  return { status, checks };
}

Let’s walk through what this does. The function checks each dependency one by one.

For the database, it runs SELECT 1, the simplest possible query. If it works, the database is up. If it throws, the database is down and the error message is included.

For disk space, it checks how much free space is left. If less than 10% is free, that is a warning. This matters because SQLite writes to disk, and a full disk means writes will fail.

For external services, we use the circuit breakers from earlier. If a circuit is open, that service is down. We do not need to make a new call to check. The circuit breaker already knows.
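
For reference, the health check only touches one small part of the breaker’s surface. Here is a minimal sketch of the interface it assumes; the CircuitBreaker class you built in the circuit breakers lesson is the source of truth, and the names here are illustrative:

// Sketch only: the shape checkHealth() assumes a circuit breaker exposes.
type CircuitState = "closed" | "open" | "half-open";

interface CircuitBreakerLike {
  // Reads in-memory state; never makes a network call, which is why
  // the health check stays cheap even when services are failing.
  getState(): { state: CircuitState };
}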

Then it determines the overall status. If the database is down, the app is unhealthy because nothing works without the database. If some other service is down (like email), the app is degraded because the core functionality still works but some features are impaired. If everything is up, the app is healthy.

The health endpoint

Update the /health route in your src/app.ts to use the new checkHealth function:

Code along
route.get("/health", {
  resolve: () => {
    const health = checkHealth();
    const statusCode = health.status === "unhealthy" ? 503 : 200;
    return Response.json(health, { status: statusCode });
  },
});

The status code matters. 200 for healthy and degraded means the app can still serve requests. The load balancer keeps sending traffic. 503 for unhealthy means the app cannot function. The load balancer should stop sending traffic to this instance.

Why does degraded return 200 and not 503? Because degraded means the app still works for most requests. The email service is down, but orders can still be placed with queued notifications. Returning 503 would cause the load balancer to remove this instance from the pool entirely, which is worse than degraded service.
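
A quick way to see this mapping in action is to hit the endpoint and inspect the status code. A minimal sketch, assuming the app is listening on http://localhost:3000 (adjust the URL for your setup):

// Assumes the server is running locally on port 3000.
const res = await fetch("http://localhost:3000/health");
const health = await res.json();

// 200 -> healthy or degraded: keep this instance in the pool.
// 503 -> unhealthy: the load balancer should pull it.
console.log(res.status, health.status, health.checks);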

Three health states

Healthy means all dependencies are up and everything works normally:

{
  "status": "healthy",
  "checks": {
    "database": { "status": "up" },
    "payment": { "status": "up" },
    "email": { "status": "up" }
  }
}

Degraded means some dependencies are down, but the app’s core functionality works. Emails are not sending, but orders can still be placed:

{
  "status": "degraded",
  "checks": {
    "database": { "status": "up" },
    "payment": { "status": "up" },
    "email": { "status": "down", "message": "Circuit open" }
  }
}

Unhealthy means a critical dependency is down and the app cannot serve requests correctly:

{
  "status": "unhealthy",
  "checks": { "database": { "status": "down", "message": "SQLITE_CANTOPEN" } }
}

Liveness vs readiness

Kubernetes and some load balancers distinguish between two types of health checks.

Liveness answers the question “is the process alive?” If not, restart it. This checks whether the app is running and responding, regardless of dependency health. A process can be alive but unable to handle requests.

Readiness answers the question “can the process handle requests?” If not, stop sending traffic to it. This checks dependency health. An app might be alive but not ready because the database is migrating or a critical service is initializing.

// Liveness: is the process responding?
route.get("/health/live", {
  resolve: () => Response.json({ status: "ok" }),
});

// Readiness: can the process handle requests?
route.get("/health/ready", {
  resolve: () => {
    const health = checkHealth();
    const statusCode = health.status === "unhealthy" ? 503 : 200;
    return Response.json(health, { status: statusCode });
  },
});

The liveness endpoint is simple. If the process can respond to an HTTP request, it is alive. No dependency checks needed.

The readiness endpoint is the full health check. If the database is down, the process is alive but not ready. The orchestrator stops sending traffic but does not restart the process (because the process itself is fine, it is the database that has the problem).
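
To make the distinction concrete, here is a rough sketch of the decision rules an orchestrator applies to the two endpoints. Orchestrators like Kubernetes configure probes declaratively rather than in code, so treat this as an illustration of the logic only; the URLs and timeout are assumptions:

async function probe(url: string): Promise<boolean> {
  try {
    // AbortSignal.timeout requires Node 17.3+ (or a modern browser runtime).
    const res = await fetch(url, { signal: AbortSignal.timeout(2000) });
    return res.ok; // Any 2xx counts as passing.
  } catch {
    return false; // A timeout or connection error counts as failing.
  }
}

const alive = await probe("http://localhost:3000/health/live");
const ready = await probe("http://localhost:3000/health/ready");

if (!alive) {
  console.log("liveness failed: restart the process");
} else if (!ready) {
  console.log("readiness failed: stop routing traffic, but do not restart");
}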

[!NOTE] The Deploying with Docker course’s health check lesson used a simple /health endpoint. This lesson extends it to check dependencies and report degraded states, connecting error handling to deployment infrastructure.

Exercises

Exercise 1: Update your /health endpoint to check database connectivity. Rename the database file to simulate a failure. Verify the health check returns 503.

Exercise 2: Add circuit breaker states to the health check. Open a circuit (make the payment service fail). Verify the health check shows “degraded.”

Exercise 3: Implement separate /health/live and /health/ready endpoints. Verify that liveness always returns 200 and that readiness returns 503 when the database is down.

We have covered the full server lifecycle: graceful shutdown, process-level error handlers, and health checks that reflect real dependency state. In the final section, we will pull everything together with a checklist and capstone project.

