# Health and Metrics
## From logs to metrics
Individual log entries tell you what happened to one request. Metrics tell you what is happening to all requests: request rate, error rate, average response time, queue depth. Metrics are aggregated numbers derived from log data.
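To make the distinction concrete, here is a minimal sketch of turning raw log entries into aggregate numbers. The entry shape (`status`, `durationMs`) is an assumption for illustration, not a format from this course:

```typescript
// Hypothetical log entries; the shape (status, durationMs) is an assumption.
interface LogEntry {
  status: number;
  durationMs: number;
}

// Aggregate individual entries into metrics: totals, error rate, average latency.
function aggregate(entries: LogEntry[]) {
  const total = entries.length;
  const errors = entries.filter((e) => e.status >= 500).length;
  const avgMs =
    total === 0 ? 0 : entries.reduce((sum, e) => sum + e.durationMs, 0) / total;
  return { total, errors, errorRate: total === 0 ? 0 : errors / total, avgMs };
}

const entries: LogEntry[] = [
  { status: 200, durationMs: 12 },
  { status: 200, durationMs: 20 },
  { status: 500, durationMs: 40 },
  { status: 200, durationMs: 8 },
];
console.log(aggregate(entries)); // one summary object instead of four log lines
```

Each log entry answers "what happened to this request?"; the aggregate answers "how is the service doing?".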
## Types of metrics
- **Counters** — values that only go up: total requests, total errors, total orders placed.
- **Gauges** — values that go up and down: active connections, queue depth, cache size.
- **Histograms** — distributions of values: response time percentiles (p50, p95, p99), request body sizes.
## A simple metrics collector
```ts
// src/metrics.ts
class Metrics {
  private counters: Map<string, number> = new Map();
  private gauges: Map<string, number> = new Map();
  private histograms: Map<string, number[]> = new Map();

  increment(name: string, value: number = 1): void {
    this.counters.set(name, (this.counters.get(name) ?? 0) + value);
  }

  gauge(name: string, value: number): void {
    this.gauges.set(name, value);
  }

  observe(name: string, value: number): void {
    const values = this.histograms.get(name) ?? [];
    values.push(value);
    this.histograms.set(name, values);
  }

  getSnapshot(): Record<string, unknown> {
    const snapshot: Record<string, unknown> = {};
    for (const [name, value] of this.counters) {
      snapshot[`counter.${name}`] = value;
    }
    for (const [name, value] of this.gauges) {
      snapshot[`gauge.${name}`] = value;
    }
    for (const [name, values] of this.histograms) {
      const sorted = [...values].sort((a, b) => a - b);
      snapshot[`histogram.${name}`] = {
        count: sorted.length,
        min: sorted[0],
        max: sorted[sorted.length - 1],
        avg: Math.round(sorted.reduce((a, b) => a + b, 0) / sorted.length),
        p95: sorted[Math.floor(sorted.length * 0.95)],
        p99: sorted[Math.floor(sorted.length * 0.99)],
      };
    }
    return snapshot;
  }

  reset(): void {
    this.histograms.clear();
    // Counters and gauges persist between resets
  }
}

export const metrics = new Metrics();
```

## Collecting metrics from requests
```ts
onResponse: ({ response, locals }) => {
  const duration = Date.now() - (locals.startTime as number);

  // Count requests
  metrics.increment("http.requests.total");
  metrics.increment(`http.requests.${response.status}`);

  // Track response time
  metrics.observe("http.response_time_ms", duration);

  // Track errors
  if (response.status >= 500) {
    metrics.increment("http.errors.5xx");
  }

  // ... normal request logging
},
```

Every request increments the counters and records the response time. After 1,000 requests, the metrics snapshot shows total requests, the error rate, and response time percentiles.
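You can see what a snapshot looks like without running a server by driving the collector directly. The sketch below uses a condensed copy of the `Metrics` class (counters and histograms only) and simulates 100 requests, a few of them slow and a couple of them failing:

```typescript
// Condensed copy of the Metrics collector above: counters + histograms only.
class MiniMetrics {
  private counters = new Map<string, number>();
  private histograms = new Map<string, number[]>();

  increment(name: string, value = 1): void {
    this.counters.set(name, (this.counters.get(name) ?? 0) + value);
  }
  observe(name: string, value: number): void {
    const values = this.histograms.get(name) ?? [];
    values.push(value);
    this.histograms.set(name, values);
  }
  snapshot(): Record<string, unknown> {
    const out: Record<string, unknown> = {};
    for (const [name, v] of this.counters) out[`counter.${name}`] = v;
    for (const [name, values] of this.histograms) {
      const sorted = [...values].sort((a, b) => a - b);
      out[`histogram.${name}`] = {
        count: sorted.length,
        max: sorted[sorted.length - 1],
        p95: sorted[Math.floor(sorted.length * 0.95)],
      };
    }
    return out;
  }
}

const m = new MiniMetrics();
// Simulate 100 requests: 97 fast (10 ms), 3 slow (500 ms), 2 server errors.
for (let i = 0; i < 100; i++) {
  m.increment("http.requests.total");
  m.observe("http.response_time_ms", i < 97 ? 10 : 500);
  if (i < 2) m.increment("http.errors.5xx");
}
console.log(m.snapshot());
```

Note how the p95 stays at 10 ms even though the max is 500 ms: only 3% of requests were slow, which is exactly the kind of distinction an average would hide.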
## A health/metrics endpoint
```ts
route.get("/health", {
  resolve: () => {
    const snapshot = metrics.getSnapshot();
    const dbOk = checkDatabase();
    const uptime = process.uptime();
    return Response.json({
      status: dbOk ? "healthy" : "degraded",
      uptime: Math.round(uptime),
      metrics: snapshot,
    });
  },
});
```

> [!NOTE]
> The Deploying with Docker course added a `/health` endpoint for container health checks. Now it returns metrics too — the operations team sees request rate, error rate, and response times at a glance.
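The endpoint above calls a `checkDatabase()` helper without showing it. One possible shape (an assumption, not part of the original): since the route calls it synchronously, run the real probe in the background and have `checkDatabase()` return the last known result, so the health endpoint never blocks on the database:

```typescript
// Sketch of a non-blocking checkDatabase(): a background probe updates a
// cached flag, and the health endpoint reads that cache synchronously.
let dbHealthy = true; // last known state; optimistic until a probe fails

// Hypothetical async probe, e.g. a `SELECT 1` against the real database.
async function pingDatabase(): Promise<void> {
  // ... real query here
}

// Re-check the database on a fixed interval and cache the result.
function startDbProbe(intervalMs = 10_000): void {
  setInterval(async () => {
    try {
      await pingDatabase();
      dbHealthy = true;
    } catch {
      dbHealthy = false;
    }
  }, intervalMs);
}

function checkDatabase(): boolean {
  return dbHealthy;
}
```

The trade-off is staleness: the health status can lag behind reality by up to one probe interval, in exchange for a `/health` route that responds instantly even when the database is down.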
## Periodic metric logging
Log metrics at a fixed interval for trend analysis:
```ts
setInterval(() => {
  const snapshot = metrics.getSnapshot();
  logger.info("metrics snapshot", snapshot);
  metrics.reset(); // Reset histograms for the next interval
}, 60_000); // Every 60 seconds
```

```json
{
  "level": "info",
  "message": "metrics snapshot",
  "counter.http.requests.total": 1234,
  "counter.http.errors.5xx": 3,
  "histogram.http.response_time_ms": {
    "count": 1234,
    "min": 2,
    "max": 450,
    "avg": 18,
    "p95": 45,
    "p99": 120
  }
}
```

One log line per minute with the full picture. Search for `message: "metrics snapshot"` to plot trends over time.
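Once snapshots are logged, derived numbers fall out of them directly. A sketch, using the flat field names from the example above (note that because counters persist across resets, per-interval rates would come from the delta between two consecutive snapshots):

```typescript
// Derive an overall error rate from a snapshot's flat counter fields.
function errorRate(snapshot: Record<string, unknown>): number {
  const total = (snapshot["counter.http.requests.total"] as number) ?? 0;
  const errors = (snapshot["counter.http.errors.5xx"] as number) ?? 0;
  return total === 0 ? 0 : errors / total;
}

// The snapshot from the example log line above.
const snapshot = {
  "counter.http.requests.total": 1234,
  "counter.http.errors.5xx": 3,
};

console.log((errorRate(snapshot) * 100).toFixed(2) + "%"); // → "0.24%"
```

The same pattern works for requests per second (counter delta divided by the interval length) or any other ratio of snapshot fields.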
## Exercises
**Exercise 1:** Build the `Metrics` class. Collect request counts and response times in `onResponse`.

**Exercise 2:** Add a `/health` endpoint that returns the metrics snapshot. Make 50 requests. Check the snapshot.

**Exercise 3:** Add periodic metric logging (every 30 seconds). Watch the snapshots accumulate in the logs.
Why collect metrics in addition to request logs?