Handling Disconnects and Reconnection

Connections die silently

A user closes their laptop lid. The WiFi drops. A mobile network switches towers. The WebSocket connection is dead, but the server does not know yet. The socket sits in the connection list, consuming memory, counted as an active viewer.

TCP notices eventually, but only once a send goes unacknowledged and its retransmissions time out, which can take minutes. By then the presence count is wrong and events have been sent to a dead socket.

Ping-pong heartbeat

The WebSocket protocol includes a built-in ping-pong mechanism. The server sends a ping frame; the client responds with a pong frame. If no pong arrives within a timeout, the server knows the connection is dead and closes it.

// src/ws-server.ts: add heartbeat
// Assumes `wss` is the WebSocketServer already created in this file.
const HEARTBEAT_INTERVAL = 30_000; // Ping every 30 seconds; a pong must arrive before the next tick

const aliveMap = new Map<WebSocket, boolean>();

function setupHeartbeat(): void {
  const timer = setInterval(() => {
    for (const ws of wss.clients) {
      if (aliveMap.get(ws) === false) {
        // No pong received since last ping: connection is dead
        ws.terminate();
        continue;
      }

      aliveMap.set(ws, false); // Will be set back to true when pong arrives
      ws.ping();
    }
  }, HEARTBEAT_INTERVAL);

  // Stop the timer when the server shuts down
  wss.on("close", () => clearInterval(timer));
}

// In the connection handler:
wss.on("connection", (ws) => {
  aliveMap.set(ws, true);

  ws.on("pong", () => {
    aliveMap.set(ws, true);
  });

  ws.on("close", () => {
    aliveMap.delete(ws);
    // existing cleanup...
  });
});

// Call after setting up the server
setupHeartbeat();

Every 30 seconds, the server pings each client. If the client does not pong by the next interval, the server terminates the connection. Dead connections are cleaned up within 30-60 seconds.
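The 30-60 second window follows from the timing of the ticks. The arithmetic can be sketched as below, assuming pings fire exactly once per interval and a missed pong is only detected on the following tick. The helper name `detectionWindow` is hypothetical and not part of the ws library.

```typescript
// Hypothetical helper: bounds on how long a dead connection survives
// under the "mark false, ping, check on the next tick" heartbeat scheme.
function detectionWindow(intervalMs: number): { minMs: number; maxMs: number } {
  // Best case: the connection dies just before a tick. That tick's ping
  // goes unanswered, and the next tick (one interval later) terminates it.
  const minMs = intervalMs;
  // Worst case: the connection dies just after answering a ping. It waits
  // almost a full interval for the next ping, then one more for the check.
  const maxMs = 2 * intervalMs;
  return { minMs, maxMs };
}

const { minMs, maxMs } = detectionWindow(30_000);
console.log(`${minMs / 1000}s to ${maxMs / 1000}s`); // prints "30s to 60s"
```

Lowering the interval shrinks the window at the cost of more ping traffic per connection.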

[!NOTE] The ws library handles ping/pong at the protocol level. The browser’s WebSocket API does not expose ping/pong directly — it happens automatically. The server initiates pings; the browser responds with pongs without any client code.

Client-side reconnection

Unlike SSE, WebSockets do not reconnect automatically. The client must implement reconnection:

// Client-side reconnection
function connect() {
  const ws = new WebSocket("ws://localhost:3000/ws");

  ws.onopen = () => {
    console.log("Connected");
    // Connection succeeded: the next disconnect starts backoff from 1s again
    reconnectAttempts = 0;
    // Re-join rooms
    ws.send(JSON.stringify({ type: "join", boardId: "board-1" }));
  };

  ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    handleEvent(data);
  };

  ws.onclose = (event) => {
    console.log(`Disconnected (code: ${event.code})`);

    if (event.code === 4001) {
      // Unauthorized — do not reconnect
      console.log("Auth failed. Please log in again.");
      return;
    }

    // Reconnect with backoff
    const delay = Math.min(1000 * Math.pow(2, reconnectAttempts), 30000);
    reconnectAttempts++;
    console.log(`Reconnecting in ${delay}ms...`);
    setTimeout(connect, delay);
  };

  ws.onerror = () => {
    // onclose will fire after onerror
  };
}

let reconnectAttempts = 0;
connect();

Key details:

Exponential backoff. Wait 1s, 2s, 4s, 8s, 16s, 30s (max). This prevents hammering the server when it is down.

Re-join rooms. After reconnecting, the client must re-join any rooms it was in. The server does not remember — the old connection’s room memberships were cleaned up on disconnect.

Do not reconnect on 4001. If the server closed the connection due to authentication failure, reconnecting will fail again. Show a login prompt instead.

Reset backoff on success. When onopen fires, reset reconnectAttempts to 0.
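The backoff and close-code rules above can be pulled out into two small pure functions, which makes them easy to test separately from the socket. This is a minimal sketch: the names `backoffDelay` and `shouldReconnect` are illustrative, and 4001 is the application-defined close code used in the client example.

```typescript
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30_000;
const AUTH_FAILED_CODE = 4001; // application-defined close code from the client example

// Delay before reconnect attempt number `attempt` (0-based): doubles each
// time and is capped at MAX_DELAY_MS.
function backoffDelay(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Reconnect on every close code except an authentication failure.
function shouldReconnect(closeCode: number): boolean {
  return closeCode !== AUTH_FAILED_CODE;
}

const delays = [0, 1, 2, 3, 4, 5, 6].map(backoffDelay);
console.log(delays); // [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

Inside `onclose`, the handler would check `shouldReconnect(event.code)` first and then schedule `connect` after `backoffDelay(reconnectAttempts)`.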

State sync after reconnection

When the client reconnects, it might have missed events. Unlike SSE (which has Last-Event-ID), WebSockets have no built-in resume mechanism.

Two approaches:

Full resync. After reconnecting, the client fetches the full state via REST:

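ws.onopen = async () => {
  reconnectAttempts = 0;
  ws.send(JSON.stringify({ type: "join", boardId: "board-1" }));

  // Resync state from REST
  try {
    const res = await fetch("/boards/board-1/tasks");
    if (!res.ok) throw new Error(`Resync failed: HTTP ${res.status}`);
    const { data } = await res.json();
    replaceUIState(data);
  } catch (err) {
    // Keep the current UI state if the resync request fails
    console.error(err);
  }
};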

Event replay. The server stores recent events (like the SSE buffer) and replays them when the client sends a “sync” message with the last event it received.

Full resync is simpler and more reliable. Event replay is more efficient but harder to implement correctly. For most apps, full resync is the right choice.
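To give a feel for what event replay involves, here is a rough sketch of a server-side buffer, not a definitive implementation. It assumes the server assigns each event an increasing numeric id and the client reports the last id it saw. The `EventBuffer` class and its method names are hypothetical. When the requested events have already been evicted, `since` returns `null`, which signals that the client has to fall back to a full resync.

```typescript
interface BufferedEvent {
  id: number;
  payload: unknown;
}

// Hypothetical bounded buffer holding the most recent `capacity` events.
class EventBuffer {
  private events: BufferedEvent[] = [];
  private nextId = 1;
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Store an event, assign it the next id, and evict the oldest if full.
  push(payload: unknown): BufferedEvent {
    const event: BufferedEvent = { id: this.nextId++, payload };
    this.events.push(event);
    if (this.events.length > this.capacity) this.events.shift();
    return event;
  }

  // Events with id greater than lastSeenId, or null when some of them
  // were already evicted (the client must do a full resync instead).
  since(lastSeenId: number): BufferedEvent[] | null {
    const oldestId = this.events[0]?.id ?? this.nextId;
    if (lastSeenId < oldestId - 1) return null;
    return this.events.filter((e) => e.id > lastSeenId);
  }
}

const buffer = new EventBuffer(3);
for (let i = 1; i <= 5; i++) buffer.push({ type: "task.updated", n: i });

console.log(buffer.since(3)?.map((e) => e.id)); // [4, 5]
console.log(buffer.since(1)); // null: event 2 was evicted, so replay is impossible
```

The `null` case is one reason replay is harder to get right: a correct client still needs the full-resync path for gaps the buffer cannot cover.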

Exercises

Exercise 1: Implement the ping-pong heartbeat. Connect a client, then simulate a dead connection (do not respond to pings). Verify the server terminates the connection.

Exercise 2: Implement client-side reconnection with exponential backoff. Kill the server, observe the backoff delays, restart the server, verify the client reconnects.

Exercise 3: Implement full resync after reconnection. Disconnect, create a task (via REST), reconnect, and verify the UI picks up the new task via the resync.