Handling Disconnects and Reconnection
Connections die silently
A user closes their laptop lid. The WiFi drops. A mobile network switches towers. The WebSocket connection is dead, but the server does not know yet. The socket sits in the connection list, consuming memory, counted as an active viewer.
TCP eventually notices, but only once retransmission timeouts or keepalive probes expire, which can take many minutes. By then the presence count is wrong and events are being sent to a dead socket.
Ping-pong heartbeat
The WebSocket protocol includes a built-in ping-pong mechanism. The server sends a ping frame; the client responds with a pong frame. If no pong arrives within a timeout, the server knows the connection is dead and closes it.
// src/ws-server.ts: add heartbeat
// Assumes wss (the WebSocketServer) and the WebSocket type from "ws" are already in scope.
const HEARTBEAT_INTERVAL = 30_000; // Ping every 30 seconds; the pong must arrive before the next ping
const aliveMap = new Map<WebSocket, boolean>();

function setupHeartbeat(): void {
  setInterval(() => {
    for (const ws of wss.clients) {
      if (aliveMap.get(ws) === false) {
        // No pong received since the last ping: the connection is dead
        ws.terminate();
        continue;
      }
      aliveMap.set(ws, false); // Will be set back to true when the pong arrives
      ws.ping();
    }
  }, HEARTBEAT_INTERVAL);
}

// In the connection handler:
wss.on("connection", (ws) => {
  aliveMap.set(ws, true);

  ws.on("pong", () => {
    aliveMap.set(ws, true);
  });

  ws.on("close", () => {
    aliveMap.delete(ws);
    // existing cleanup...
  });
});

// Call after setting up the server
setupHeartbeat();

Every 30 seconds, the server pings each client. If the client has not ponged by the next interval, the server terminates the connection. Dead connections are cleaned up within 30-60 seconds.
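To see where the 30-60 second window comes from, the sweep logic can be modelled with plain numbers instead of sockets. This is a rough sketch, not part of the server: `detectionDelay` is a hypothetical helper, and it assumes a live client's pong arrives instantly.

```typescript
// Hypothetical simulation of the alive-flag sweep; no real sockets involved.
// Returns how long after the connection died the server would terminate it.
function detectionDelay(deathTimeMs: number, intervalMs: number = 30_000): number {
  let alive = true; // set to true when the connection opens
  for (let tick = intervalMs; ; tick += intervalMs) {
    if (!alive) {
      // No pong since the previous ping: the server would call terminate() here.
      return tick - deathTimeMs;
    }
    alive = false; // mark as pending, then "send" the ping
    if (tick < deathTimeMs) {
      alive = true; // the client is still up, so its pong arrives (assumed instant)
    }
  }
}

console.log(detectionDelay(29_999)); // died just before a ping
console.log(detectionDelay(30_001)); // died just after answering a ping
```

A connection that dies just before a ping is caught at the following sweep, roughly one interval later. One that dies just after answering a ping survives a full extra interval, so detection takes close to two intervals.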
[!NOTE] The ws library handles ping/pong at the protocol level. The browser’s WebSocket API does not expose ping/pong directly; it happens automatically. The server initiates pings; the browser responds with pongs without any client code.
Client-side reconnection
Unlike SSE, WebSockets do not reconnect automatically. The client must implement reconnection:
// Client-side reconnection
let reconnectAttempts = 0;

function connect() {
  const ws = new WebSocket("ws://localhost:3000/ws");

  ws.onopen = () => {
    console.log("Connected");
    reconnectAttempts = 0; // Reset backoff after a successful connection
    // Re-join rooms
    ws.send(JSON.stringify({ type: "join", boardId: "board-1" }));
  };

  ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    handleEvent(data);
  };

  ws.onclose = (event) => {
    console.log(`Disconnected (code: ${event.code})`);
    if (event.code === 4001) {
      // Unauthorized: do not reconnect
      console.log("Auth failed. Please log in again.");
      return;
    }
    // Reconnect with backoff
    const delay = Math.min(1000 * Math.pow(2, reconnectAttempts), 30000);
    reconnectAttempts++;
    console.log(`Reconnecting in ${delay}ms...`);
    setTimeout(connect, delay);
  };

  ws.onerror = () => {
    // onclose will fire after onerror
  };
}

connect();

Key details:
Exponential backoff. Wait 1s, 2s, 4s, 8s, 16s, 30s (max). This prevents hammering the server when it is down.
Re-join rooms. After reconnecting, the client must re-join any rooms it was in. The server does not remember — the old connection’s room memberships were cleaned up on disconnect.
Do not reconnect on 4001. If the server closed the connection due to authentication failure, reconnecting will fail again. Show a login prompt instead.
Reset backoff on success. When onopen fires, reset reconnectAttempts to 0.
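The backoff schedule can be checked in isolation by pulling the delay calculation into a small pure function. A minimal sketch: the name `backoffDelay` and its default parameters are illustrative, chosen to match the formula used in the client code.

```typescript
// Hypothetical helper: same formula as the onclose handler, extracted for clarity.
function backoffDelay(attempt: number, baseMs: number = 1000, maxMs: number = 30_000): number {
  // Double the delay with each failed attempt, but never exceed the cap.
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

const delays = [0, 1, 2, 3, 4, 5, 6].map((attempt) => backoffDelay(attempt));
console.log(delays); // [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

The sixth attempt would compute 32000 ms, which the cap trims to 30000 ms. Every later attempt stays at the cap until a successful connection resets the counter.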
State sync after reconnection
When the client reconnects, it might have missed events. Unlike SSE (which has Last-Event-ID), WebSockets have no built-in resume mechanism.
Two approaches:
Full resync. After reconnecting, the client fetches the full state via REST:
ws.onopen = async () => {
  reconnectAttempts = 0;
  ws.send(JSON.stringify({ type: "join", boardId: "board-1" }));
  // Resync state from REST
  const res = await fetch("/boards/board-1/tasks");
  if (!res.ok) {
    console.error(`Resync failed (status: ${res.status})`);
    return;
  }
  const { data } = await res.json();
  replaceUIState(data);
};

Event replay. The server stores recent events (like the SSE buffer) and replays them when the client sends a "sync" message with the last event it received.
Full resync is simpler and more reliable. Event replay is more efficient but harder to implement correctly. For most apps, full resync is the right choice.
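For comparison, the server side of event replay might look roughly like the sketch below. Everything here is an assumption for illustration: the `BoardEvent` shape, the `ReplayBuffer` class, and the use of increasing numeric IDs are not defined anywhere in the existing code.

```typescript
// Hypothetical event shape: each event gets an increasing sequence number.
interface BoardEvent {
  id: number;
  type: string;
  payload: unknown;
}

// Hypothetical bounded buffer holding the most recent events.
class ReplayBuffer {
  private events: BoardEvent[] = [];
  private nextId = 1;
  private capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Record an event and drop the oldest one once the buffer is full.
  append(type: string, payload: unknown): BoardEvent {
    const event: BoardEvent = { id: this.nextId++, type, payload };
    this.events.push(event);
    if (this.events.length > this.capacity) this.events.shift();
    return event;
  }

  // Events newer than lastSeenId, or null if some of them were already evicted.
  since(lastSeenId: number): BoardEvent[] | null {
    const first = this.events[0];
    const oldestId = first ? first.id : this.nextId;
    if (lastSeenId < oldestId - 1) return null; // gap too large: fall back to full resync
    return this.events.filter((e) => e.id > lastSeenId);
  }
}

const buffer = new ReplayBuffer(3);
for (let i = 0; i < 5; i++) buffer.append("task.updated", { n: i });
console.log(buffer.since(3)); // events 4 and 5
console.log(buffer.since(1)); // null: event 2 was evicted
```

The `null` case is where the difficulty lies: a client that has been away longer than the buffer covers still needs the full resync path, so event replay ends up being an optimization on top of full resync rather than a replacement for it.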
Exercises
Exercise 1: Implement the ping-pong heartbeat. Connect a client, then simulate a dead connection (do not respond to pings). Verify the server terminates the connection.
Exercise 2: Implement client-side reconnection with exponential backoff. Kill the server, observe the backoff delays, restart the server, verify the client reconnects.
Exercise 3: Implement full resync after reconnection. Disconnect, create a task (via REST), reconnect, and verify the UI picks up the new task via the resync.
Why does the client use exponential backoff for reconnection instead of a fixed delay?