WebSockets at Scale: Handling Millions of Connections

Introduction

Building a WebSocket server that handles a few hundred connections is straightforward. Building one that handles millions is an entirely different engineering challenge. Unlike stateless HTTP request-response cycles, WebSocket connections are persistent, stateful, and long-lived. Each connection consumes memory, file descriptors, and CPU cycles for as long as the client stays connected—sometimes hours or days. A single Node.js process can comfortably manage 10,000 to 50,000 concurrent WebSocket connections, but reaching millions requires a fundamentally different approach: horizontal scaling with pub/sub message brokers, intelligent load balancing with sticky sessions, robust connection lifecycle management, and comprehensive monitoring.

This guide covers every aspect of scaling WebSocket servers from thousands to millions of connections. We'll start with understanding the resource constraints of a single server, then build up through connection management patterns, pub/sub architectures for multi-server deployments, load balancing strategies, graceful shutdown procedures, and production monitoring. Each section includes working code examples you can adapt to your specific use case.

Understanding Single-Server Capacity

Before scaling horizontally, you need to understand the limits of a single server. Every WebSocket connection consumes several system resources: memory for the connection state and buffers, file descriptors from the operating system, and CPU time for message processing and serialization.

Memory Per Connection

Each WebSocket connection typically consumes between 2KB and 10KB of memory, depending on the library, message size, and application state stored per connection. The ws library for Node.js is one of the most memory-efficient implementations available.

// Estimating memory requirements for a WebSocket server
const CONNECTIONS = 50000;
const MEMORY_PER_CONNECTION = 5 * 1024; // 5KB average
const BUFFER_OVERHEAD = 2 * 1024; // 2KB for send/receive buffers
const TOTAL_MEMORY = CONNECTIONS * (MEMORY_PER_CONNECTION + BUFFER_OVERHEAD);
const TOTAL_MB = Math.ceil(TOTAL_MEMORY / (1024 * 1024));
 
console.log(`Estimated memory for ${CONNECTIONS} connections: ${TOTAL_MB}MB`);
// Output: Estimated memory for 50000 connections: 342MB
 
// Add overhead for the Node.js runtime itself (~100MB)
// Plus your application logic
// Recommended: at least 2GB RAM for 50K connections

File Descriptor Limits

Operating systems impose limits on the number of file descriptors a process can open. Each WebSocket connection requires one file descriptor. On Linux, check and increase the limit:

# Check current limit
ulimit -n
 
# Increase for current session (typically defaults to 1024)
ulimit -n 1000000
 
# Permanent change: edit /etc/security/limits.conf
# Add these lines:
# * soft nofile 1000000
# * hard nofile 1000000
 
# Also check and increase the system-wide limit
cat /proc/sys/fs/file-max
echo 2000000 > /proc/sys/fs/file-max

CPU Bottlenecks

CPU becomes a bottleneck when you're processing messages at high frequency. If each message takes 0.1ms to process, a single core can handle 10,000 messages per second. For a chat application with 100,000 connections sending an average of 1 message per second, you need at least 10 cores dedicated to message processing.

// Benchmarking message processing throughput
function benchmarkMessageProcessing(connections, messagesPerSecond) {
  const processingTimePerMessage = 0.1; // ms, measured via profiling
  const totalMessagesPerSecond = connections * messagesPerSecond;
  const requiredCpuMs = totalMessagesPerSecond * processingTimePerMessage;
  const availableCpuMs = 1000; // 1 second per core
 
  const coresNeeded = Math.ceil(requiredCpuMs / availableCpuMs);
 
  console.log(`Connections: ${connections}`);
  console.log(`Messages/sec: ${totalMessagesPerSecond}`);
  console.log(`CPU cores needed: ${coresNeeded}`);
}
 
benchmarkMessageProcessing(100000, 1); // 10 cores
benchmarkMessageProcessing(500000, 0.5); // 25 cores
benchmarkMessageProcessing(1000000, 0.1); // 10 cores (low frequency)

Bottleneck Analysis Matrix

Resource	Symptom	Threshold	Mitigation
Memory	OOM crashes, slow GC	~50K connections per 2GB	Horizontal scaling, connection limits
File descriptors	EMFILE errors	OS default 1024	Increase ulimit, tune kernel
CPU	High latency, message drops	80% sustained utilization	Worker threads, horizontal scaling
Network	Packet loss, timeouts	NIC bandwidth saturation	Binary protocols, compression
Event loop	Delayed timers, stale connections	>100ms loop lag	Offload CPU work to worker threads

Connection Management Patterns

Effective connection management is the foundation of a scalable WebSocket server. You need to track connections, associate them with users, handle multiple devices per user, and clean up resources when connections drop.

Connection Registry with Multi-Device Support

In production, a single user may have multiple connected devices—phone, laptop, tablet. The connection registry must map users to sets of connections, not just individual sockets.

class ConnectionRegistry {
  constructor() {
    // userId -> Map<connectionId, WebSocket>
    this.userConnections = new Map();
    // WebSocket -> connection metadata
    this.metadata = new WeakMap();
    // connectionId -> WebSocket (for quick lookup)
    this.connectionIndex = new Map();
    this.nextConnectionId = 0;
  }
 
  add(userId, ws, metadata = {}) {
    const connectionId = `conn_${++this.nextConnectionId}`;
 
    if (!this.userConnections.has(userId)) {
      this.userConnections.set(userId, new Map());
    }
 
    this.userConnections.get(userId).set(connectionId, ws);
    this.connectionIndex.set(connectionId, ws);
    this.metadata.set(ws, {
      userId,
      connectionId,
      connectedAt: Date.now(),
      lastActivity: Date.now(),
      ip: metadata.ip || 'unknown',
      userAgent: metadata.userAgent || 'unknown',
      ...metadata,
    });
 
    return connectionId;
  }
 
  remove(ws) {
    const meta = this.metadata.get(ws);
    if (!meta) return null;
 
    const userConns = this.userConnections.get(meta.userId);
    if (userConns) {
      userConns.delete(meta.connectionId);
      if (userConns.size === 0) {
        this.userConnections.delete(meta.userId);
      }
    }
 
    this.connectionIndex.delete(meta.connectionId);
    return meta;
  }
 
  getByUser(userId) {
    return this.userConnections.get(userId) || new Map();
  }
 
  getByConnectionId(connectionId) {
    return this.connectionIndex.get(connectionId) || null;
  }
 
  sendToUser(userId, message) {
    const conns = this.userConnections.get(userId);
    if (!conns) return 0;
 
    const payload = JSON.stringify(message);
    let sent = 0;
 
    for (const [connId, ws] of conns) {
      if (ws.readyState === 1) {
        ws.send(payload);
        sent++;
      }
    }
 
    return sent;
  }
 
  get stats() {
    let totalConnections = 0;
    for (const conns of this.userConnections.values()) {
      totalConnections += conns.size;
    }
    return {
      users: this.userConnections.size,
      connections: totalConnections,
    };
  }
}

Heartbeat System for Dead Connection Detection

Zombie connections—connections where the client has disappeared without closing the TCP socket—are a major resource leak. Without heartbeats, these connections accumulate indefinitely, eventually exhausting server resources. The WebSocket protocol defines ping/pong frames specifically for this purpose.

class HeartbeatManager {
  constructor(options = {}) {
    this.interval = options.interval || 30000; // 30 seconds
    this.timeout = options.timeout || 10000; // 10 seconds to respond
    this.clients = new Map(); // ws -> { lastPong, isAlive }
    this.timer = null;
    this.onTerminate = options.onTerminate || (() => {});
  }
 
  add(ws) {
    this.clients.set(ws, {
      lastPong: Date.now(),
      isAlive: true,
      pendingPing: false,
    });
 
    ws.on('pong', () => {
      const client = this.clients.get(ws);
      if (client) {
        client.lastPong = Date.now();
        client.isAlive = true;
        client.pendingPing = false;
      }
    });
 
    ws.on('close', () => {
      this.clients.delete(ws);
    });
  }
 
  start() {
    this.timer = setInterval(() => {
      const now = Date.now();
 
      for (const [ws, client] of this.clients) {
        if (client.pendingPing && (now - client.lastPong) > this.timeout) {
          // Client didn't respond to ping within timeout
          ws.terminate();
          this.clients.delete(ws);
          this.onTerminate(ws, 'heartbeat_timeout');
          continue;
        }
 
        if (ws.readyState === 1) {
          client.pendingPing = true;
          ws.ping();
        }
      }
    }, this.interval);
  }
 
  stop() {
    if (this.timer) {
      clearInterval(this.timer);
      this.timer = null;
    }
  }
 
  get stats() {
    let alive = 0;
    let pending = 0;
    for (const client of this.clients.values()) {
      if (client.pendingPing) pending++;
      else alive++;
    }
    return { total: this.clients.size, alive, pendingPing: pending };
  }
}

Rate Limiting Per Connection

Prevent individual connections from monopolizing server resources with per-connection rate limiting.

class ConnectionRateLimiter {
  constructor(options = {}) {
    this.maxMessagesPerSecond = options.maxMessagesPerSecond || 50;
    this.maxBytesPerSecond = options.maxBytesPerSecond || 1024 * 1024; // 1MB
    this.windows = new WeakMap(); // ws -> { messageCount, byteCount, windowStart }
  }
 
  check(ws, messageSize = 0) {
    let window = this.windows.get(ws);
 
    if (!window) {
      window = { messageCount: 0, byteCount: 0, windowStart: Date.now() };
      this.windows.set(ws, window);
    }
 
    const now = Date.now();
    if (now - window.windowStart > 1000) {
      window.messageCount = 0;
      window.byteCount = 0;
      window.windowStart = now;
    }
 
    window.messageCount++;
    window.byteCount += messageSize;
 
    if (window.messageCount > this.maxMessagesPerSecond) {
      return { allowed: false, reason: 'message_rate_exceeded' };
    }
 
    if (window.byteCount > this.maxBytesPerSecond) {
      return { allowed: false, reason: 'byte_rate_exceeded' };
    }
 
    return { allowed: true };
  }
}

Room-Based Pub/Sub Architecture

When you scale to multiple server instances, each instance holds a subset of total connections. To send a message to a user or room, you need a pub/sub message broker that distributes messages across all instances. Redis Pub/Sub is the most common choice for this pattern.

Room Manager

class RoomManager {
  constructor() {
    this.rooms = new Map(); // roomId -> Map<connectionId, ws>
    this.connectionRooms = new Map(); // connectionId -> Set<roomId>
  }
 
  join(roomId, connectionId, ws) {
    if (!this.rooms.has(roomId)) {
      this.rooms.set(roomId, new Map());
    }
    this.rooms.get(roomId).set(connectionId, ws);
 
    if (!this.connectionRooms.has(connectionId)) {
      this.connectionRooms.set(connectionId, new Set());
    }
    this.connectionRooms.get(connectionId).add(roomId);
  }
 
  leave(roomId, connectionId) {
    const room = this.rooms.get(roomId);
    if (room) {
      room.delete(connectionId);
      if (room.size === 0) this.rooms.delete(roomId);
    }
 
    const rooms = this.connectionRooms.get(connectionId);
    if (rooms) {
      rooms.delete(roomId);
      if (rooms.size === 0) this.connectionRooms.delete(connectionId);
    }
  }
 
  leaveAll(connectionId) {
    const rooms = this.connectionRooms.get(connectionId);
    if (rooms) {
      for (const roomId of rooms) {
        this.leave(roomId, connectionId);
      }
    }
  }
 
  broadcastLocal(roomId, message, excludeConnectionId = null) {
    const room = this.rooms.get(roomId);
    if (!room) return 0;
 
    const payload = JSON.stringify(message);
    let sent = 0;
 
    for (const [connId, ws] of room) {
      if (connId !== excludeConnectionId && ws.readyState === 1) {
        ws.send(payload);
        sent++;
      }
    }
 
    return sent;
  }
 
  getRoomMembers(roomId) {
    const room = this.rooms.get(roomId);
    return room ? Array.from(room.keys()) : [];
  }
 
  get stats() {
    let totalMembers = 0;
    for (const room of this.rooms.values()) {
      totalMembers += room.size;
    }
    return { rooms: this.rooms.size, totalMembers };
  }
}

Scalable WebSocket Server with Redis Pub/Sub

This is the core of horizontal scaling: each server instance manages its own connections locally, but uses Redis to distribute messages across all instances.

const Redis = require('ioredis');
const WebSocket = require('ws');
const http = require('http');
 
class ScalableWebSocketServer {
  constructor(options) {
    this.serverId = options.serverId || `server_${process.pid}`;
    this.port = options.port;
    this.redisUrl = options.redisUrl || 'redis://localhost:6379';
 
    this.registry = new ConnectionRegistry();
    this.rooms = new RoomManager();
    this.heartbeat = new HeartbeatManager();
 
    this.publisher = new Redis(this.redisUrl);
    this.subscriber = new Redis(this.redisUrl);
 
    this.httpServer = http.createServer();
    this.wss = new WebSocket.Server({ server: this.httpServer });
 
    this.setupWebSocket();
    this.setupPubSub();
    this.heartbeat.start();
  }
 
  setupWebSocket() {
    this.wss.on('connection', (ws, req) => {
      const userId = this.authenticateConnection(req);
      if (!userId) {
        ws.close(4001, 'Unauthorized');
        return;
      }
 
      const connectionId = this.registry.add(userId, ws, {
        ip: req.socket.remoteAddress,
        userAgent: req.headers['user-agent'],
      });
 
      this.heartbeat.add(ws);
 
      ws.on('message', (data) => {
        this.handleMessage(ws, connectionId, userId, data);
      });
 
      ws.on('close', () => {
        this.rooms.leaveAll(connectionId);
        this.registry.remove(ws);
      });
    });
  }
 
  handleMessage(ws, connectionId, userId, data) {
    let message;
    try {
      message = JSON.parse(data);
    } catch (e) {
      ws.send(JSON.stringify({ error: 'Invalid JSON' }));
      return;
    }
 
    switch (message.type) {
      case 'join':
        this.handleJoin(connectionId, ws, userId, message.room);
        break;
      case 'leave':
        this.handleLeave(connectionId, message.room);
        break;
      case 'room_message':
        this.handleRoomMessage(userId, message.room, message.payload);
        break;
      case 'direct_message':
        this.handleDirectMessage(userId, message.targetUserId, message.payload);
        break;
      default:
        ws.send(JSON.stringify({ error: 'Unknown message type' }));
    }
  }
 
  handleJoin(connectionId, ws, userId, roomId) {
    this.rooms.join(roomId, connectionId, ws);
 
    // Notify all instances about the join
    this.publisher.publish('room:events', JSON.stringify({
      type: 'user_joined',
      roomId,
      userId,
      serverId: this.serverId,
    }));
  }
 
  handleLeave(connectionId, roomId) {
    this.rooms.leave(roomId, connectionId);
  }
 
  handleRoomMessage(userId, roomId, payload) {
    const message = {
      type: 'room_message',
      roomId,
      userId,
      payload,
      timestamp: Date.now(),
      serverId: this.serverId,
    };
 
    // Publish to Redis so all servers broadcast to their local room members
    this.publisher.publish(`room:${roomId}`, JSON.stringify(message));
  }
 
  handleDirectMessage(fromUserId, toUserId, payload) {
    // Try to deliver locally first
    const sent = this.registry.sendToUser(toUserId, {
      type: 'direct_message',
      fromUserId,
      payload,
      timestamp: Date.now(),
    });
 
    // If not found locally, publish for other servers to deliver
    if (sent === 0) {
      this.publisher.publish('user:message', JSON.stringify({
        type: 'direct_message',
        fromUserId,
        toUserId,
        payload,
        timestamp: Date.now(),
      }));
    }
  }
 
  setupPubSub() {
    // Subscribe to all room channels
    this.subscriber.psubscribe('room:*');
 
    this.subscriber.on('pmessage', (pattern, channel, data) => {
      if (channel === 'room:events') return;
 
      const roomId = channel.replace('room:', '');
      const message = JSON.parse(data);
 
      // Don't re-broadcast messages from this server
      if (message.serverId === this.serverId) return;
 
      this.rooms.broadcastLocal(roomId, message);
    });
 
    // Subscribe to direct message channel
    this.subscriber.subscribe('user:message');
 
    this.subscriber.on('message', (channel, data) => {
      if (channel === 'user:message') {
        const message = JSON.parse(data);
        if (message.serverId === this.serverId) return;
 
        this.registry.sendToUser(message.toUserId, {
          type: message.type,
          fromUserId: message.fromUserId,
          payload: message.payload,
          timestamp: message.timestamp,
        });
      }
    });
  }
 
  authenticateConnection(req) {
    // Extract and validate token from query string or headers
    const url = new URL(req.url, 'http://localhost');
    const token = url.searchParams.get('token');
    // Validate JWT or session token here
    return token ? `user_${token}` : null;
  }
 
  start() {
    this.httpServer.listen(this.port, () => {
      console.log(`WebSocket server ${this.serverId} listening on port ${this.port}`);
    });
  }
 
  async stop() {
    this.heartbeat.stop();
    await this.publisher.quit();
    await this.subscriber.quit();
 
    return new Promise((resolve) => {
      this.httpServer.close(resolve);
    });
  }
}

Load Balancing WebSocket Connections

WebSocket connections require special handling from load balancers. Unlike HTTP requests that can be routed to any backend, WebSocket connections are persistent—once established, all frames must travel to the same server. This requires sticky sessions or consistent hashing.

Nginx Configuration

# /etc/nginx/conf.d/websocket.conf
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}
 
upstream websocket_backend {
    # ip_hash provides sticky sessions based on client IP
    # Alternative: hash $request_uri consistent (for token-based routing)
    ip_hash;
 
    server 10.0.0.1:3001 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:3002 max_fails=3 fail_timeout=30s;
    server 10.0.0.3:3003 max_fails=3 fail_timeout=30s;
    server 10.0.0.4:3004 max_fails=3 fail_timeout=30s;
 
    # Backup servers for failover
    server 10.0.0.5:3005 backup;
}
 
server {
    listen 443 ssl http2;
    server_name ws.example.com;
 
    ssl_certificate /etc/ssl/certs/ws.example.com.pem;
    ssl_certificate_key /etc/ssl/private/ws.example.com.key;
 
    location /ws {
        proxy_pass http://websocket_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
 
        # Keep connections alive for 24 hours
        proxy_read_timeout 86400s;
        proxy_send_timeout 86400s;
 
        # Buffer settings for WebSocket
        proxy_buffering off;
        proxy_cache off;
 
        # Limit request size
        client_max_body_size 1m;
    }
 
    # Health check endpoint
    location /health {
        proxy_pass http://websocket_backend;
        proxy_http_version 1.1;
    }
}

HAProxy Configuration

# /etc/haproxy/haproxy.cfg
frontend ws_frontend
    bind *:443 ssl crt /etc/ssl/certs/ws.pem
    mode http

    # Route WebSocket traffic
    acl is_websocket hdr(Upgrade) -i websocket
    use_backend ws_backend if is_websocket

    default_backend http_backend

backend ws_backend
    mode http
    balance source  # Sticky sessions by source IP

    # Health checks
    option httpchk GET /health
    http-check expect status 200

    # Timeout settings for long-lived connections
    timeout server 86400s
    timeout tunnel 86400s
    timeout connect 5s

    # Retry on connection failure
    retry-on conn-failure
    retries 3

    server ws1 10.0.0.1:3001 check inter 10s fall 3 rise 2
    server ws2 10.0.0.2:3002 check inter 10s fall 3 rise 2
    server ws3 10.0.0.3:3003 check inter 10s fall 3 rise 2
    server ws4 10.0.0.4:3004 check inter 10s fall 3 rise 2

backend http_backend
    mode http
    balance roundrobin
    server web1 10.0.0.10:80 check
    server web2 10.0.0.11:80 check

Token-Based Routing

When users reconnect after a disconnection, routing them to the same server can help restore session state quickly.

// Client-side: include a routing token in the WebSocket URL
class ReconnectingWebSocket {
  constructor(baseUrl, options = {}) {
    this.baseUrl = baseUrl;
    this.routingToken = options.routingToken || null;
    this.maxRetries = options.maxRetries || 5;
    this.retryCount = 0;
  }
 
  connect() {
    let url = this.baseUrl;
    if (this.routingToken) {
      url += `?routing=${this.routingToken}`;
    }
 
    this.ws = new WebSocket(url);
 
    this.ws.onopen = () => {
      this.retryCount = 0;
    };
 
    this.ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
 
      // Server sends routing token on connection
      if (data.type === 'routing_token') {
        this.routingToken = data.token;
        localStorage.setItem('ws_routing_token', data.token);
      }
    };
 
    this.ws.onclose = () => {
      this.reconnect();
    };
  }
 
  reconnect() {
    if (this.retryCount >= this.maxRetries) {
      // Clear routing token and try any server
      this.routingToken = null;
      this.retryCount = 0;
    }
 
    const delay = Math.min(1000 * Math.pow(2, this.retryCount), 30000);
    this.retryCount++;
 
    setTimeout(() => this.connect(), delay);
  }
}

Graceful Shutdown and Connection Draining

When scaling down or deploying new code, you can't simply kill the server process—doing so drops all connections instantly, causing data loss and poor user experience. Graceful shutdown involves stopping new connections, notifying existing clients, and waiting for them to disconnect cleanly.

class GracefulShutdown {
  constructor(httpServer, wss, options = {}) {
    this.httpServer = httpServer;
    this.wss = wss;
    this.drainTimeout = options.drainTimeout || 30000; // 30 seconds
    this.notifyMessage = options.notifyMessage || {
      type: 'server_shutdown',
      reason: 'Deployment in progress',
      reconnectDelay: 5000,
    };
    this.isShuttingDown = false;
  }
 
  init() {
    process.on('SIGTERM', () => this.shutdown('SIGTERM'));
    process.on('SIGINT', () => this.shutdown('SIGINT'));
  }
 
  async shutdown(signal) {
    if (this.isShuttingDown) return;
    this.isShuttingDown = true;
 
    console.log(`Received ${signal}, starting graceful shutdown...`);
 
    // Step 1: Stop accepting new connections
    this.httpServer.close(() => {
      console.log('HTTP server closed, no new connections accepted');
    });
 
    // Step 2: Notify existing clients
    const clients = Array.from(this.wss.clients);
    console.log(`Notifying ${clients.length} connected clients...`);
 
    for (const ws of clients) {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(JSON.stringify(this.notifyMessage));
      }
    }
 
    // Step 3: Wait for clients to disconnect
    const startTime = Date.now();
 
    await new Promise((resolve) => {
      const checkInterval = setInterval(() => {
        const remaining = this.wss.clients.size;
        const elapsed = Date.now() - startTime;
 
        if (remaining === 0) {
          clearInterval(checkInterval);
          console.log('All clients disconnected gracefully');
          resolve();
        } else if (elapsed > this.drainTimeout) {
          clearInterval(checkInterval);
          console.log(`Drain timeout reached, force closing ${remaining} connections`);
 
          for (const ws of this.wss.clients) {
            ws.terminate();
          }
          resolve();
        } else {
          console.log(`Waiting for ${remaining} clients to disconnect (${Math.ceil((this.drainTimeout - elapsed) / 1000)}s remaining)`);
        }
      }, 2000);
    });
 
    // Step 4: Clean up resources
    this.wss.close();
    console.log('Graceful shutdown complete');
    process.exit(0);
  }
 
  // Middleware to reject new connections during shutdown
  middleware() {
    return (req, res, next) => {
      if (this.isShuttingDown) {
        res.writeHead(503, { 'Retry-After': '30' });
        res.end('Server is shutting down');
        return;
      }
      next();
    };
  }
}

Production Monitoring and Metrics

Monitoring WebSocket servers requires tracking connection-level metrics that don't exist in traditional HTTP applications. You need to know how many connections are active, how many messages are flowing, and whether connections are healthy.

const { Counter, Gauge, Histogram, register } = require('prom-client');
 
class WebSocketMetrics {
  constructor() {
    this.activeConnections = new Gauge({
      name: 'ws_active_connections',
      help: 'Number of active WebSocket connections',
      labelNames: ['server_id'],
    });
 
    this.connectionTotal = new Counter({
      name: 'ws_connections_total',
      help: 'Total WebSocket connections opened',
      labelNames: ['server_id'],
    });
 
    this.disconnectionTotal = new Counter({
      name: 'ws_disconnections_total',
      help: 'Total WebSocket disconnections',
      labelNames: ['server_id', 'reason'],
    });
 
    this.messagesReceived = new Counter({
      name: 'ws_messages_received_total',
      help: 'Total WebSocket messages received',
      labelNames: ['server_id', 'type'],
    });
 
    this.messagesSent = new Counter({
      name: 'ws_messages_sent_total',
      help: 'Total WebSocket messages sent',
      labelNames: ['server_id'],
    });
 
    this.messageSize = new Histogram({
      name: 'ws_message_size_bytes',
      help: 'WebSocket message size in bytes',
      labelNames: ['direction'],
      buckets: [64, 256, 1024, 4096, 16384, 65536],
    });
 
    this.connectionDuration = new Histogram({
      name: 'ws_connection_duration_seconds',
      help: 'WebSocket connection duration',
      buckets: [1, 10, 60, 300, 900, 3600, 86400],
    });
 
    this.heartbeatLatency = new Histogram({
      name: 'ws_heartbeat_latency_ms',
      help: 'Heartbeat round-trip latency',
      buckets: [1, 5, 10, 25, 50, 100, 250, 500, 1000],
    });
  }
 
  onConnect(serverId) {
    this.activeConnections.labels(serverId).inc();
    this.connectionTotal.labels(serverId).inc();
  }
 
  onDisconnect(serverId, reason, durationMs) {
    this.activeConnections.labels(serverId).dec();
    this.disconnectionTotal.labels(serverId, reason).inc();
    this.connectionDuration.observe(durationMs / 1000);
  }
 
  onMessageReceived(serverId, type, sizeBytes) {
    this.messagesReceived.labels(serverId, type).inc();
    this.messageSize.labels('received').observe(sizeBytes);
  }
 
  onMessageSent(serverId, sizeBytes) {
    this.messagesSent.labels(serverId).inc();
    this.messageSize.labels('sent').observe(sizeBytes);
  }
 
  async getMetrics() {
    return register.metrics();
  }
}

Dashboard Alerts

# prometheus-alerts.yml
groups:
  - name: websocket_alerts
    rules:
      - alert: HighConnectionCount
        expr: ws_active_connections > 40000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High WebSocket connection count on {{ $labels.server_id }}"
          description: "Server {{ $labels.server_id }} has {{ $value }} active connections"
 
      - alert: ConnectionChurn
        expr: rate(ws_disconnections_total[5m]) > 100
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High connection churn rate"
          description: "{{ $value }} disconnections per second on {{ $labels.server_id }}"
 
      - alert: HighHeartbeatLatency
        expr: histogram_quantile(0.95, ws_heartbeat_latency_ms) > 500
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High heartbeat latency"
          description: "95th percentile heartbeat latency is {{ $value }}ms"

Common Pitfalls and Solutions

Pitfall	Impact	Solution
No heartbeat detection	Zombie connections accumulate, memory leak	Implement ping/pong with timeout
Broadcasting to all connections	O(n) CPU and network per message	Use room-based pub/sub with selective broadcast
No connection limits per IP	Single client exhausts server resources	Implement per-IP connection limits
Missing sticky sessions	Connections randomly drop after handshake	Configure ip_hash or hash-based routing in load balancer
No graceful shutdown	Data loss during deployments	Implement connection draining with client notification
Storing state only in memory	Lost on server restart or crash	Persist critical state to Redis or database
No rate limiting	Abuse and resource exhaustion	Implement per-connection message rate limits
Ignoring backpressure	Memory growth when clients are slow	Check `ws.bufferedAmount` before sending, implement flow control
Single Redis instance	Single point of failure for pub/sub	Use Redis Sentinel or Cluster for high availability
No connection authentication	Unauthorized access to real-time channels	Authenticate during WebSocket handshake upgrade

Best Practices Summary

Use pub/sub for multi-server deployments: Redis Pub/Sub or NATS for cross-server message distribution
Implement heartbeats: Detect and clean up dead connections every 30 seconds
Set per-IP connection limits: Prevent resource exhaustion from individual clients
Use binary protocols: MessagePack or Protocol Buffers reduce payload size by 30-50% compared to JSON
Monitor connection counts and message rates: Set up alerts for anomalies
Implement graceful shutdown: Drain connections before killing server processes
Use connection routing tokens: Route reconnecting clients to the same server for session continuity
Test with realistic load: Use tools like k6 or artillery to test with 100K+ concurrent connections
Persist critical state externally: Use Redis or a database for state that must survive server restarts
Implement backpressure handling: Check bufferedAmount and implement flow control for slow clients

Conclusion

Scaling WebSocket servers to millions of connections is a multi-layered challenge that spans connection management, pub/sub architecture, load balancing, graceful lifecycle management, and production monitoring. A single server handles 10K-50K connections well, but reaching millions requires horizontal scaling with a message broker like Redis Pub/Sub to distribute messages across server instances. Load balancers must be configured with sticky sessions to ensure WebSocket frames reach the correct server. Heartbeat mechanisms prevent resource leaks from zombie connections. Graceful shutdown with connection draining ensures zero-downtime deployments. And comprehensive monitoring with tools like Prometheus and Grafana gives you visibility into the health and performance of your real-time infrastructure.

The architecture described here has been battle-tested in production systems handling millions of concurrent connections for applications like live sports scores, financial trading platforms, multiplayer games, and large-scale chat systems. Start with a solid single-server implementation, add pub/sub when you need horizontal scaling, and invest in monitoring and graceful shutdown as you move toward production readiness. The key is building incrementally—each layer adds complexity, so only add the scaling layers you actually need.

Key takeaways:

Single servers handle 10K-50K connections before needing horizontal scaling
Redis Pub/Sub enables message distribution across server instances for rooms and direct messages
Sticky sessions are required for WebSocket load balancing—use ip_hash or consistent hashing
Heartbeats detect and clean up dead connections, preventing memory leaks
Graceful shutdown with connection draining ensures zero-downtime deployments
Prometheus metrics provide visibility into connection counts, message rates, and latency
Per-connection rate limiting prevents individual clients from exhausting server resources

Minh Vo

Slaying code & making it lit fr fr 🔥 tagline