MinhVo

Minh Vo

rss feed

Slaying code & making it lit fr fr 🔥 tagline

Hey there 👋 I'm an AI Engineer with 7 years of experience building scalable web and mobile applications. Currently at Neurond AI (May 2025 — present), architecting an Enterprise AI Assistant Platform with multi-tenant RAG on pgvector, multi-provider LLM orchestration, and Azure-native infrastructure. Previously spent 5+ years at SNAPTEC (Sep 2019 — Apr 2025), leading SaaS themes, admin dashboards, and e-commerce platforms — earned the Hero of the Year award in 2021. I specialize in TypeScript, React, Next.js, and AI-Native engineering with Claude Code and Cursor.bio

Back to blogs

WebSockets at Scale: Handling Millions of Connections

Scale WebSocket servers to millions of connections: connection management, pub/sub, horizontal scaling, load balancing, and production strategies.

WebSocketScaleReal-TimeArchitecture

By MinhVo

Introduction

Building a WebSocket server that handles a few hundred connections is straightforward. Building one that handles millions is an entirely different engineering challenge. Unlike stateless HTTP request-response cycles, WebSocket connections are persistent, stateful, and long-lived. Each connection consumes memory, file descriptors, and CPU cycles for as long as the client stays connected—sometimes hours or days. A single Node.js process can comfortably manage 10,000 to 50,000 concurrent WebSocket connections, but reaching millions requires a fundamentally different approach: horizontal scaling with pub/sub message brokers, intelligent load balancing with sticky sessions, robust connection lifecycle management, and comprehensive monitoring.

This guide covers every aspect of scaling WebSocket servers from thousands to millions of connections. We'll start with understanding the resource constraints of a single server, then build up through connection management patterns, pub/sub architectures for multi-server deployments, load balancing strategies, graceful shutdown procedures, and production monitoring. Each section includes working code examples you can adapt to your specific use case.

Scaling architecture

Understanding Single-Server Capacity

Before scaling horizontally, you need to understand the limits of a single server. Every WebSocket connection consumes several system resources: memory for the connection state and buffers, file descriptors from the operating system, and CPU time for message processing and serialization.

Memory Per Connection

Each WebSocket connection typically consumes between 2KB and 10KB of memory, depending on the library, message size, and application state stored per connection. The ws library for Node.js is one of the most memory-efficient implementations available.

// Estimating memory requirements for a WebSocket server
const CONNECTIONS = 50000;
const MEMORY_PER_CONNECTION = 5 * 1024; // 5KB average
const BUFFER_OVERHEAD = 2 * 1024; // 2KB for send/receive buffers
const TOTAL_MEMORY = CONNECTIONS * (MEMORY_PER_CONNECTION + BUFFER_OVERHEAD);
const TOTAL_MB = Math.ceil(TOTAL_MEMORY / (1024 * 1024));
 
console.log(`Estimated memory for ${CONNECTIONS} connections: ${TOTAL_MB}MB`);
// Output: Estimated memory for 50000 connections: 342MB
 
// Add overhead for the Node.js runtime itself (~100MB)
// Plus your application logic
// Recommended: at least 2GB RAM for 50K connections

File Descriptor Limits

Operating systems impose limits on the number of file descriptors a process can open. Each WebSocket connection requires one file descriptor. On Linux, check and increase the limit:

# Check current limit
ulimit -n
 
# Increase for current session (typically defaults to 1024)
ulimit -n 1000000
 
# Permanent change: edit /etc/security/limits.conf
# Add these lines:
# * soft nofile 1000000
# * hard nofile 1000000
 
# Also check and increase the system-wide limit
cat /proc/sys/fs/file-max
echo 2000000 > /proc/sys/fs/file-max

CPU Bottlenecks

CPU becomes a bottleneck when you're processing messages at high frequency. If each message takes 0.1ms to process, a single core can handle 10,000 messages per second. For a chat application with 100,000 connections sending an average of 1 message per second, you need at least 10 cores dedicated to message processing.

// Benchmarking message processing throughput
function benchmarkMessageProcessing(connections, messagesPerSecond) {
  const processingTimePerMessage = 0.1; // ms, measured via profiling
  const totalMessagesPerSecond = connections * messagesPerSecond;
  const requiredCpuMs = totalMessagesPerSecond * processingTimePerMessage;
  const availableCpuMs = 1000; // 1 second per core
 
  const coresNeeded = Math.ceil(requiredCpuMs / availableCpuMs);
 
  console.log(`Connections: ${connections}`);
  console.log(`Messages/sec: ${totalMessagesPerSecond}`);
  console.log(`CPU cores needed: ${coresNeeded}`);
}
 
benchmarkMessageProcessing(100000, 1); // 10 cores
benchmarkMessageProcessing(500000, 0.5); // 25 cores
benchmarkMessageProcessing(1000000, 0.1); // 10 cores (low frequency)

Bottleneck Analysis Matrix

ResourceSymptomThresholdMitigation
MemoryOOM crashes, slow GC~50K connections per 2GBHorizontal scaling, connection limits
File descriptorsEMFILE errorsOS default 1024Increase ulimit, tune kernel
CPUHigh latency, message drops80% sustained utilizationWorker threads, horizontal scaling
NetworkPacket loss, timeoutsNIC bandwidth saturationBinary protocols, compression
Event loopDelayed timers, stale connections>100ms loop lagOffload CPU work to worker threads

Server monitoring

Connection Management Patterns

Effective connection management is the foundation of a scalable WebSocket server. You need to track connections, associate them with users, handle multiple devices per user, and clean up resources when connections drop.

Connection Registry with Multi-Device Support

In production, a single user may have multiple connected devices—phone, laptop, tablet. The connection registry must map users to sets of connections, not just individual sockets.

class ConnectionRegistry {
  constructor() {
    // userId -> Map<connectionId, WebSocket>
    this.userConnections = new Map();
    // WebSocket -> connection metadata
    this.metadata = new WeakMap();
    // connectionId -> WebSocket (for quick lookup)
    this.connectionIndex = new Map();
    this.nextConnectionId = 0;
  }
 
  add(userId, ws, metadata = {}) {
    const connectionId = `conn_${++this.nextConnectionId}`;
 
    if (!this.userConnections.has(userId)) {
      this.userConnections.set(userId, new Map());
    }
 
    this.userConnections.get(userId).set(connectionId, ws);
    this.connectionIndex.set(connectionId, ws);
    this.metadata.set(ws, {
      userId,
      connectionId,
      connectedAt: Date.now(),
      lastActivity: Date.now(),
      ip: metadata.ip || 'unknown',
      userAgent: metadata.userAgent || 'unknown',
      ...metadata,
    });
 
    return connectionId;
  }
 
  remove(ws) {
    const meta = this.metadata.get(ws);
    if (!meta) return null;
 
    const userConns = this.userConnections.get(meta.userId);
    if (userConns) {
      userConns.delete(meta.connectionId);
      if (userConns.size === 0) {
        this.userConnections.delete(meta.userId);
      }
    }
 
    this.connectionIndex.delete(meta.connectionId);
    return meta;
  }
 
  getByUser(userId) {
    return this.userConnections.get(userId) || new Map();
  }
 
  getByConnectionId(connectionId) {
    return this.connectionIndex.get(connectionId) || null;
  }
 
  sendToUser(userId, message) {
    const conns = this.userConnections.get(userId);
    if (!conns) return 0;
 
    const payload = JSON.stringify(message);
    let sent = 0;
 
    for (const [connId, ws] of conns) {
      if (ws.readyState === 1) {
        ws.send(payload);
        sent++;
      }
    }
 
    return sent;
  }
 
  get stats() {
    let totalConnections = 0;
    for (const conns of this.userConnections.values()) {
      totalConnections += conns.size;
    }
    return {
      users: this.userConnections.size,
      connections: totalConnections,
    };
  }
}

Heartbeat System for Dead Connection Detection

Zombie connections—connections where the client has disappeared without closing the TCP socket—are a major resource leak. Without heartbeats, these connections accumulate indefinitely, eventually exhausting server resources. The WebSocket protocol defines ping/pong frames specifically for this purpose.

class HeartbeatManager {
  constructor(options = {}) {
    this.interval = options.interval || 30000; // 30 seconds
    this.timeout = options.timeout || 10000; // 10 seconds to respond
    this.clients = new Map(); // ws -> { lastPong, isAlive }
    this.timer = null;
    this.onTerminate = options.onTerminate || (() => {});
  }
 
  add(ws) {
    this.clients.set(ws, {
      lastPong: Date.now(),
      isAlive: true,
      pendingPing: false,
    });
 
    ws.on('pong', () => {
      const client = this.clients.get(ws);
      if (client) {
        client.lastPong = Date.now();
        client.isAlive = true;
        client.pendingPing = false;
      }
    });
 
    ws.on('close', () => {
      this.clients.delete(ws);
    });
  }
 
  start() {
    this.timer = setInterval(() => {
      const now = Date.now();
 
      for (const [ws, client] of this.clients) {
        if (client.pendingPing && (now - client.lastPong) > this.timeout) {
          // Client didn't respond to ping within timeout
          ws.terminate();
          this.clients.delete(ws);
          this.onTerminate(ws, 'heartbeat_timeout');
          continue;
        }
 
        if (ws.readyState === 1) {
          client.pendingPing = true;
          ws.ping();
        }
      }
    }, this.interval);
  }
 
  stop() {
    if (this.timer) {
      clearInterval(this.timer);
      this.timer = null;
    }
  }
 
  get stats() {
    let alive = 0;
    let pending = 0;
    for (const client of this.clients.values()) {
      if (client.pendingPing) pending++;
      else alive++;
    }
    return { total: this.clients.size, alive, pendingPing: pending };
  }
}

Rate Limiting Per Connection

Prevent individual connections from monopolizing server resources with per-connection rate limiting.

class ConnectionRateLimiter {
  constructor(options = {}) {
    this.maxMessagesPerSecond = options.maxMessagesPerSecond || 50;
    this.maxBytesPerSecond = options.maxBytesPerSecond || 1024 * 1024; // 1MB
    this.windows = new WeakMap(); // ws -> { messageCount, byteCount, windowStart }
  }
 
  check(ws, messageSize = 0) {
    let window = this.windows.get(ws);
 
    if (!window) {
      window = { messageCount: 0, byteCount: 0, windowStart: Date.now() };
      this.windows.set(ws, window);
    }
 
    const now = Date.now();
    if (now - window.windowStart > 1000) {
      window.messageCount = 0;
      window.byteCount = 0;
      window.windowStart = now;
    }
 
    window.messageCount++;
    window.byteCount += messageSize;
 
    if (window.messageCount > this.maxMessagesPerSecond) {
      return { allowed: false, reason: 'message_rate_exceeded' };
    }
 
    if (window.byteCount > this.maxBytesPerSecond) {
      return { allowed: false, reason: 'byte_rate_exceeded' };
    }
 
    return { allowed: true };
  }
}

Room-Based Pub/Sub Architecture

When you scale to multiple server instances, each instance holds a subset of total connections. To send a message to a user or room, you need a pub/sub message broker that distributes messages across all instances. Redis Pub/Sub is the most common choice for this pattern.

Room Manager

class RoomManager {
  constructor() {
    this.rooms = new Map(); // roomId -> Map<connectionId, ws>
    this.connectionRooms = new Map(); // connectionId -> Set<roomId>
  }
 
  join(roomId, connectionId, ws) {
    if (!this.rooms.has(roomId)) {
      this.rooms.set(roomId, new Map());
    }
    this.rooms.get(roomId).set(connectionId, ws);
 
    if (!this.connectionRooms.has(connectionId)) {
      this.connectionRooms.set(connectionId, new Set());
    }
    this.connectionRooms.get(connectionId).add(roomId);
  }
 
  leave(roomId, connectionId) {
    const room = this.rooms.get(roomId);
    if (room) {
      room.delete(connectionId);
      if (room.size === 0) this.rooms.delete(roomId);
    }
 
    const rooms = this.connectionRooms.get(connectionId);
    if (rooms) {
      rooms.delete(roomId);
      if (rooms.size === 0) this.connectionRooms.delete(connectionId);
    }
  }
 
  leaveAll(connectionId) {
    const rooms = this.connectionRooms.get(connectionId);
    if (rooms) {
      for (const roomId of rooms) {
        this.leave(roomId, connectionId);
      }
    }
  }
 
  broadcastLocal(roomId, message, excludeConnectionId = null) {
    const room = this.rooms.get(roomId);
    if (!room) return 0;
 
    const payload = JSON.stringify(message);
    let sent = 0;
 
    for (const [connId, ws] of room) {
      if (connId !== excludeConnectionId && ws.readyState === 1) {
        ws.send(payload);
        sent++;
      }
    }
 
    return sent;
  }
 
  getRoomMembers(roomId) {
    const room = this.rooms.get(roomId);
    return room ? Array.from(room.keys()) : [];
  }
 
  get stats() {
    let totalMembers = 0;
    for (const room of this.rooms.values()) {
      totalMembers += room.size;
    }
    return { rooms: this.rooms.size, totalMembers };
  }
}

Scalable WebSocket Server with Redis Pub/Sub

This is the core of horizontal scaling: each server instance manages its own connections locally, but uses Redis to distribute messages across all instances.

const Redis = require('ioredis');
const WebSocket = require('ws');
const http = require('http');
 
class ScalableWebSocketServer {
  constructor(options) {
    this.serverId = options.serverId || `server_${process.pid}`;
    this.port = options.port;
    this.redisUrl = options.redisUrl || 'redis://localhost:6379';
 
    this.registry = new ConnectionRegistry();
    this.rooms = new RoomManager();
    this.heartbeat = new HeartbeatManager();
 
    this.publisher = new Redis(this.redisUrl);
    this.subscriber = new Redis(this.redisUrl);
 
    this.httpServer = http.createServer();
    this.wss = new WebSocket.Server({ server: this.httpServer });
 
    this.setupWebSocket();
    this.setupPubSub();
    this.heartbeat.start();
  }
 
  setupWebSocket() {
    this.wss.on('connection', (ws, req) => {
      const userId = this.authenticateConnection(req);
      if (!userId) {
        ws.close(4001, 'Unauthorized');
        return;
      }
 
      const connectionId = this.registry.add(userId, ws, {
        ip: req.socket.remoteAddress,
        userAgent: req.headers['user-agent'],
      });
 
      this.heartbeat.add(ws);
 
      ws.on('message', (data) => {
        this.handleMessage(ws, connectionId, userId, data);
      });
 
      ws.on('close', () => {
        this.rooms.leaveAll(connectionId);
        this.registry.remove(ws);
      });
    });
  }
 
  handleMessage(ws, connectionId, userId, data) {
    let message;
    try {
      message = JSON.parse(data);
    } catch (e) {
      ws.send(JSON.stringify({ error: 'Invalid JSON' }));
      return;
    }
 
    switch (message.type) {
      case 'join':
        this.handleJoin(connectionId, ws, userId, message.room);
        break;
      case 'leave':
        this.handleLeave(connectionId, message.room);
        break;
      case 'room_message':
        this.handleRoomMessage(userId, message.room, message.payload);
        break;
      case 'direct_message':
        this.handleDirectMessage(userId, message.targetUserId, message.payload);
        break;
      default:
        ws.send(JSON.stringify({ error: 'Unknown message type' }));
    }
  }
 
  handleJoin(connectionId, ws, userId, roomId) {
    this.rooms.join(roomId, connectionId, ws);
 
    // Notify all instances about the join
    this.publisher.publish('room:events', JSON.stringify({
      type: 'user_joined',
      roomId,
      userId,
      serverId: this.serverId,
    }));
  }
 
  handleLeave(connectionId, roomId) {
    this.rooms.leave(roomId, connectionId);
  }
 
  handleRoomMessage(userId, roomId, payload) {
    const message = {
      type: 'room_message',
      roomId,
      userId,
      payload,
      timestamp: Date.now(),
      serverId: this.serverId,
    };
 
    // Publish to Redis so all servers broadcast to their local room members
    this.publisher.publish(`room:${roomId}`, JSON.stringify(message));
  }
 
  handleDirectMessage(fromUserId, toUserId, payload) {
    // Try to deliver locally first
    const sent = this.registry.sendToUser(toUserId, {
      type: 'direct_message',
      fromUserId,
      payload,
      timestamp: Date.now(),
    });
 
    // If not found locally, publish for other servers to deliver
    if (sent === 0) {
      this.publisher.publish('user:message', JSON.stringify({
        type: 'direct_message',
        fromUserId,
        toUserId,
        payload,
        timestamp: Date.now(),
      }));
    }
  }
 
  setupPubSub() {
    // Subscribe to all room channels
    this.subscriber.psubscribe('room:*');
 
    this.subscriber.on('pmessage', (pattern, channel, data) => {
      if (channel === 'room:events') return;
 
      const roomId = channel.replace('room:', '');
      const message = JSON.parse(data);
 
      // Don't re-broadcast messages from this server
      if (message.serverId === this.serverId) return;
 
      this.rooms.broadcastLocal(roomId, message);
    });
 
    // Subscribe to direct message channel
    this.subscriber.subscribe('user:message');
 
    this.subscriber.on('message', (channel, data) => {
      if (channel === 'user:message') {
        const message = JSON.parse(data);
        if (message.serverId === this.serverId) return;
 
        this.registry.sendToUser(message.toUserId, {
          type: message.type,
          fromUserId: message.fromUserId,
          payload: message.payload,
          timestamp: message.timestamp,
        });
      }
    });
  }
 
  authenticateConnection(req) {
    // Extract and validate token from query string or headers
    const url = new URL(req.url, 'http://localhost');
    const token = url.searchParams.get('token');
    // Validate JWT or session token here
    return token ? `user_${token}` : null;
  }
 
  start() {
    this.httpServer.listen(this.port, () => {
      console.log(`WebSocket server ${this.serverId} listening on port ${this.port}`);
    });
  }
 
  async stop() {
    this.heartbeat.stop();
    await this.publisher.quit();
    await this.subscriber.quit();
 
    return new Promise((resolve) => {
      this.httpServer.close(resolve);
    });
  }
}

Load balancing

Load Balancing WebSocket Connections

WebSocket connections require special handling from load balancers. Unlike HTTP requests that can be routed to any backend, WebSocket connections are persistent—once established, all frames must travel to the same server. This requires sticky sessions or consistent hashing.

Nginx Configuration

# /etc/nginx/conf.d/websocket.conf
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}
 
upstream websocket_backend {
    # ip_hash provides sticky sessions based on client IP
    # Alternative: hash $request_uri consistent (for token-based routing)
    ip_hash;
 
    server 10.0.0.1:3001 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:3002 max_fails=3 fail_timeout=30s;
    server 10.0.0.3:3003 max_fails=3 fail_timeout=30s;
    server 10.0.0.4:3004 max_fails=3 fail_timeout=30s;
 
    # Backup servers for failover
    server 10.0.0.5:3005 backup;
}
 
server {
    listen 443 ssl http2;
    server_name ws.example.com;
 
    ssl_certificate /etc/ssl/certs/ws.example.com.pem;
    ssl_certificate_key /etc/ssl/private/ws.example.com.key;
 
    location /ws {
        proxy_pass http://websocket_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
 
        # Keep connections alive for 24 hours
        proxy_read_timeout 86400s;
        proxy_send_timeout 86400s;
 
        # Buffer settings for WebSocket
        proxy_buffering off;
        proxy_cache off;
 
        # Limit request size
        client_max_body_size 1m;
    }
 
    # Health check endpoint
    location /health {
        proxy_pass http://websocket_backend;
        proxy_http_version 1.1;
    }
}

HAProxy Configuration

# /etc/haproxy/haproxy.cfg
frontend ws_frontend
    bind *:443 ssl crt /etc/ssl/certs/ws.pem
    mode http

    # Route WebSocket traffic
    acl is_websocket hdr(Upgrade) -i websocket
    use_backend ws_backend if is_websocket

    default_backend http_backend

backend ws_backend
    mode http
    balance source  # Sticky sessions by source IP

    # Health checks
    option httpchk GET /health
    http-check expect status 200

    # Timeout settings for long-lived connections
    timeout server 86400s
    timeout tunnel 86400s
    timeout connect 5s

    # Retry on connection failure
    retry-on conn-failure
    retries 3

    server ws1 10.0.0.1:3001 check inter 10s fall 3 rise 2
    server ws2 10.0.0.2:3002 check inter 10s fall 3 rise 2
    server ws3 10.0.0.3:3003 check inter 10s fall 3 rise 2
    server ws4 10.0.0.4:3004 check inter 10s fall 3 rise 2

backend http_backend
    mode http
    balance roundrobin
    server web1 10.0.0.10:80 check
    server web2 10.0.0.11:80 check

Token-Based Routing

When users reconnect after a disconnection, routing them to the same server can help restore session state quickly.

// Client-side: include a routing token in the WebSocket URL
class ReconnectingWebSocket {
  constructor(baseUrl, options = {}) {
    this.baseUrl = baseUrl;
    this.routingToken = options.routingToken || null;
    this.maxRetries = options.maxRetries || 5;
    this.retryCount = 0;
  }
 
  connect() {
    let url = this.baseUrl;
    if (this.routingToken) {
      url += `?routing=${this.routingToken}`;
    }
 
    this.ws = new WebSocket(url);
 
    this.ws.onopen = () => {
      this.retryCount = 0;
    };
 
    this.ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
 
      // Server sends routing token on connection
      if (data.type === 'routing_token') {
        this.routingToken = data.token;
        localStorage.setItem('ws_routing_token', data.token);
      }
    };
 
    this.ws.onclose = () => {
      this.reconnect();
    };
  }
 
  reconnect() {
    if (this.retryCount >= this.maxRetries) {
      // Clear routing token and try any server
      this.routingToken = null;
      this.retryCount = 0;
    }
 
    const delay = Math.min(1000 * Math.pow(2, this.retryCount), 30000);
    this.retryCount++;
 
    setTimeout(() => this.connect(), delay);
  }
}

Graceful Shutdown and Connection Draining

When scaling down or deploying new code, you can't simply kill the server process—doing so drops all connections instantly, causing data loss and poor user experience. Graceful shutdown involves stopping new connections, notifying existing clients, and waiting for them to disconnect cleanly.

class GracefulShutdown {
  constructor(httpServer, wss, options = {}) {
    this.httpServer = httpServer;
    this.wss = wss;
    this.drainTimeout = options.drainTimeout || 30000; // 30 seconds
    this.notifyMessage = options.notifyMessage || {
      type: 'server_shutdown',
      reason: 'Deployment in progress',
      reconnectDelay: 5000,
    };
    this.isShuttingDown = false;
  }
 
  init() {
    process.on('SIGTERM', () => this.shutdown('SIGTERM'));
    process.on('SIGINT', () => this.shutdown('SIGINT'));
  }
 
  async shutdown(signal) {
    if (this.isShuttingDown) return;
    this.isShuttingDown = true;
 
    console.log(`Received ${signal}, starting graceful shutdown...`);
 
    // Step 1: Stop accepting new connections
    this.httpServer.close(() => {
      console.log('HTTP server closed, no new connections accepted');
    });
 
    // Step 2: Notify existing clients
    const clients = Array.from(this.wss.clients);
    console.log(`Notifying ${clients.length} connected clients...`);
 
    for (const ws of clients) {
      if (ws.readyState === WebSocket.OPEN) {
        ws.send(JSON.stringify(this.notifyMessage));
      }
    }
 
    // Step 3: Wait for clients to disconnect
    const startTime = Date.now();
 
    await new Promise((resolve) => {
      const checkInterval = setInterval(() => {
        const remaining = this.wss.clients.size;
        const elapsed = Date.now() - startTime;
 
        if (remaining === 0) {
          clearInterval(checkInterval);
          console.log('All clients disconnected gracefully');
          resolve();
        } else if (elapsed > this.drainTimeout) {
          clearInterval(checkInterval);
          console.log(`Drain timeout reached, force closing ${remaining} connections`);
 
          for (const ws of this.wss.clients) {
            ws.terminate();
          }
          resolve();
        } else {
          console.log(`Waiting for ${remaining} clients to disconnect (${Math.ceil((this.drainTimeout - elapsed) / 1000)}s remaining)`);
        }
      }, 2000);
    });
 
    // Step 4: Clean up resources
    this.wss.close();
    console.log('Graceful shutdown complete');
    process.exit(0);
  }
 
  // Middleware to reject new connections during shutdown
  middleware() {
    return (req, res, next) => {
      if (this.isShuttingDown) {
        res.writeHead(503, { 'Retry-After': '30' });
        res.end('Server is shutting down');
        return;
      }
      next();
    };
  }
}

Production Monitoring and Metrics

Monitoring WebSocket servers requires tracking connection-level metrics that don't exist in traditional HTTP applications. You need to know how many connections are active, how many messages are flowing, and whether connections are healthy.

const { Counter, Gauge, Histogram, register } = require('prom-client');
 
class WebSocketMetrics {
  constructor() {
    this.activeConnections = new Gauge({
      name: 'ws_active_connections',
      help: 'Number of active WebSocket connections',
      labelNames: ['server_id'],
    });
 
    this.connectionTotal = new Counter({
      name: 'ws_connections_total',
      help: 'Total WebSocket connections opened',
      labelNames: ['server_id'],
    });
 
    this.disconnectionTotal = new Counter({
      name: 'ws_disconnections_total',
      help: 'Total WebSocket disconnections',
      labelNames: ['server_id', 'reason'],
    });
 
    this.messagesReceived = new Counter({
      name: 'ws_messages_received_total',
      help: 'Total WebSocket messages received',
      labelNames: ['server_id', 'type'],
    });
 
    this.messagesSent = new Counter({
      name: 'ws_messages_sent_total',
      help: 'Total WebSocket messages sent',
      labelNames: ['server_id'],
    });
 
    this.messageSize = new Histogram({
      name: 'ws_message_size_bytes',
      help: 'WebSocket message size in bytes',
      labelNames: ['direction'],
      buckets: [64, 256, 1024, 4096, 16384, 65536],
    });
 
    this.connectionDuration = new Histogram({
      name: 'ws_connection_duration_seconds',
      help: 'WebSocket connection duration',
      buckets: [1, 10, 60, 300, 900, 3600, 86400],
    });
 
    this.heartbeatLatency = new Histogram({
      name: 'ws_heartbeat_latency_ms',
      help: 'Heartbeat round-trip latency',
      buckets: [1, 5, 10, 25, 50, 100, 250, 500, 1000],
    });
  }
 
  onConnect(serverId) {
    this.activeConnections.labels(serverId).inc();
    this.connectionTotal.labels(serverId).inc();
  }
 
  onDisconnect(serverId, reason, durationMs) {
    this.activeConnections.labels(serverId).dec();
    this.disconnectionTotal.labels(serverId, reason).inc();
    this.connectionDuration.observe(durationMs / 1000);
  }
 
  onMessageReceived(serverId, type, sizeBytes) {
    this.messagesReceived.labels(serverId, type).inc();
    this.messageSize.labels('received').observe(sizeBytes);
  }
 
  onMessageSent(serverId, sizeBytes) {
    this.messagesSent.labels(serverId).inc();
    this.messageSize.labels('sent').observe(sizeBytes);
  }
 
  async getMetrics() {
    return register.metrics();
  }
}

Dashboard Alerts

# prometheus-alerts.yml
groups:
  - name: websocket_alerts
    rules:
      - alert: HighConnectionCount
        expr: ws_active_connections > 40000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High WebSocket connection count on {{ $labels.server_id }}"
          description: "Server {{ $labels.server_id }} has {{ $value }} active connections"
 
      - alert: ConnectionChurn
        expr: rate(ws_disconnections_total[5m]) > 100
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High connection churn rate"
          description: "{{ $value }} disconnections per second on {{ $labels.server_id }}"
 
      - alert: HighHeartbeatLatency
        expr: histogram_quantile(0.95, ws_heartbeat_latency_ms) > 500
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High heartbeat latency"
          description: "95th percentile heartbeat latency is {{ $value }}ms"

Common Pitfalls and Solutions

PitfallImpactSolution
No heartbeat detectionZombie connections accumulate, memory leakImplement ping/pong with timeout
Broadcasting to all connectionsO(n) CPU and network per messageUse room-based pub/sub with selective broadcast
No connection limits per IPSingle client exhausts server resourcesImplement per-IP connection limits
Missing sticky sessionsConnections randomly drop after handshakeConfigure ip_hash or hash-based routing in load balancer
No graceful shutdownData loss during deploymentsImplement connection draining with client notification
Storing state only in memoryLost on server restart or crashPersist critical state to Redis or database
No rate limitingAbuse and resource exhaustionImplement per-connection message rate limits
Ignoring backpressureMemory growth when clients are slowCheck ws.bufferedAmount before sending, implement flow control
Single Redis instanceSingle point of failure for pub/subUse Redis Sentinel or Cluster for high availability
No connection authenticationUnauthorized access to real-time channelsAuthenticate during WebSocket handshake upgrade

Best Practices Summary

  1. Use pub/sub for multi-server deployments: Redis Pub/Sub or NATS for cross-server message distribution
  2. Implement heartbeats: Detect and clean up dead connections every 30 seconds
  3. Set per-IP connection limits: Prevent resource exhaustion from individual clients
  4. Use binary protocols: MessagePack or Protocol Buffers reduce payload size by 30-50% compared to JSON
  5. Monitor connection counts and message rates: Set up alerts for anomalies
  6. Implement graceful shutdown: Drain connections before killing server processes
  7. Use connection routing tokens: Route reconnecting clients to the same server for session continuity
  8. Test with realistic load: Use tools like k6 or artillery to test with 100K+ concurrent connections
  9. Persist critical state externally: Use Redis or a database for state that must survive server restarts
  10. Implement backpressure handling: Check bufferedAmount and implement flow control for slow clients

Conclusion

Scaling WebSocket servers to millions of connections is a multi-layered challenge that spans connection management, pub/sub architecture, load balancing, graceful lifecycle management, and production monitoring. A single server handles 10K-50K connections well, but reaching millions requires horizontal scaling with a message broker like Redis Pub/Sub to distribute messages across server instances. Load balancers must be configured with sticky sessions to ensure WebSocket frames reach the correct server. Heartbeat mechanisms prevent resource leaks from zombie connections. Graceful shutdown with connection draining ensures zero-downtime deployments. And comprehensive monitoring with tools like Prometheus and Grafana gives you visibility into the health and performance of your real-time infrastructure.

The architecture described here has been battle-tested in production systems handling millions of concurrent connections for applications like live sports scores, financial trading platforms, multiplayer games, and large-scale chat systems. Start with a solid single-server implementation, add pub/sub when you need horizontal scaling, and invest in monitoring and graceful shutdown as you move toward production readiness. The key is building incrementally—each layer adds complexity, so only add the scaling layers you actually need.

Key takeaways:

  1. Single servers handle 10K-50K connections before needing horizontal scaling
  2. Redis Pub/Sub enables message distribution across server instances for rooms and direct messages
  3. Sticky sessions are required for WebSocket load balancing—use ip_hash or consistent hashing
  4. Heartbeats detect and clean up dead connections, preventing memory leaks
  5. Graceful shutdown with connection draining ensures zero-downtime deployments
  6. Prometheus metrics provide visibility into connection counts, message rates, and latency
  7. Per-connection rate limiting prevents individual clients from exhausting server resources