Introduction
Building a WebSocket server that handles a few hundred connections is straightforward. Building one that handles millions is an entirely different engineering challenge. Unlike stateless HTTP request-response cycles, WebSocket connections are persistent, stateful, and long-lived. Each connection consumes memory, file descriptors, and CPU cycles for as long as the client stays connected—sometimes hours or days. A single Node.js process can comfortably manage 10,000 to 50,000 concurrent WebSocket connections, but reaching millions requires a fundamentally different approach: horizontal scaling with pub/sub message brokers, intelligent load balancing with sticky sessions, robust connection lifecycle management, and comprehensive monitoring.
This guide covers every aspect of scaling WebSocket servers from thousands to millions of connections. We'll start with understanding the resource constraints of a single server, then build up through connection management patterns, pub/sub architectures for multi-server deployments, load balancing strategies, graceful shutdown procedures, and production monitoring. Each section includes working code examples you can adapt to your specific use case.
Understanding Single-Server Capacity
Before scaling horizontally, you need to understand the limits of a single server. Every WebSocket connection consumes several system resources: memory for the connection state and buffers, file descriptors from the operating system, and CPU time for message processing and serialization.
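As a rough way to watch these three resources from inside a running Node.js process, the sketch below takes a snapshot using only built-in modules. The helper name resourceSnapshot is hypothetical, and the file-descriptor count assumes a Linux-style /proc filesystem; on other platforms it is reported as null.

```javascript
const fs = require('fs');
const os = require('os');

// Hypothetical helper: one snapshot of memory, file descriptors, and CPU time
function resourceSnapshot() {
  const mem = process.memoryUsage();
  const cpu = process.cpuUsage(); // microseconds of CPU time used so far

  // Each entry in /proc/self/fd is one open descriptor (Linux only)
  let openFds = null;
  try {
    openFds = fs.readdirSync('/proc/self/fd').length;
  } catch (err) {
    // No /proc on this platform; leave the count unknown
  }

  return {
    rssMB: Math.round(mem.rss / (1024 * 1024)),
    heapUsedMB: Math.round(mem.heapUsed / (1024 * 1024)),
    cpuUserMs: Math.round(cpu.user / 1000),
    cpuSystemMs: Math.round(cpu.system / 1000),
    cores: os.cpus().length,
    openFds,
  };
}

console.log(resourceSnapshot());
```

Logging a snapshot like this on a timer while connections ramp up gives a first estimate of per-connection cost before any dedicated tooling is in place.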
Memory Per Connection
Each WebSocket connection typically consumes between 2KB and 10KB of memory, depending on the library, message size, and application state stored per connection. The ws library for Node.js is one of the most memory-efficient implementations available.
// Estimating memory requirements for a WebSocket server
const CONNECTIONS = 50000;
const MEMORY_PER_CONNECTION = 5 * 1024; // 5KB average
const BUFFER_OVERHEAD = 2 * 1024; // 2KB for send/receive buffers
const TOTAL_MEMORY = CONNECTIONS * (MEMORY_PER_CONNECTION + BUFFER_OVERHEAD);
const TOTAL_MB = Math.ceil(TOTAL_MEMORY / (1024 * 1024));
console.log(`Estimated memory for ${CONNECTIONS} connections: ${TOTAL_MB}MB`);
// Output: Estimated memory for 50000 connections: 342MB
// Add overhead for the Node.js runtime itself (~100MB)
// Plus your application logic
// Recommended: at least 2GB RAM for 50K connections
File Descriptor Limits
Operating systems impose limits on the number of file descriptors a process can open. Each WebSocket connection requires one file descriptor. On Linux, check and increase the limit:
# Check current limit
ulimit -n
# Increase for current session (typically defaults to 1024)
ulimit -n 1000000
# Permanent change: edit /etc/security/limits.conf
# Add these lines:
# * soft nofile 1000000
# * hard nofile 1000000
# Also check the system-wide limit and raise it if needed
# (run as root; add fs.file-max = 2000000 to /etc/sysctl.conf to persist it)
cat /proc/sys/fs/file-max
echo 2000000 > /proc/sys/fs/file-max
CPU Bottlenecks
CPU becomes a bottleneck when you're processing messages at high frequency. If each message takes 0.1ms to process, a single core can handle 10,000 messages per second. For a chat application with 100,000 connections sending an average of 1 message per second, you need at least 10 cores dedicated to message processing.
// Estimating the CPU cores needed for a given message rate
// (a back-of-the-envelope calculation, not a measured benchmark)
function benchmarkMessageProcessing(connections, messagesPerSecond) {
const processingTimePerMessage = 0.1; // ms, measured via profiling
const totalMessagesPerSecond = connections * messagesPerSecond;
const requiredCpuMs = totalMessagesPerSecond * processingTimePerMessage;
const availableCpuMs = 1000; // 1 second per core
const coresNeeded = Math.ceil(requiredCpuMs / availableCpuMs);
console.log(`Connections: ${connections}`);
console.log(`Messages/sec: ${totalMessagesPerSecond}`);
console.log(`CPU cores needed: ${coresNeeded}`);
}
benchmarkMessageProcessing(100000, 1); // 10 cores
benchmarkMessageProcessing(500000, 0.5); // 25 cores
benchmarkMessageProcessing(1000000, 0.1); // 10 cores (low frequency)
Bottleneck Analysis Matrix
| Resource | Symptom | Threshold | Mitigation |
|---|---|---|---|
| Memory | OOM crashes, slow GC | ~50K connections per 2GB | Horizontal scaling, connection limits |
| File descriptors | EMFILE errors | OS default 1024 | Increase ulimit, tune kernel |
| CPU | High latency, message drops | 80% sustained utilization | Worker threads, horizontal scaling |
| Network | Packet loss, timeouts | NIC bandwidth saturation | Binary protocols, compression |
| Event loop | Delayed timers, stale connections | >100ms loop lag | Offload CPU work to worker threads |
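The event loop row is the hardest one to see from outside the process. A minimal sketch of measuring loop lag with Node's built-in perf_hooks module follows; the lagReport helper and the simulated 150ms blocking task are illustrative assumptions, and the 100ms threshold simply mirrors the table.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

const LAG_THRESHOLD_MS = 100; // mirrors the ">100ms loop lag" row in the table

// Convert a delay histogram (values in nanoseconds) into a small report
function lagReport(histogram, thresholdMs = LAG_THRESHOLD_MS) {
  const toMs = (ns) => ns / 1e6;
  const p99Ms = toMs(histogram.percentile(99));
  return {
    meanMs: toMs(histogram.mean),
    p99Ms,
    maxMs: toMs(histogram.max),
    overloaded: p99Ms > thresholdMs,
  };
}

const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

// Simulate CPU-bound work that blocks the loop for roughly 150ms
setTimeout(() => {
  const end = Date.now() + 150;
  while (Date.now() < end) { /* busy wait */ }
}, 50);

// Read the histogram shortly afterwards, then stop sampling so the process can exit
setTimeout(() => {
  histogram.disable();
  console.log(lagReport(histogram));
}, 400);
```

In a real server the histogram would stay enabled and be read on an interval, with the report fed into whatever metrics system is already in use.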
Connection Management Patterns
Effective connection management is the foundation of a scalable WebSocket server. You need to track connections, associate them with users, handle multiple devices per user, and clean up resources when connections drop.
Connection Registry with Multi-Device Support
In production, a single user may have multiple connected devices—phone, laptop, tablet. The connection registry must map users to sets of connections, not just individual sockets.
class ConnectionRegistry {
constructor() {
// userId -> Map<connectionId, WebSocket>
this.userConnections = new Map();
// WebSocket -> connection metadata
this.metadata = new WeakMap();
// connectionId -> WebSocket (for quick lookup)
this.connectionIndex = new Map();
this.nextConnectionId = 0;
}
add(userId, ws, metadata = {}) {
const connectionId = `conn_${++this.nextConnectionId}`;
if (!this.userConnections.has(userId)) {
this.userConnections.set(userId, new Map());
}
this.userConnections.get(userId).set(connectionId, ws);
this.connectionIndex.set(connectionId, ws);
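this.metadata.set(ws, {
// Spread caller-supplied fields first so they cannot overwrite the
// identifiers and defaults set below (an undefined ip would otherwise win)
...metadata,
userId,
connectionId,
connectedAt: Date.now(),
lastActivity: Date.now(),
ip: metadata.ip || 'unknown',
userAgent: metadata.userAgent || 'unknown',
});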
return connectionId;
}
remove(ws) {
const meta = this.metadata.get(ws);
if (!meta) return null;
const userConns = this.userConnections.get(meta.userId);
if (userConns) {
userConns.delete(meta.connectionId);
if (userConns.size === 0) {
this.userConnections.delete(meta.userId);
}
}
this.connectionIndex.delete(meta.connectionId);
return meta;
}
getByUser(userId) {
return this.userConnections.get(userId) || new Map();
}
getByConnectionId(connectionId) {
return this.connectionIndex.get(connectionId) || null;
}
sendToUser(userId, message) {
const conns = this.userConnections.get(userId);
if (!conns) return 0;
const payload = JSON.stringify(message);
let sent = 0;
for (const [connId, ws] of conns) {
if (ws.readyState === 1) {
ws.send(payload);
sent++;
}
}
return sent;
}
get stats() {
let totalConnections = 0;
for (const conns of this.userConnections.values()) {
totalConnections += conns.size;
}
return {
users: this.userConnections.size,
connections: totalConnections,
};
}
}
Heartbeat System for Dead Connection Detection
Zombie connections—connections where the client has disappeared without closing the TCP socket—are a major resource leak. Without heartbeats, these connections accumulate indefinitely, eventually exhausting server resources. The WebSocket protocol defines ping/pong frames specifically for this purpose.
class HeartbeatManager {
constructor(options = {}) {
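this.interval = options.interval || 30000; // send a ping every 30 seconds
// Minimum time since the last pong before an unanswered ping counts as missed.
// The check runs once per interval, so detection can take up to one extra interval.
this.timeout = options.timeout || 10000;
this.clients = new Map(); // ws -> { lastPong, isAlive, pendingPing }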
this.timer = null;
this.onTerminate = options.onTerminate || (() => {});
}
add(ws) {
this.clients.set(ws, {
lastPong: Date.now(),
isAlive: true,
pendingPing: false,
});
ws.on('pong', () => {
const client = this.clients.get(ws);
if (client) {
client.lastPong = Date.now();
client.isAlive = true;
client.pendingPing = false;
}
});
ws.on('close', () => {
this.clients.delete(ws);
});
}
start() {
this.timer = setInterval(() => {
const now = Date.now();
for (const [ws, client] of this.clients) {
if (client.pendingPing && (now - client.lastPong) > this.timeout) {
// Client didn't respond to ping within timeout
ws.terminate();
this.clients.delete(ws);
this.onTerminate(ws, 'heartbeat_timeout');
continue;
}
if (ws.readyState === 1) {
client.pendingPing = true;
ws.ping();
}
}
}, this.interval);
}
stop() {
if (this.timer) {
clearInterval(this.timer);
this.timer = null;
}
}
get stats() {
let alive = 0;
let pending = 0;
for (const client of this.clients.values()) {
if (client.pendingPing) pending++;
else alive++;
}
return { total: this.clients.size, alive, pendingPing: pending };
}
}
Rate Limiting Per Connection
Prevent individual connections from monopolizing server resources with per-connection rate limiting.
class ConnectionRateLimiter {
constructor(options = {}) {
this.maxMessagesPerSecond = options.maxMessagesPerSecond || 50;
this.maxBytesPerSecond = options.maxBytesPerSecond || 1024 * 1024; // 1MB
this.windows = new WeakMap(); // ws -> { messageCount, byteCount, windowStart }
}
check(ws, messageSize = 0) {
let window = this.windows.get(ws);
if (!window) {
window = { messageCount: 0, byteCount: 0, windowStart: Date.now() };
this.windows.set(ws, window);
}
const now = Date.now();
if (now - window.windowStart > 1000) {
window.messageCount = 0;
window.byteCount = 0;
window.windowStart = now;
}
window.messageCount++;
window.byteCount += messageSize;
if (window.messageCount > this.maxMessagesPerSecond) {
return { allowed: false, reason: 'message_rate_exceeded' };
}
if (window.byteCount > this.maxBytesPerSecond) {
return { allowed: false, reason: 'byte_rate_exceeded' };
}
return { allowed: true };
}
}
Room-Based Pub/Sub Architecture
When you scale to multiple server instances, each instance holds a subset of total connections. To send a message to a user or room, you need a pub/sub message broker that distributes messages across all instances. Redis Pub/Sub is the most common choice for this pattern.
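To make the fan-out idea concrete before bringing in Redis, here is a small in-memory sketch. InMemoryBroker is a stand-in built on Node's EventEmitter, not a real broker, and the Instance class, the room:lobby channel, and the fake sockets are all hypothetical names used only for illustration.

```javascript
const { EventEmitter } = require('events');

// Stand-in for the broker: publish() fans a message out to every subscriber
class InMemoryBroker extends EventEmitter {
  publish(channel, message) {
    this.emit(channel, message);
  }
  subscribe(channel, handler) {
    this.on(channel, handler);
  }
}

// Hypothetical server instance that only knows about its own local sockets
class Instance {
  constructor(name, broker) {
    this.name = name;
    this.broker = broker;
    this.localSockets = new Map(); // userId -> fake socket
    broker.subscribe('room:lobby', (message) => this.deliverLocally(message));
  }
  connect(userId) {
    const socket = { received: [], send(data) { this.received.push(data); } };
    this.localSockets.set(userId, socket);
    return socket;
  }
  deliverLocally(message) {
    const payload = JSON.stringify(message);
    for (const socket of this.localSockets.values()) socket.send(payload);
  }
  sendToRoom(fromUserId, text) {
    // This instance cannot see where other members are connected,
    // so it hands the message to the broker instead of sending directly
    this.broker.publish('room:lobby', { fromUserId, text });
  }
}

const broker = new InMemoryBroker();
const serverA = new Instance('A', broker);
const serverB = new Instance('B', broker);
const alice = serverA.connect('alice');
const bob = serverB.connect('bob');

serverA.sendToRoom('alice', 'hello');
console.log(alice.received.length, bob.received.length); // 1 1
```

Both instances receive the published message and deliver it only to their own sockets, which is the same division of labour a Redis-backed deployment relies on.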
Room Manager
class RoomManager {
constructor() {
this.rooms = new Map(); // roomId -> Map<connectionId, ws>
this.connectionRooms = new Map(); // connectionId -> Set<roomId>
}
join(roomId, connectionId, ws) {
if (!this.rooms.has(roomId)) {
this.rooms.set(roomId, new Map());
}
this.rooms.get(roomId).set(connectionId, ws);
if (!this.connectionRooms.has(connectionId)) {
this.connectionRooms.set(connectionId, new Set());
}
this.connectionRooms.get(connectionId).add(roomId);
}
leave(roomId, connectionId) {
const room = this.rooms.get(roomId);
if (room) {
room.delete(connectionId);
if (room.size === 0) this.rooms.delete(roomId);
}
const rooms = this.connectionRooms.get(connectionId);
if (rooms) {
rooms.delete(roomId);
if (rooms.size === 0) this.connectionRooms.delete(connectionId);
}
}
leaveAll(connectionId) {
const rooms = this.connectionRooms.get(connectionId);
if (rooms) {
for (const roomId of rooms) {
this.leave(roomId, connectionId);
}
}
}
broadcastLocal(roomId, message, excludeConnectionId = null) {
const room = this.rooms.get(roomId);
if (!room) return 0;
const payload = JSON.stringify(message);
let sent = 0;
for (const [connId, ws] of room) {
if (connId !== excludeConnectionId && ws.readyState === 1) {
ws.send(payload);
sent++;
}
}
return sent;
}
getRoomMembers(roomId) {
const room = this.rooms.get(roomId);
return room ? Array.from(room.keys()) : [];
}
get stats() {
let totalMembers = 0;
for (const room of this.rooms.values()) {
totalMembers += room.size;
}
return { rooms: this.rooms.size, totalMembers };
}
}
Scalable WebSocket Server with Redis Pub/Sub
This is the core of horizontal scaling: each server instance manages its own connections locally, but uses Redis to distribute messages across all instances.
const Redis = require('ioredis');
const WebSocket = require('ws');
const http = require('http');
class ScalableWebSocketServer {
constructor(options) {
this.serverId = options.serverId || `server_${process.pid}`;
this.port = options.port;
this.redisUrl = options.redisUrl || 'redis://localhost:6379';
this.registry = new ConnectionRegistry();
this.rooms = new RoomManager();
this.heartbeat = new HeartbeatManager();
this.publisher = new Redis(this.redisUrl);
this.subscriber = new Redis(this.redisUrl);
this.httpServer = http.createServer();
this.wss = new WebSocket.Server({ server: this.httpServer });
this.setupWebSocket();
this.setupPubSub();
this.heartbeat.start();
}
setupWebSocket() {
this.wss.on('connection', (ws, req) => {
const userId = this.authenticateConnection(req);
if (!userId) {
ws.close(4001, 'Unauthorized');
return;
}
const connectionId = this.registry.add(userId, ws, {
ip: req.socket.remoteAddress,
userAgent: req.headers['user-agent'],
});
this.heartbeat.add(ws);
ws.on('message', (data) => {
this.handleMessage(ws, connectionId, userId, data);
});
ws.on('close', () => {
this.rooms.leaveAll(connectionId);
this.registry.remove(ws);
});
});
}
handleMessage(ws, connectionId, userId, data) {
let message;
try {
message = JSON.parse(data);
} catch (e) {
ws.send(JSON.stringify({ error: 'Invalid JSON' }));
return;
}
switch (message.type) {
case 'join':
this.handleJoin(connectionId, ws, userId, message.room);
break;
case 'leave':
this.handleLeave(connectionId, message.room);
break;
case 'room_message':
this.handleRoomMessage(userId, message.room, message.payload);
break;
case 'direct_message':
this.handleDirectMessage(userId, message.targetUserId, message.payload);
break;
default:
ws.send(JSON.stringify({ error: 'Unknown message type' }));
}
}
handleJoin(connectionId, ws, userId, roomId) {
this.rooms.join(roomId, connectionId, ws);
// Notify all instances about the join
this.publisher.publish('room:events', JSON.stringify({
type: 'user_joined',
roomId,
userId,
serverId: this.serverId,
}));
}
handleLeave(connectionId, roomId) {
this.rooms.leave(roomId, connectionId);
}
handleRoomMessage(userId, roomId, payload) {
const message = {
type: 'room_message',
roomId,
userId,
payload,
timestamp: Date.now(),
serverId: this.serverId,
};
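// Deliver to room members connected to this instance. The subscriber skips
// messages tagged with our own serverId, so they would otherwise be missed.
this.rooms.broadcastLocal(roomId, message);
// Publish to Redis so every other server broadcasts to its local room members
this.publisher.publish(`room:${roomId}`, JSON.stringify(message));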
}
handleDirectMessage(fromUserId, toUserId, payload) {
const message = {
type: 'direct_message',
fromUserId,
toUserId,
payload,
timestamp: Date.now(),
serverId: this.serverId,
};
// Deliver to any of the recipient's devices connected to this instance
this.registry.sendToUser(toUserId, {
type: message.type,
fromUserId,
payload,
timestamp: message.timestamp,
});
// Always publish as well: the recipient may have other devices connected to
// other instances. The serverId lets this instance skip its own message.
this.publisher.publish('user:message', JSON.stringify(message));
}
setupPubSub() {
// Subscribe to all room channels
this.subscriber.psubscribe('room:*');
this.subscriber.on('pmessage', (pattern, channel, data) => {
if (channel === 'room:events') return;
const roomId = channel.replace('room:', '');
const message = JSON.parse(data);
// Don't re-broadcast messages from this server
if (message.serverId === this.serverId) return;
this.rooms.broadcastLocal(roomId, message);
});
// Subscribe to direct message channel
this.subscriber.subscribe('user:message');
this.subscriber.on('message', (channel, data) => {
if (channel === 'user:message') {
const message = JSON.parse(data);
if (message.serverId === this.serverId) return;
this.registry.sendToUser(message.toUserId, {
type: message.type,
fromUserId: message.fromUserId,
payload: message.payload,
timestamp: message.timestamp,
});
}
});
}
authenticateConnection(req) {
// Extract and validate token from query string or headers
const url = new URL(req.url, 'http://localhost');
const token = url.searchParams.get('token');
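// Placeholder only: verify a signed JWT or session token here and return
// null if verification fails. Never trust the raw token value.
return token ? `user_${token}` : null;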
}
start() {
this.httpServer.listen(this.port, () => {
console.log(`WebSocket server ${this.serverId} listening on port ${this.port}`);
});
}
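async stop() {
this.heartbeat.stop();
// httpServer.close() only completes once every socket has ended,
// so ask connected clients to close first (1001 = going away)
for (const ws of this.wss.clients) {
ws.close(1001, 'Server shutting down');
}
await this.publisher.quit();
await this.subscriber.quit();
return new Promise((resolve) => {
this.httpServer.close(resolve);
});
}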
}
Load Balancing WebSocket Connections
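WebSocket connections require special handling from load balancers. Unlike HTTP requests, which can be routed to any backend, a WebSocket connection is persistent: the load balancer must pass the HTTP upgrade handshake through and then keep every frame on the same backend connection. Sticky sessions or consistent hashing extend that affinity across connections, so reconnects and any HTTP fallback requests from a client land on the server that already holds its state.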
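Consistent hashing is easier to reason about with a small model. The sketch below is an illustrative hash ring, not what nginx or HAProxy run internally; the HashRing class, the choice of SHA-256, and the 100 virtual nodes per server are assumptions made for the example.

```javascript
const crypto = require('crypto');

// Map a string to a 32-bit position on the ring
function ringPosition(key) {
  return crypto.createHash('sha256').update(key).digest().readUInt32BE(0);
}

class HashRing {
  constructor(servers, virtualNodes = 100) {
    this.virtualNodes = virtualNodes;
    this.points = []; // sorted array of { position, server }
    for (const server of servers) this.add(server);
  }
  add(server) {
    // Several points per server spread the load more evenly around the ring
    for (let i = 0; i < this.virtualNodes; i++) {
      this.points.push({ position: ringPosition(`${server}#${i}`), server });
    }
    this.points.sort((a, b) => a.position - b.position);
  }
  remove(server) {
    this.points = this.points.filter((p) => p.server !== server);
  }
  // Binary search for the first point at or after the key, wrapping at the end
  lookup(key) {
    if (this.points.length === 0) return null;
    const position = ringPosition(key);
    let lo = 0;
    let hi = this.points.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.points[mid].position < position) lo = mid + 1;
      else hi = mid;
    }
    return this.points[lo % this.points.length].server;
  }
}

const ring = new HashRing(['10.0.0.1:3001', '10.0.0.2:3002', '10.0.0.3:3003']);
const before = ring.lookup('user_42');
ring.remove('10.0.0.3:3003');
const after = ring.lookup('user_42');
// A client only moves if it was on the server that was removed
console.log(before === after || before === '10.0.0.3:3003'); // true
```

The property shown on the last line is the reason to prefer a ring over a plain modulo: removing one server remaps only the clients that were assigned to it.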
Nginx Configuration
# /etc/nginx/conf.d/websocket.conf
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
upstream websocket_backend {
# ip_hash provides sticky sessions based on client IP
# Alternative: hash $request_uri consistent (for token-based routing)
ip_hash;
server 10.0.0.1:3001 max_fails=3 fail_timeout=30s;
server 10.0.0.2:3002 max_fails=3 fail_timeout=30s;
server 10.0.0.3:3003 max_fails=3 fail_timeout=30s;
server 10.0.0.4:3004 max_fails=3 fail_timeout=30s;
# Note: nginx does not accept the backup parameter together with ip_hash
# (or hash), so failover relies on max_fails/fail_timeout across the servers above
}
server {
listen 443 ssl http2;
server_name ws.example.com;
ssl_certificate /etc/ssl/certs/ws.example.com.pem;
ssl_certificate_key /etc/ssl/private/ws.example.com.key;
location /ws {
proxy_pass http://websocket_backend;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Keep connections alive for 24 hours
proxy_read_timeout 86400s;
proxy_send_timeout 86400s;
# Buffer settings for WebSocket
proxy_buffering off;
proxy_cache off;
# Limit request size
client_max_body_size 1m;
}
# Health check endpoint
location /health {
proxy_pass http://websocket_backend;
proxy_http_version 1.1;
}
}
HAProxy Configuration
# /etc/haproxy/haproxy.cfg
frontend ws_frontend
bind *:443 ssl crt /etc/ssl/certs/ws.pem
mode http
# Route WebSocket traffic
acl is_websocket hdr(Upgrade) -i websocket
use_backend ws_backend if is_websocket
default_backend http_backend
backend ws_backend
mode http
balance source # Sticky sessions by source IP
# Health checks
option httpchk GET /health
http-check expect status 200
# Timeout settings for long-lived connections
timeout server 86400s
timeout tunnel 86400s
timeout connect 5s
# Retry on connection failure
retry-on conn-failure
retries 3
server ws1 10.0.0.1:3001 check inter 10s fall 3 rise 2
server ws2 10.0.0.2:3002 check inter 10s fall 3 rise 2
server ws3 10.0.0.3:3003 check inter 10s fall 3 rise 2
server ws4 10.0.0.4:3004 check inter 10s fall 3 rise 2
backend http_backend
mode http
balance roundrobin
server web1 10.0.0.10:80 check
server web2 10.0.0.11:80 check
Token-Based Routing
When users reconnect after a disconnection, routing them to the same server can help restore session state quickly.
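The client in this section expects the server to send a routing_token message after connecting. A minimal server-side sketch of issuing and checking such a token might look like the following. The token format, the function names, and the ROUTING_SECRET environment variable are assumptions made for illustration; how the load balancer consumes the token depends on the routing layer in use.

```javascript
const crypto = require('crypto');

// Assumption: a secret shared by every server (and the routing layer, if it verifies)
const ROUTING_SECRET = process.env.ROUTING_SECRET || 'dev-only-secret';

function sign(serverId, secret) {
  return crypto.createHmac('sha256', secret).update(serverId).digest('hex');
}

// Hypothetical token format: "<serverId>.<hex HMAC of serverId>"
function issueRoutingToken(serverId, secret = ROUTING_SECRET) {
  return `${serverId}.${sign(serverId, secret)}`;
}

// Returns the serverId if the signature checks out, otherwise null
function verifyRoutingToken(token, secret = ROUTING_SECRET) {
  if (typeof token !== 'string') return null;
  const dot = token.lastIndexOf('.');
  if (dot <= 0) return null;
  const serverId = token.slice(0, dot);
  const given = Buffer.from(token.slice(dot + 1), 'utf8');
  const expected = Buffer.from(sign(serverId, secret), 'utf8');
  // timingSafeEqual requires equal lengths, so check that first
  if (given.length !== expected.length) return null;
  return crypto.timingSafeEqual(given, expected) ? serverId : null;
}

// The message a server could send right after accepting a connection
function routingTokenMessage(serverId) {
  return JSON.stringify({ type: 'routing_token', token: issueRoutingToken(serverId) });
}

const token = issueRoutingToken('server_1234');
console.log(verifyRoutingToken(token));              // server_1234
console.log(verifyRoutingToken(token + 'tampered')); // null
```

Signing the token keeps clients from steering themselves to an arbitrary server; a plain server identifier would work for routing but could be forged.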
// Client-side: include a routing token in the WebSocket URL
class ReconnectingWebSocket {
constructor(baseUrl, options = {}) {
this.baseUrl = baseUrl;
// Fall back to the token saved by a previous session, if any
this.routingToken = options.routingToken || localStorage.getItem('ws_routing_token');
this.maxRetries = options.maxRetries || 5;
this.retryCount = 0;
}
connect() {
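let url = this.baseUrl;
if (this.routingToken) {
// Use the right separator if baseUrl already has a query string, and escape the token
const separator = url.includes('?') ? '&' : '?';
url += `${separator}routing=${encodeURIComponent(this.routingToken)}`;
}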
this.ws = new WebSocket(url);
this.ws.onopen = () => {
this.retryCount = 0;
};
this.ws.onmessage = (event) => {
const data = JSON.parse(event.data);
// Server sends routing token on connection
if (data.type === 'routing_token') {
this.routingToken = data.token;
localStorage.setItem('ws_routing_token', data.token);
}
};
this.ws.onclose = () => {
this.reconnect();
};
}
reconnect() {
if (this.retryCount >= this.maxRetries) {
// Clear routing token and try any server
this.routingToken = null;
this.retryCount = 0;
}
const delay = Math.min(1000 * Math.pow(2, this.retryCount), 30000);
this.retryCount++;
setTimeout(() => this.connect(), delay);
}
}
Graceful Shutdown and Connection Draining
When scaling down or deploying new code, you can't simply kill the server process—doing so drops all connections instantly, causing data loss and poor user experience. Graceful shutdown involves stopping new connections, notifying existing clients, and waiting for them to disconnect cleanly.
const WebSocket = require('ws'); // provides the WebSocket.OPEN constant used below
class GracefulShutdown {
constructor(httpServer, wss, options = {}) {
this.httpServer = httpServer;
this.wss = wss;
this.drainTimeout = options.drainTimeout || 30000; // 30 seconds
this.notifyMessage = options.notifyMessage || {
type: 'server_shutdown',
reason: 'Deployment in progress',
reconnectDelay: 5000,
};
this.isShuttingDown = false;
}
init() {
process.on('SIGTERM', () => this.shutdown('SIGTERM'));
process.on('SIGINT', () => this.shutdown('SIGINT'));
}
async shutdown(signal) {
if (this.isShuttingDown) return;
this.isShuttingDown = true;
console.log(`Received ${signal}, starting graceful shutdown...`);
// Step 1: Stop accepting new connections
this.httpServer.close(() => {
console.log('HTTP server closed, no new connections accepted');
});
// Step 2: Notify existing clients
const clients = Array.from(this.wss.clients);
console.log(`Notifying ${clients.length} connected clients...`);
for (const ws of clients) {
if (ws.readyState === WebSocket.OPEN) {
ws.send(JSON.stringify(this.notifyMessage));
}
}
// Step 3: Wait for clients to disconnect
const startTime = Date.now();
await new Promise((resolve) => {
const checkInterval = setInterval(() => {
const remaining = this.wss.clients.size;
const elapsed = Date.now() - startTime;
if (remaining === 0) {
clearInterval(checkInterval);
console.log('All clients disconnected gracefully');
resolve();
} else if (elapsed > this.drainTimeout) {
clearInterval(checkInterval);
console.log(`Drain timeout reached, force closing ${remaining} connections`);
for (const ws of this.wss.clients) {
ws.terminate();
}
resolve();
} else {
console.log(`Waiting for ${remaining} clients to disconnect (${Math.ceil((this.drainTimeout - elapsed) / 1000)}s remaining)`);
}
}, 2000);
});
// Step 4: Clean up resources
this.wss.close();
console.log('Graceful shutdown complete');
process.exit(0);
}
// Middleware to reject new connections during shutdown
middleware() {
return (req, res, next) => {
if (this.isShuttingDown) {
res.writeHead(503, { 'Retry-After': '30' });
res.end('Server is shutting down');
return;
}
next();
};
}
}
Production Monitoring and Metrics
Monitoring WebSocket servers requires tracking connection-level metrics that don't exist in traditional HTTP applications. You need to know how many connections are active, how many messages are flowing, and whether connections are healthy.
const { Counter, Gauge, Histogram, register } = require('prom-client');
class WebSocketMetrics {
constructor() {
this.activeConnections = new Gauge({
name: 'ws_active_connections',
help: 'Number of active WebSocket connections',
labelNames: ['server_id'],
});
this.connectionTotal = new Counter({
name: 'ws_connections_total',
help: 'Total WebSocket connections opened',
labelNames: ['server_id'],
});
this.disconnectionTotal = new Counter({
name: 'ws_disconnections_total',
help: 'Total WebSocket disconnections',
labelNames: ['server_id', 'reason'],
});
this.messagesReceived = new Counter({
name: 'ws_messages_received_total',
help: 'Total WebSocket messages received',
labelNames: ['server_id', 'type'],
});
this.messagesSent = new Counter({
name: 'ws_messages_sent_total',
help: 'Total WebSocket messages sent',
labelNames: ['server_id'],
});
this.messageSize = new Histogram({
name: 'ws_message_size_bytes',
help: 'WebSocket message size in bytes',
labelNames: ['direction'],
buckets: [64, 256, 1024, 4096, 16384, 65536],
});
this.connectionDuration = new Histogram({
name: 'ws_connection_duration_seconds',
help: 'WebSocket connection duration',
buckets: [1, 10, 60, 300, 900, 3600, 86400],
});
this.heartbeatLatency = new Histogram({
name: 'ws_heartbeat_latency_ms',
help: 'Heartbeat round-trip latency',
buckets: [1, 5, 10, 25, 50, 100, 250, 500, 1000],
});
}
onConnect(serverId) {
this.activeConnections.labels(serverId).inc();
this.connectionTotal.labels(serverId).inc();
}
onDisconnect(serverId, reason, durationMs) {
this.activeConnections.labels(serverId).dec();
this.disconnectionTotal.labels(serverId, reason).inc();
this.connectionDuration.observe(durationMs / 1000);
}
onMessageReceived(serverId, type, sizeBytes) {
this.messagesReceived.labels(serverId, type).inc();
this.messageSize.labels('received').observe(sizeBytes);
}
onMessageSent(serverId, sizeBytes) {
this.messagesSent.labels(serverId).inc();
this.messageSize.labels('sent').observe(sizeBytes);
}
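onHeartbeat(latencyMs) {
// Record ping-to-pong round-trip time; call this from the pong handler
this.heartbeatLatency.observe(latencyMs);
}
async getMetrics() {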
return register.metrics();
}
}
Dashboard Alerts
# prometheus-alerts.yml
groups:
  - name: websocket_alerts
    rules:
      - alert: HighConnectionCount
        expr: ws_active_connections > 40000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High WebSocket connection count on {{ $labels.server_id }}"
          description: "Server {{ $labels.server_id }} has {{ $value }} active connections"
      - alert: ConnectionChurn
        expr: rate(ws_disconnections_total[5m]) > 100
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High connection churn rate"
          description: "{{ $value }} disconnections per second on {{ $labels.server_id }}"
      - alert: HighHeartbeatLatency
        # histogram_quantile needs the per-bucket rate, aggregated by the le label
        expr: histogram_quantile(0.95, sum by (le) (rate(ws_heartbeat_latency_ms_bucket[5m]))) > 500
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High heartbeat latency"
          description: "95th percentile heartbeat latency is {{ $value }}ms"
Common Pitfalls and Solutions
| Pitfall | Impact | Solution |
|---|---|---|
| No heartbeat detection | Zombie connections accumulate, memory leak | Implement ping/pong with timeout |
| Broadcasting to all connections | O(n) CPU and network per message | Use room-based pub/sub with selective broadcast |
| No connection limits per IP | Single client exhausts server resources | Implement per-IP connection limits |
| Missing sticky sessions | Connections randomly drop after handshake | Configure ip_hash or hash-based routing in load balancer |
| No graceful shutdown | Data loss during deployments | Implement connection draining with client notification |
| Storing state only in memory | Lost on server restart or crash | Persist critical state to Redis or database |
| No rate limiting | Abuse and resource exhaustion | Implement per-connection message rate limits |
| Ignoring backpressure | Memory growth when clients are slow | Check ws.bufferedAmount before sending, implement flow control |
| Single Redis instance | Single point of failure for pub/sub | Use Redis Sentinel or Cluster for high availability |
| No connection authentication | Unauthorized access to real-time channels | Authenticate during WebSocket handshake upgrade |
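Per-IP connection limits appear in the table but not in the code so far. A minimal sketch follows; the IpConnectionLimiter class and the limit of 2 used in the demonstration are hypothetical. Behind a reverse proxy, the IP would typically come from a forwarded header such as X-Real-IP and not from the socket address.

```javascript
class IpConnectionLimiter {
  constructor(maxPerIp = 10) {
    this.maxPerIp = maxPerIp;
    this.counts = new Map(); // ip -> number of open connections
  }
  // Call when a connection arrives; returns false if it should be rejected
  tryAcquire(ip) {
    const current = this.counts.get(ip) || 0;
    if (current >= this.maxPerIp) return false;
    this.counts.set(ip, current + 1);
    return true;
  }
  // Call from the connection's close handler
  release(ip) {
    const current = this.counts.get(ip) || 0;
    if (current <= 1) this.counts.delete(ip); // drop empty entries to avoid a slow leak
    else this.counts.set(ip, current - 1);
  }
}

const limiter = new IpConnectionLimiter(2);
console.log(limiter.tryAcquire('203.0.113.7')); // true
console.log(limiter.tryAcquire('203.0.113.7')); // true
console.log(limiter.tryAcquire('203.0.113.7')); // false
limiter.release('203.0.113.7');
console.log(limiter.tryAcquire('203.0.113.7')); // true
```

This counter is local to one process. With several instances behind a load balancer, the effective limit is per instance unless the counts are kept in a shared store.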
Best Practices Summary
- Use pub/sub for multi-server deployments: Redis Pub/Sub or NATS for cross-server message distribution
- Implement heartbeats: Detect and clean up dead connections every 30 seconds
- Set per-IP connection limits: Prevent resource exhaustion from individual clients
- Use binary protocols: MessagePack or Protocol Buffers can reduce payload size, often by around 30-50% compared to JSON, depending on the message shape
- Monitor connection counts and message rates: Set up alerts for anomalies
- Implement graceful shutdown: Drain connections before killing server processes
- Use connection routing tokens: Route reconnecting clients to the same server for session continuity
- Test with realistic load: Use tools like k6 or artillery to test with 100K+ concurrent connections
- Persist critical state externally: Use Redis or a database for state that must survive server restarts
- Implement backpressure handling: Check bufferedAmount and implement flow control for slow clients
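Backpressure handling can be sketched as a small wrapper around send. The function name, the two thresholds, and the drop-then-disconnect policy below are assumptions; dropping suits data that is quickly superseded (such as price ticks), while chat-style messages would more likely need queueing. Fake sockets stand in for real connections so the example runs on its own.

```javascript
const MAX_BUFFERED_BYTES = 1024 * 1024; // assumption: 1MB of unsent data per client

// Returns 'sent', 'dropped', or 'closed' so callers can count outcomes
function sendWithBackpressure(ws, payload, options = {}) {
  const maxBuffered = options.maxBuffered || MAX_BUFFERED_BYTES;
  const hardLimit = options.hardLimit || maxBuffered * 4;

  if (ws.readyState !== 1) return 'closed'; // 1 === OPEN

  if (ws.bufferedAmount > hardLimit) {
    // The client is far behind; free the memory instead of queueing more
    ws.terminate();
    return 'closed';
  }
  if (ws.bufferedAmount > maxBuffered) {
    // Skip this message and let the client catch up
    return 'dropped';
  }
  ws.send(payload);
  return 'sent';
}

// Fake socket exposing only the fields the wrapper reads
function fakeSocket(bufferedAmount) {
  return {
    readyState: 1,
    bufferedAmount,
    sent: [],
    send(data) { this.sent.push(data); },
    terminate() { this.readyState = 3; },
  };
}

console.log(sendWithBackpressure(fakeSocket(0), 'tick'));               // sent
console.log(sendWithBackpressure(fakeSocket(2 * 1024 * 1024), 'tick')); // dropped
console.log(sendWithBackpressure(fakeSocket(8 * 1024 * 1024), 'tick')); // closed
```

Counting the three outcomes per server makes slow clients visible in metrics before they show up as memory growth.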
Conclusion
Scaling WebSocket servers to millions of connections is a multi-layered challenge that spans connection management, pub/sub architecture, load balancing, graceful lifecycle management, and production monitoring. A single server handles 10K-50K connections well, but reaching millions requires horizontal scaling with a message broker like Redis Pub/Sub to distribute messages across server instances. Load balancers must be configured with sticky sessions to ensure WebSocket frames reach the correct server. Heartbeat mechanisms prevent resource leaks from zombie connections. Graceful shutdown with connection draining ensures zero-downtime deployments. And comprehensive monitoring with tools like Prometheus and Grafana gives you visibility into the health and performance of your real-time infrastructure.
The architecture described here has been battle-tested in production systems handling millions of concurrent connections for applications like live sports scores, financial trading platforms, multiplayer games, and large-scale chat systems. Start with a solid single-server implementation, add pub/sub when you need horizontal scaling, and invest in monitoring and graceful shutdown as you move toward production readiness. The key is building incrementally—each layer adds complexity, so only add the scaling layers you actually need.
Key takeaways:
- Single servers handle 10K-50K connections before needing horizontal scaling
- Redis Pub/Sub enables message distribution across server instances for rooms and direct messages
- Sticky sessions are required for WebSocket load balancing: use ip_hash or consistent hashing
- Heartbeats detect and clean up dead connections, preventing memory leaks
- Graceful shutdown with connection draining ensures zero-downtime deployments
- Prometheus metrics provide visibility into connection counts, message rates, and latency
- Per-connection rate limiting prevents individual clients from exhausting server resources