Database Sharding: Scaling Horizontally

Introduction

Every growing application eventually hits the same wall: the database can't keep up. Queries slow down, write throughput plateaus, and adding more CPU or memory to a single server (vertical scaling) becomes prohibitively expensive. This is the moment when database sharding moves from an academic concept to a production necessity.

Database sharding—the practice of splitting a single database into multiple smaller databases distributed across servers—is one of the most powerful and most dangerous scaling techniques available. Done right, it enables applications to handle millions of users and petabytes of data. Done wrong, it creates a distributed nightmare of data inconsistencies, cross-shard queries, and operational complexity that can take months to unwind.

This guide covers everything you need to know about database sharding: when to shard, how to choose shard keys, the different sharding strategies, and the operational challenges you'll face in production.

When to Shard (and When Not To)

Signs You Need Sharding

Sharding is a last resort, not a first optimization. Before considering sharding, exhaust these alternatives:

Query optimization: Add indexes, rewrite slow queries, use EXPLAIN to understand query plans
Read replicas: Offload read traffic to replica databases
Caching: Add Redis or Memcached for frequently accessed data
Vertical scaling: Upgrade to a more powerful server (this can get you surprisingly far)
Table partitioning: Split large tables within a single database

If you've done all of the above and still can't meet performance requirements, sharding may be necessary. Common triggers include:

Write throughput: A single primary can't handle the write volume (typically >10,000 writes/second)
Data size: The dataset exceeds what a single server can store efficiently (>1-10TB)
Query latency: Indexes no longer fit in memory, causing slow queries
Availability requirements: A single database is a single point of failure

The Cost of Sharding

Sharding introduces significant complexity:

Cross-shard queries: Queries that span multiple shards are slow and complex
Distributed transactions: ACID transactions across shards require 2PC or Saga patterns
Schema changes: ALTER TABLE must be applied to every shard
Rebalancing: Adding or removing shards requires data migration
Operational overhead: Monitoring, backups, and troubleshooting become more complex
Application complexity: The application must be shard-aware or use a routing layer

Sharding Strategies

Hash-Based Sharding

Hash-based sharding distributes data evenly across shards using a hash function:

shard_id = hash(shard_key) % number_of_shards
 
Example:
User ID: 12345
Hash: hash(12345) = 7823
Shards: 4
Shard assignment: 7823 % 4 = 3 → Shard 3

Advantages:

Even data distribution (no hotspots)
Simple to implement
Works well with auto-incrementing IDs

Disadvantages:

Range queries require scanning all shards
Adding/removing shards requires rehashing most data
No locality—related data may end up on different shards

Implementation with consistent hashing:

import hashlib
import bisect
 
class ConsistentHash:
    def __init__(self, nodes, replicas=150):
        self.replicas = replicas
        self.ring = {}
        self.sorted_keys = []
        for node in nodes:
            self.add_node(node)
    
    def add_node(self, node):
        for i in range(self.replicas):
            key = self._hash(f"{node}:{i}")
            self.ring[key] = node
            bisect.insort(self.sorted_keys, key)
    
    def get_node(self, data_key):
        if not self.ring:
            return None
        key = self._hash(data_key)
        idx = bisect.bisect(self.sorted_keys, key) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]
    
    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

Consistent hashing minimizes data movement when adding or removing shards—only approximately 1/n of keys need to move.

Range-Based Sharding

Range-based sharding groups data by value ranges:

Shard 0: users with ID 0-999,999
Shard 1: users with ID 1,000,000-1,999,999
Shard 2: users with ID 2,000,000-2,999,999
Shard 3: users with ID 3,000,000+

Advantages:

Range queries are efficient (hit only one shard)
Easy to understand and debug
Natural ordering within shards

Disadvantages:

Prone to hotspots (new users all go to the last shard)
Uneven data distribution over time
Manual rebalancing required

Better approach: time-based sharding for time-series data:

Shard 2023-Q1: data from Jan-Mar 2023
Shard 2023-Q2: data from Apr-Jun 2023
Shard 2023-Q3: data from Jul-Sep 2023

Directory-Based Sharding

A lookup table maps each key to its shard:

Directory Table:
┌─────────────┬──────────┐
│ User ID     │ Shard    │
├─────────────┼──────────┤
│ 12345       │ Shard 2  │
│ 67890       │ Shard 0  │
│ 11111       │ Shard 3  │
└─────────────┴──────────┘

Advantages:

Maximum flexibility—data can be moved by updating the directory
Supports custom sharding logic
Easy rebalancing

Disadvantages:

Single point of failure (the directory)
Extra lookup adds latency
Directory can become a bottleneck

Geographic Sharding

Data is sharded by user location:

US users → US shard (us-east-1)
EU users → EU shard (eu-west-1)
Asia users → Asia shard (ap-southeast-1)

Advantages:

Low latency for users (data is close to them)
Natural data locality
Compliance with data residency regulations (GDPR)

Disadvantages:

Users who travel may access data from the wrong region
Cross-region queries are complex
Requires handling data migration for users who move

Choosing a Shard Key

The shard key is the most critical decision in sharding. It determines data distribution, query routing, and the ability to scale.

Good Shard Key Properties

High cardinality: The key should have many distinct values to ensure even distribution
Even distribution: Values should be uniformly distributed, not skewed
Query alignment: Most queries should include the shard key, so they hit a single shard
Immutable: The shard key should never change (changing it requires data migration)
Low frequency: The key shouldn't be so frequently accessed that it creates hotspots

Common Shard Keys

User ID: Best for user-centric applications where each user's data is independent. Most queries include WHERE user_id = ?, which routes to a single shard.

Tenant ID: Best for multi-tenant SaaS applications. Each tenant's data is isolated on a single shard.

Geographic region: Best for applications with data residency requirements or where locality matters.

Time period: Best for time-series data. New shards are created for each time period, and old shards can be archived.

Bad Shard Keys

Email address: High cardinality but poor distribution (many emails share domains) Status field: Low cardinality (active/inactive) creates massive hotspots Created date: New data all goes to the same shard (hotspot) Auto-incrementing ID without hashing: Sequential IDs create write hotspots on the last shard

Shard Routing

Application-Level Routing

The application determines which shard to query:

class ShardRouter:
    def __init__(self, shards, num_shards):
        self.shards = shards
        self.num_shards = num_shards
    
    def get_shard(self, shard_key):
        shard_id = hash(shard_key) % self.num_shards
        return self.shards[shard_id]
    
    def query(self, shard_key, sql, params):
        shard = self.get_shard(shard_key)
        return shard.execute(sql, params)

Proxy-Based Routing

A proxy layer sits between the application and databases:

Application → Proxy (Vitess, ProxySQL, Citus) → Shard 0
                                              → Shard 1
                                              → Shard 2
                                              → Shard 3

Vitess: MySQL sharding proxy used by YouTube, Slack, and Square. Handles shard routing, query rewriting, and resharding.

Citus: PostgreSQL extension that adds transparent sharding. Distributes tables across nodes and routes queries automatically.

ProxySQL: MySQL proxy that can route queries based on rules. Supports read/write splitting and sharding.

Cross-Shard Operations

Cross-Shard Queries

Queries that span multiple shards are expensive:

-- This must query ALL shards and merge results
SELECT * FROM orders WHERE total > 1000 ORDER BY created_at DESC LIMIT 10;
 
-- Scatters-gather pattern:
-- 1. Send query to each shard
-- 2. Each shard returns top 10 results
-- 3. Proxy merges and returns global top 10

Strategies for handling cross-shard queries:

Denormalization: Copy data to avoid cross-shard joins
Materialized views: Pre-compute cross-shard aggregations
Search index: Use Elasticsearch for cross-shard full-text search
Application-level joins: Query each shard separately and join in application code

Distributed Transactions

ACID transactions across shards require coordination:

Two-Phase Commit (2PC):

Phase 1 (Prepare):
  Coordinator → Shard A: "Can you commit transaction T?"
  Coordinator → Shard B: "Can you commit transaction T?"
  Shard A: "Yes" (locks resources)
  Shard B: "Yes" (locks resources)
 
Phase 2 (Commit):
  Coordinator → Shard A: "Commit T"
  Coordinator → Shard B: "Commit T"

2PC is slow and creates a single point of failure. Prefer the Saga pattern for cross-shard operations:

Saga Pattern:
1. Create order on Shard A
2. Reserve inventory on Shard B
3. Process payment on Shard C
If step 3 fails:
  3a. Cancel inventory reservation on Shard B
  2a. Cancel order on Shard A

Resharding

The Resharding Challenge

Adding or removing shards requires moving data. This is the most operationally complex aspect of sharding.

Double-write migration:

Phase 1: Write to both old and new shards
Phase 2: Backfill historical data to new shards
Phase 3: Verify data consistency
Phase 4: Switch reads to new shards
Phase 5: Stop writes to old shards
Phase 6: Decommission old shards

Online schema change tools:

gh-ost: GitHub's online schema change tool for MySQL
pt-online-schema-change: Percona's tool for zero-downtime migrations
pgroll: Zero-downtime schema migrations for PostgreSQL

Sharding with Vitess

Vitess handles resharding automatically:

# Split shard 0 into shards 0 and 1
vtctlclient Reshard --source_shards "0" --target_shards "0,1" create
 
# Vitess handles:
# 1. Data migration
# 2. Query routing during migration
# 3. Consistency verification
# 4. Cutover

Sharding in Practice

PostgreSQL with Citus

Citus adds transparent sharding to PostgreSQL:

-- Create a distributed table
SELECT create_distributed_table('orders', 'customer_id');
 
-- Citus automatically:
-- 1. Distributes data across worker nodes
-- 2. Routes queries to the correct shard
-- 3. Handles cross-shard aggregations
 
-- This query is automatically routed
SELECT * FROM orders WHERE customer_id = 12345;
 
-- This query is automatically distributed
SELECT COUNT(*) FROM orders GROUP BY status;

MongoDB Sharding

MongoDB has built-in sharding support:

// Enable sharding on a database
sh.enableSharding("mydb");
 
// Shard a collection using hashed user_id
sh.shardCollection("mydb.users", { user_id: "hashed" });
 
// Shard with compound shard key (tenant + created date)
sh.shardCollection("mydb.events", { tenant_id: 1, created_at: 1 });

Vitess for MySQL

Vitess is the production standard for MySQL sharding:

# VSchema: defines how tables are sharded
{
  "sharded": true,
  "vindexes": {
    "hash_user_id": {
      "type": "hash"
    }
  },
  "tables": {
    "users": {
      "column_vindexes": [
        { "column": "user_id", "name": "hash_user_id" }
      ]
    },
    "orders": {
      "column_vindexes": [
        { "column": "user_id", "name": "hash_user_id" }
      ]
    }
  }
}

Monitoring and Operations

Key Metrics to Monitor

Shard skew: Uneven data distribution across shards
Query latency per shard: Hot shards will have higher latency
Cross-shard query ratio: High ratios indicate poor shard key choice
Replication lag: Each shard's replicas should be in sync
Connection pool utilization: Each shard has its own connection pool

Operational Playbook

Adding a shard:

Provision new database server
Configure replication
Run resharding migration
Verify data consistency
Update routing configuration
Monitor for issues

Handling a shard failure:

Promote replica to primary
Update routing to point to new primary
Investigate root cause
Provision replacement replica
Verify data integrity

Sharding Anti-Patterns

Sharding Too Early

The most common mistake is sharding before it's necessary. Sharding adds operational complexity that compounds over time. A single PostgreSQL instance with proper indexing, connection pooling, and read replicas can handle 100,000+ queries per second and terabytes of data. Many startups that shard early spend more engineering time managing shards than building product features.

Before sharding, verify that you've exhausted vertical scaling. A modern server with 128 cores, 512GB RAM, and NVMe storage can handle most workloads. Cloud providers offer instances with up to 448 vCPUs and 12TB RAM. If your workload fits on a single machine, keep it there.

Choosing the Wrong Shard Key

The shard key is nearly impossible to change after the fact. A poor shard key creates cascading problems:

Hotspot: If 80% of your traffic hits 20% of your shards, you've created hotspots. This happens when the shard key is correlated with access patterns—for example, sharding by country when 60% of users are in the US.

Cross-shard joins: If your most common query joins two tables that are sharded on different keys, every join becomes a cross-shard operation. Design your schema so related data lives on the same shard.

Rebalancing nightmare: If the shard key has uneven distribution, adding new shards doesn't help—the data stays skewed. Hash-based keys avoid this, but range-based keys require careful planning.

Ignoring Cross-Shard Queries

Some teams shard their database and then discover that 30% of their queries are cross-shard. This happens when the data model doesn't align with the shard key. Before sharding, analyze your query patterns:

-- Find the most common WHERE clauses in your slow query log
SELECT query, COUNT(*) as frequency, AVG(duration) as avg_duration
FROM slow_query_log
WHERE timestamp > NOW() - INTERVAL 7 DAY
GROUP BY query
ORDER BY frequency DESC
LIMIT 20;

If most queries include the same column (like user_id or tenant_id), that's your shard key. If queries are all over the place, sharding will be painful.

Real-World Sharding Case Studies

Case Study 1: Instagram's Sharding Strategy

Instagram shards by user ID using PostgreSQL. Each user's data (posts, comments, likes) lives on a single shard determined by hash(user_id) % num_shards. This works because the most common query pattern is "get user's feed," which only needs data from one shard.

When Instagram needed to reshard (double the number of shards), they used a multi-phase migration:

Created new shard databases
Ran a background job to copy data from old shards to new shards based on the new hash
Used double-write to keep both old and new shards in sync during migration
Verified consistency with checksums
Switched reads to new shards
Stopped writes to old shards

The entire migration took 3 months and was performed without downtime.

Case Study 2: Slack's Message Sharding

Slack sharded their message storage by workspace ID. Each workspace's messages live on a single shard. This works because users only access messages within their workspace—there are no cross-workspace queries.

The challenge was handling large workspaces. A single workspace with 500,000 users generates billions of messages. Slack addressed this with sub-sharding: large workspaces are split into time-based sub-shards, with recent messages on fast storage and older messages on cheaper storage.

Case Study 3: Pinterest's Migration from Vitess

Pinterest initially sharded MySQL using Vitess, but found that the operational complexity of managing Vitess clusters was significant. They migrated to a custom sharding layer built on top of bare MySQL instances. The custom layer handled routing, resharding, and failover with less overhead than Vitess.

The lesson: off-the-shelf sharding solutions are a good starting point, but at massive scale, custom solutions may be more efficient. Most teams should start with Vitess or Citus and only consider custom solutions when they've exhausted the existing tools.

Advanced Sharding Patterns

Hierarchical Sharding

For multi-tenant SaaS applications with tenants of vastly different sizes, a single-level shard key may not work. A small tenant with 100 users doesn't need a dedicated shard, but a large enterprise tenant with 100,000 users might need multiple shards.

Hierarchical sharding uses two levels:

Level 1: Tenant ID → determines shard group
Level 2: User ID → determines shard within group
 
Small tenants: tenant_id % 8 → Shard 0-7
Large tenants: (tenant_id, user_id) → dedicated shards

This pattern lets you give large tenants dedicated resources while small tenants share shards efficiently.

Shard-Per-Service Architecture

In a microservices architecture, different services may shard differently. The user service shards by user ID, the order service shards by tenant ID, and the analytics service shards by time period. Each service owns its sharding strategy independently.

This works as long as services don't need to join data across shards. When services need to correlate data (e.g., "show me orders for this user"), use asynchronous replication to a denormalized read store rather than cross-service joins.

Geo-Sharding with Data Residency

GDPR, CCPA, and other regulations require data to be stored in specific geographic regions. Geo-sharding routes user data to shards in their region:

def get_shard(user):
    if user.region == 'EU':
        return eu_shards[hash(user.id) % len(eu_shards)]
    elif user.region == 'US':
        return us_shards[hash(user.id) % len(us_shards)]
    else:
        return ap_shards[hash(user.id) % len(ap_shards)]

The challenge is handling users who move between regions. This requires data migration between shard groups, which must be coordinated carefully to avoid data loss or inconsistency.

Monitoring Sharded Databases

Key Metrics

Sharded databases require monitoring at both the shard level and the cluster level:

Shard-level metrics:

Query latency per shard (identify hotspots)
Replication lag per shard (catch replication issues early)
Connection pool utilization per shard
Disk usage and growth rate per shard

Cluster-level metrics:

Cross-shard query ratio (should be <5% for a well-designed shard key)
Data distribution skew (no shard should have >2x the average data)
Total query throughput across all shards
End-to-end latency for cross-shard operations

Alerting thresholds:

Shard query latency >100ms → investigate hotspot
Replication lag >5 seconds → alert oncall
Cross-shard query ratio >10% → review shard key
Data skew >2x → plan rebalancing

Tools for Sharded Databases

Vitess: Built-in monitoring dashboard, query analysis, and resharding tools
Citus: pg_citus extension for monitoring distributed tables
ProxySQL: Query routing and monitoring for MySQL sharding
Orchestrator: MySQL replication management and failover
pganalyze: PostgreSQL query analysis across distributed nodes

Testing Sharded Databases

Load Testing at Scale

Sharding problems only manifest at scale. A query that runs in 10ms on a single database with 1 million rows might take 5 seconds when scattered across 16 shards with 100 million rows each. Load testing must use production-scale data volumes and realistic query patterns.

Tools like pgbench for PostgreSQL and sysbench for MySQL can simulate concurrent workloads across multiple shards. The key is to test not just individual shard performance but also cross-shard query patterns, connection pool exhaustion, and failover scenarios.

Consistency Verification

After resharding or data migration, verify consistency between source and destination shards. This involves counting rows, comparing checksums on critical columns, and running application-level consistency checks. Tools like pt-table-checksum for MySQL and pg_comparator for PostgreSQL automate this process.

The verification step is often skipped in the rush to complete a migration, but inconsistencies discovered weeks later are much harder to diagnose and fix. Build automated consistency checks into your resharding pipeline and run them continuously during the migration window.

Chaos Testing

Test shard failures under load. Kill a shard primary during peak traffic and verify that failover happens cleanly without data loss or prolonged downtime. This testing reveals assumptions in your application code about database availability that would otherwise surface during a real incident.

Netflix's Chaos Monkey philosophy applies to sharded databases: the best time to discover that your application doesn't handle shard failover gracefully is during a controlled test, not during a production incident at 3 AM.

Conclusion

Database sharding is a powerful scaling technique, but it comes with significant complexity. Before sharding, exhaust simpler alternatives: query optimization, read replicas, caching, and vertical scaling. When you do shard, choose your shard key carefully—it's the single most important decision and very difficult to change later.

Key takeaways:

Shard key is everything — Choose poorly and you'll fight hotspots, cross-shard queries, and resharding pain forever
Hash-based sharding for even distribution — Use consistent hashing to minimize data movement during resharding
Range-based sharding for time-series — Natural fit for data that's accessed by time ranges
Avoid cross-shard operations — Design your schema so most queries hit a single shard
Use established tools — Vitess, Citus, and MongoDB's built-in sharding handle the hard parts for you
Monitor shard skew — Uneven data distribution defeats the purpose of sharding
Plan for resharding — You will need to add shards eventually. Design for it from the start
Test with production-scale data — Sharding problems only appear at scale. Load test with realistic data volumes

Sharding is a journey, not a destination. Start with a clear understanding of your data access patterns, choose a shard key that aligns with your most common queries, and build the operational tooling you'll need to manage a distributed database. The complexity is real, but so is the scale it enables.

Minh Vo

Slaying code & making it lit fr fr 🔥 tagline

Database Sharding: Scaling Horizontally

Introduction

When to Shard (and When Not To)

Signs You Need Sharding

The Cost of Sharding

Sharding Strategies

Hash-Based Sharding

Range-Based Sharding

Directory-Based Sharding

Geographic Sharding

Choosing a Shard Key

Good Shard Key Properties

Common Shard Keys

Bad Shard Keys

Shard Routing

Application-Level Routing

Proxy-Based Routing

Cross-Shard Operations

Cross-Shard Queries

Distributed Transactions

Resharding

The Resharding Challenge

Sharding with Vitess

Sharding in Practice

PostgreSQL with Citus

MongoDB Sharding

Vitess for MySQL

Monitoring and Operations

Key Metrics to Monitor

Operational Playbook

Sharding Anti-Patterns

Sharding Too Early

Choosing the Wrong Shard Key

Ignoring Cross-Shard Queries

Real-World Sharding Case Studies

Case Study 1: Instagram's Sharding Strategy

Case Study 2: Slack's Message Sharding

Case Study 3: Pinterest's Migration from Vitess

Advanced Sharding Patterns

Hierarchical Sharding

Shard-Per-Service Architecture

Geo-Sharding with Data Residency

Monitoring Sharded Databases

Key Metrics

Tools for Sharded Databases

Testing Sharded Databases

Load Testing at Scale

Consistency Verification

Chaos Testing

Conclusion