Introduction
Distributed SQL databases have become the backbone of modern cloud-native applications that demand both the familiarity of relational databases and the scalability of distributed systems. As applications grow beyond a single server, the need for databases that can scale horizontally while maintaining ACID guarantees becomes critical. Three prominent players in this space—CockroachDB, TiDB, and Google Cloud Spanner—each take distinct architectural approaches to solve the distributed SQL problem.
CockroachDB was built from the ground up to provide serializable isolation across a distributed system using the Raft consensus protocol. TiDB, inspired by Google's Spanner and F1 papers, separates the SQL processing layer from the storage layer (TiKV) and uses the Raft protocol for data replication. Google Cloud Spanner, the pioneer of globally distributed strongly consistent databases, uses a custom consensus protocol called TrueTime that relies on atomic clocks and GPS receivers to achieve external consistency across data centers worldwide.
Understanding the differences between these systems is essential for making the right architectural decision. Each database makes different trade-offs between consistency, availability, latency, cost, and operational complexity. This guide provides a comprehensive technical comparison to help you choose the right distributed SQL database for your use case.
Understanding Distributed SQL: Core Concepts
Distributed SQL databases aim to provide the ACID guarantees of traditional relational databases while scaling horizontally across multiple nodes and geographic regions. The key challenge is maintaining consistency across distributed nodes without sacrificing availability or introducing unacceptable latency.
The CAP theorem states that a distributed system can only guarantee two of three properties: Consistency, Availability, and Partition tolerance. Since network partitions are inevitable in distributed systems, the real choice is between consistency and availability during a partition. CockroachDB and Spanner choose consistency (CP systems), while TiDB can be configured for either consistency or availability depending on the use case.
The consensus protocol is the heart of any distributed database. All three systems use variants of Paxos or Raft to ensure that a majority of replicas agree on each write before committing it. This majority quorum approach provides fault tolerance—the system continues operating correctly as long as a majority of replicas are available. The trade-off is that writes must wait for a majority acknowledgment, which introduces latency proportional to the network round-trip time between replicas.
Multi-version concurrency control (MVCC) is another fundamental concept. All three databases use MVCC to allow concurrent reads and writes without blocking. Each write creates a new version of the data, and readers see a consistent snapshot as of a specific timestamp. This approach eliminates read-write conflicts but requires garbage collection of old versions to prevent unbounded storage growth.
Architecture and Design Patterns
CockroachDB Architecture
CockroachDB implements a custom distributed SQL engine that translates SQL queries into key-value operations on its distributed storage layer. The architecture consists of three main layers:
The SQL layer handles query parsing, optimization, and execution planning. It supports most PostgreSQL wire protocol features and can be accessed using standard PostgreSQL drivers. The transaction layer implements serializable isolation using a timestamp-based protocol with write intents. The storage layer uses Pebble (a RocksDB-inspired engine) to persist data as sorted key-value pairs, automatically sharded into ranges of approximately 512 MB.
// CockroachDB multi-region setup
import { Pool } from 'pg'
const pool = new Pool({
connectionString: process.env.COCKROACHDB_URL,
ssl: { rejectUnauthorized: false }
})
// Configure multi-region database
async function setupMultiRegion() {
// Add regions to the database
await pool.query(`ALTER DATABASE main ADD REGION "us-east1"`)
await pool.query(`ALTER DATABASE main ADD REGION "eu-west1"`)
await pool.query(`ALTER DATABASE main ADD REGION "ap-southeast1"`)
// Set survival goals
await pool.query(`ALTER DATABASE main SURIVE REGION FAILURE`)
// Create region-aware table
await pool.query(`
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email STRING NOT NULL,
region crdb_internal_region NOT NULL DEFAULT gateway_region()
) LOCALITY REGIONAL BY ROW
`)
}TiDB Architecture
TiDB follows a layered architecture inspired by Google's Spanner and F1 papers. The TiDB server is the SQL layer that parses queries, optimizes execution plans, and coordinates with the storage layer. TiKV is the distributed transactional key-value storage engine that stores data in regions (similar to CockroachDB's ranges). PD (Placement Driver) manages cluster metadata, performs load balancing, and handles region scheduling.
TiDB also includes TiFlash, a columnar storage engine for real-time analytics (HTAP). TiFlash replicates data from TiKV asynchronously and uses a vectorized execution engine for fast analytical queries. This hybrid transactional-analytical capability is a key differentiator.
// TiDB connection (MySQL-compatible)
import mysql from 'mysql2/promise'
const pool = mysql.createPool({
host: process.env.TIDB_HOST,
user: process.env.TIDB_USER,
password: process.env.TIDB_PASSWORD,
database: process.env.TIDB_DATABASE,
ssl: { rejectUnauthorized: true }
})
// TiDB supports optimistic and pessimistic transactions
async function transferFunds(fromId: string, toId: string, amount: number) {
const conn = await pool.getConnection()
try {
await conn.beginTransaction()
const [from] = await conn.execute(
'SELECT balance FROM accounts WHERE id = ? FOR UPDATE', [fromId]
)
if (from[0].balance < amount) {
throw new Error('Insufficient funds')
}
await conn.execute(
'UPDATE accounts SET balance = balance - ? WHERE id = ?', [amount, fromId]
)
await conn.execute(
'UPDATE accounts SET balance = balance + ? WHERE id = ?', [amount, toId]
)
await conn.commit()
} catch (err) {
await conn.rollback()
throw err
} finally {
conn.release()
}
}Google Cloud Spanner Architecture
Spanner is Google's globally distributed, strongly consistent database that uses a custom consensus protocol called TrueTime. TrueTime uses atomic clocks and GPS receivers in Google's data centers to provide a tight bound on clock uncertainty (typically less than 7ms). This allows Spanner to assign globally meaningful timestamps to transactions and achieve external consistency—a stronger guarantee than serializable isolation.
Spanner organizes data into splits (similar to ranges in CockroachDB), each managed by a set of replicas using the Paxos consensus protocol. The TrueTime API allows Spanner to wait out clock uncertainty before committing transactions, ensuring that transactions are ordered consistently across all replicas worldwide.
// Spanner connection using Google Cloud client library
import { Spanner } from '@google-cloud/spanner'
const spanner = new Spanner({ projectId: process.env.GCP_PROJECT_ID })
const instance = spanner.instance('my-instance')
const database = instance.database('my-database')
// Read-write transaction with external consistency
async function createOrder(userId: string, items: OrderItem[]) {
await database.runTransactionAsync(async (tx) => {
const [user] = await tx.read('users', {
keys: [userId],
columns: ['id', 'name', 'credit_limit']
})
const totalCost = items.reduce((sum, item) => sum + item.price * item.quantity, 0)
if (user[0].credit_limit < totalCost) {
throw new Error('Credit limit exceeded')
}
tx.insert('orders', {
id: Spanner.float(Date.now()),
user_id: userId,
total: totalCost,
created_at: Spanner.COMMIT_TIMESTAMP
})
await tx.commit()
})
}Step-by-Step Implementation
Setting Up CockroachDB for Multi-Region
// cockroach-multiregion.ts
import { Pool } from 'pg'
const regionalPool = new Pool({
connectionString: process.env.COCKROACHDB_REGIONAL_URL,
ssl: { rejectUnauthorized: false }
})
async function optimizeForRegion() {
// Use follower reads for non-critical reads
await regionalPool.query(`SET default_transaction_use_follower_reads = on`)
// Use stale reads for analytics queries
const result = await regionalPool.query(`
SELECT count(*) FROM orders WHERE created_at > now() - interval '1 day'
AS OF SYSTEM TIME follower_read_timestamp()
`)
return result.rows[0]
}Setting Up TiDB with TiFlash for HTAP
// tidb-htap.ts
import mysql from 'mysql2/promise'
const pool = mysql.createPool({
host: process.env.TIDB_HOST,
user: process.env.TIDB_USER,
password: process.env.TIDB_PASSWORD,
database: process.env.TIDB_DATABASE
})
async function setupHTAP() {
// Enable TiFlash replica for analytics
await pool.execute('ALTER TABLE orders SET TIFLASH REPLICA 1')
// Use TiFlash for analytical queries
const [rows] = await pool.execute(`
SELECT /*+ READ_FROM_STORAGE(TIFLASH[orders]) */
DATE(created_at) as order_date,
COUNT(*) as order_count,
SUM(total) as revenue
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY DATE(created_at)
ORDER BY order_date
`)
return rows
}Setting Up Cloud Spanner
// spanner-setup.ts
import { Spanner } from '@google-cloud/spanner'
const spanner = new Spanner({ projectId: process.env.GCP_PROJECT_ID })
const instance = spanner.instance('my-instance')
const database = instance.database('my-database')
// Create interleaved tables for data locality
async function createSchema() {
const [operation] = await database.updateSchema(`
CREATE TABLE Users (
UserId STRING(36) NOT NULL,
Email STRING(255) NOT NULL,
CreatedAt TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp = true)
) PRIMARY KEY (UserId)
CREATE TABLE Orders (
UserId STRING(36) NOT NULL,
OrderId STRING(36) NOT NULL,
Total FLOAT64 NOT NULL,
CreatedAt TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp = true)
) PRIMARY KEY (UserId, OrderId),
INTERLEAVE IN PARENT Users ON DELETE CASCADE
`)
await operation.promise()
}Real-World Use Cases
Global Financial Services
A global payment processing platform needs strong consistency guarantees across all regions to prevent double-spending and ensure regulatory compliance. Google Cloud Spanner's external consistency guarantees make it the ideal choice, as transactions are ordered consistently worldwide regardless of which region initiates them.
E-Commerce with Regional Optimization
An e-commerce platform serving customers in the US, Europe, and Asia needs low-latency reads and writes for each region while maintaining a single source of truth. CockroachDB's regional-by-row locality keeps user data in their region for fast access while providing global consistency for cross-region operations like inventory management.
Hybrid Transactional-Analytical Workload
A fintech company needs to process real-time transactions while simultaneously running complex analytical queries for risk assessment. TiDB with TiFlash allows transactional workloads on TiKV while analytical queries run on the columnar TiFlash engine, both reading from the same data without ETL pipelines.
Multi-Tenant SaaS Platform
A B2B SaaS platform with strict data isolation requirements needs tenant-level data partitioning with strong isolation guarantees. CockroachDB's zone configurations allow per-tenant replication factors and geographic placement, while Spanner's interleaved tables ensure tenant data locality.
Best Practices for Production
-
Minimize cross-region transactions: Design your schema and access patterns to keep related data in the same region. Cross-region transactions have latency proportional to the distance between regions.
-
Use appropriate isolation levels: CockroachDB defaults to serializable, but you can use weaker isolation levels for read-heavy workloads where slight staleness is acceptable. TiDB supports both optimistic and pessimistic concurrency control.
-
Design for shard-awareness: Understand how your data is distributed across shards. Design your primary keys and indexes to avoid hotspots. Use UUIDs or snowflake IDs instead of auto-increment IDs to distribute writes evenly.
-
Monitor region health: Set up monitoring for each region's health, replication lag, and query latency. Use platform-specific dashboards and integrate with your existing observability stack.
-
Test failover scenarios: Regularly test region failover to ensure your application handles failover gracefully. Verify that your retry logic works correctly when a region becomes unavailable.
-
Optimize for your consistency requirements: Not all queries need strong consistency. Use follower reads (CockroachDB), stale reads (TiDB), or bounded staleness (Spanner) for read-heavy workloads that can tolerate slight staleness.
-
Plan capacity carefully: Distributed databases have different scaling characteristics. CockroachDB scales by adding nodes, TiDB scales SQL and storage independently, and Spanner scales automatically but with cost implications.
-
Use connection pooling: All three databases benefit from connection pooling, especially in serverless or containerized environments where connections are frequently created and destroyed.
Common Pitfalls and Solutions
| Pitfall | Impact | Solution |
|---|---|---|
| Hot key ranges from sequential IDs | Write bottleneck on single shard | Use UUIDs or snowflake IDs to distribute writes |
| Cross-region transactions without need | High latency, increased cost | Use locality-aware routing, keep related data co-located |
| Ignoring clock skew in Spanner | Inconsistent reads | Rely on TrueTime, don't use application-level timestamps |
| Over-provisioning TiFlash replicas | High storage cost | Only enable TiFlash for tables with analytical queries |
| Not using follower reads for analytics | Unnecessary load on leaseholder replicas | Use follower reads or stale reads for non-critical queries |
| Ignoring transaction retry logic | Application crashes on conflicts | Implement exponential backoff with jitter |
| Using SELECT * in distributed queries | Excessive data transfer | Select only needed columns, use covering indexes |
| Not monitoring region failover | Silent degradation | Set up alerts for region health and replication status |
Performance Optimization
Performance optimization in distributed SQL databases requires understanding the cost of distributed operations. The most impactful optimizations reduce cross-region traffic and minimize the number of round trips required for each query.
// Batch reads to reduce round trips
async function getUsersWithOrders(userIds: string[]) {
// Instead of N+1 queries, batch the reads
const users = await pool.query(
'SELECT * FROM users WHERE id = ANY($1)', [userIds]
)
const orders = await pool.query(
'SELECT * FROM orders WHERE user_id = ANY($1)', [userIds]
)
// Join in application layer
return users.rows.map(user => ({
...user,
orders: orders.rows.filter(o => o.user_id === user.id)
}))
}
// Use covering indexes to avoid table lookups
async function getActiveUsers() {
return pool.query(`
SELECT id, email, name FROM users
WHERE active = true
ORDER BY created_at DESC
LIMIT 100
`)
// With index: CREATE INDEX idx_active_users ON users (active, created_at DESC) INCLUDE (email, name)
}For CockroachDB, enabling follower reads for analytical queries can reduce read latency by reading from the nearest replica. For TiDB, using the READ_FROM_STORAGE hint to route analytical queries to TiFlash avoids impacting transactional performance. For Spanner, using interleaved tables ensures that related data is stored together, reducing the number of splits that need to be read.
Comparison with Alternatives
| Feature | CockroachDB | TiDB | Cloud Spanner |
|---|---|---|---|
| SQL Compatibility | PostgreSQL-compatible | MySQL-compatible | Custom SQL (GoogleSQL) |
| Consistency Model | Serializable by default | Snapshot isolation | External consistency (TrueTime) |
| Storage Engine | Pebble (RocksDB-inspired) | TiKV (RocksDB-based) | Custom (Colossus) |
| HTAP Support | Limited (changefeeds) | Native (TiFlash) | Limited |
| Multi-Region | Native | Manual configuration | Native global |
| Open Source | Yes (BSL) | Yes (Apache 2.0) | No (managed service) |
| Pricing | Self-hosted or managed (CockroachDB Cloud) | Self-hosted or TiDB Cloud | Per-node + storage |
| Max Cluster Size | No hard limit | No hard limit | No hard limit |
| Wire Protocol | PostgreSQL | MySQL | Custom (gRPC) |
Beyond these three, alternatives include YugabyteDB (PostgreSQL-compatible distributed database with Raft-based replication), Vitess (MySQL sharding middleware used by PlanetScale), Amazon Aurora (MySQL/PostgreSQL-compatible with quorum-based replication), and Citus (PostgreSQL extension for horizontal scaling).
Advanced Patterns
Global Secondary Indexes
Distributed databases must handle global secondary indexes carefully, as the index may be stored on a different shard than the data. This creates a trade-off between read performance (local indexes) and write amplification (global indexes).
// CockroachDB global index configuration
await pool.query(`
CREATE INDEX idx_user_email ON users (email)
GLOBAL -- Global index for cross-region lookups
`)
// TiDB global index (available in TiDB 7.0+)
await pool.execute(`
CREATE INDEX idx_user_email ON users (email) GLOBAL
`)Cross-Region Read Replicas
For read-heavy workloads that can tolerate eventual consistency, cross-region read replicas reduce read latency without the cost of full multi-region writes.
// Spanner read-only replicas
const database = instance.database('my-database', {
pool: { min: 1, max: 100 }
})
// Reads automatically route to nearest replica
const [rows] = await database.run({
sql: 'SELECT * FROM users WHERE region = @region',
params: { region: 'us-east1' }
})Partitioned Tables for Multi-Tenancy
Use table partitioning to isolate tenant data while maintaining a single logical database.
// CockroachDB partitioning by tenant region
await pool.query(`
CREATE TABLE tenant_data (
tenant_id UUID NOT NULL,
data_id UUID NOT NULL,
payload JSONB,
region crdb_internal_region NOT NULL,
PRIMARY KEY (tenant_id, data_id)
) LOCALITY REGIONAL BY ROW
`)Testing Strategies
Testing distributed databases requires simulating network partitions, node failures, and concurrent access patterns. Use the Jepsen testing methodology to verify consistency guarantees under adverse conditions.
// Consistency test with concurrent writes
describe('Distributed Consistency', () => {
test('concurrent writes to same key are serializable', async () => {
const results = await Promise.allSettled([
incrementCounter('key1', 1),
incrementCounter('key1', 1),
incrementCounter('key1', 1)
])
const finalValue = await getCounter('key1')
// With serializable isolation, the final value should be exactly 3
expect(finalValue).toBe(3)
})
})Test failover scenarios by simulating region outages and verifying that the application continues to function correctly with reduced capacity. Use chaos engineering tools like Chaos Monkey or Litmus to inject failures systematically.
Future Outlook
The distributed SQL space is converging toward a model where multi-region strong consistency is a baseline feature rather than a differentiator. CockroachDB is investing in improved cost efficiency for serverless workloads. TiDB is expanding its HTAP capabilities with improved TiFlash performance and new analytical functions. Cloud Spanner is focusing on developer experience improvements and better integration with the Google Cloud ecosystem.
The emergence of edge computing is driving demand for distributed databases that can operate at the edge with sub-millisecond latency. All three platforms are exploring edge-replicated read replicas and edge-native storage engines.
AI and vector search capabilities are becoming standard features. CockroachDB has added vector type support, TiDB is integrating vector search through TiFlash, and Spanner offers vector search through its AI integrations.
Conclusion
CockroachDB, TiDB, and Google Cloud Spanner each excel in different scenarios. CockroachDB provides the best PostgreSQL compatibility with strong multi-region capabilities and is ideal for teams that want an open-source distributed SQL database. TiDB offers MySQL compatibility with unique HTAP capabilities through TiFlash, making it perfect for hybrid transactional-analytical workloads. Cloud Spanner provides the strongest consistency guarantees with TrueTime and is the best choice for applications requiring external consistency across global deployments.
Key takeaways:
- Choose CockroachDB for PostgreSQL compatibility, open-source flexibility, and strong multi-region support with automatic data placement.
- Choose TiDB for MySQL compatibility, HTAP workloads with TiFlash, and the flexibility of open-source deployment.
- Choose Cloud Spanner for the strongest consistency guarantees, global scale without operational burden, and tight integration with Google Cloud services.
Start by evaluating your consistency requirements, latency tolerance, and operational preferences. Each platform offers free tiers or trial periods for experimentation. Refer to the official documentation for CockroachDB, TiDB, and Cloud Spanner for detailed implementation guides.