PostgreSQL Logical Replication: Setup and Use Cases

Introduction

In the modern data landscape, keeping databases in sync across servers, regions, and even different database versions is a critical operational requirement. PostgreSQL's logical replication has matured significantly since its introduction in version 10, and by 2024 it is a battle-tested feature powering production workloads at companies ranging from startups to enterprises. Unlike physical replication, which copies raw data blocks and requires identical server configurations, logical replication works at the semantic level—it replicates the actual data changes (INSERTs, UPDATEs, DELETEs) as logical operations that can be applied selectively and flexibly.

This maturity means you can now use logical replication for sophisticated production use cases that were previously impossible or required third-party tools: migrating between PostgreSQL major versions with near-zero downtime, replicating specific tables to analytics servers, distributing data across geographic regions for low-latency access, and even implementing multi-writer architectures with proper conflict handling.

In this comprehensive guide, you will learn the complete setup process from configuration to production deployment. We will cover publications and subscriptions in depth, explore advanced configuration options like row filtering and column lists (PostgreSQL 15+), examine conflict resolution strategies for multi-writer setups, and walk through real-world use cases with production-ready code. By the end, you will have the knowledge to implement logical replication confidently in your own infrastructure.

Understanding Logical Replication: Core Concepts

How Logical Decoding Works

At the heart of logical replication is logical decoding—the process of converting PostgreSQL's internal WAL (Write-Ahead Log) records into a logical representation of data changes. The WAL is a sequential log of all changes made to the database. In physical replication, these WAL records are shipped as raw binary blocks. In logical replication, a logical decoding output plugin interprets the WAL records and produces a stream of logical changes.

The default output plugin is pgoutput, which is built into PostgreSQL and produces changes in a format optimized for logical replication subscriptions. Each change is described as a tuple operation: an INSERT with the new row values, an UPDATE with both old and new values (depending on replica identity configuration), or a DELETE with the old row values.

The decoding process happens on the publisher side. A WAL sender process reads the WAL, passes records through the output plugin, and sends the resulting logical changes to the subscriber over a standard PostgreSQL connection. The subscriber's apply worker receives the changes and applies them using regular SQL INSERT, UPDATE, and DELETE statements.

Replica Identity

The replica identity of a table determines how UPDATE and DELETE operations are identified on the subscriber. By default, the replica identity is the table's primary key. This means when an UPDATE occurs on the publisher, the subscriber identifies the row to update by matching the primary key values.

You can change the replica identity to use different columns or even the entire row:

-- Use a unique index as replica identity
ALTER TABLE orders REPLICA IDENTITY USING INDEX orders_unique_external_id;
 
-- Use all columns (full row) as replica identity
ALTER TABLE orders REPLICA IDENTITY FULL;
 
-- Use the default (primary key)
ALTER TABLE orders REPLICA IDENTITY DEFAULT;

Using FULL replica identity sends all column values for UPDATE and DELETE operations, which increases bandwidth but is necessary when the table has no primary key or unique index that can serve as a reliable identifier.

Transactional Consistency

Logical replication preserves transactional consistency on the subscriber. Changes from a single transaction on the publisher are applied as a single transaction on the subscriber. This means you will never see partial transactions on the subscriber. However, the order in which transactions are applied may differ from the order they committed on the publisher if they did not conflict (i.e., did not modify the same rows).

Architecture and Design Patterns

Read Scaling Architecture

The most common use case for logical replication is read scaling—offloading read queries from the primary to one or more replicas:

-- On primary: publish only tables that need read scaling
CREATE PUBLICATION read_scale_pub FOR TABLE orders, products, customers;
 
-- On read replica 1
CREATE SUBSCRIPTION replica_1_sub
    CONNECTION 'host=primary dbname=mydb user=replicator password=secret'
    PUBLICATION read_scale_pub;
 
-- On read replica 2
CREATE SUBSCRIPTION replica_2_sub
    CONNECTION 'host=primary dbname=mydb user=replicator password=secret'
    PUBLICATION read_scale_pub;

This pattern is superior to physical replication for read scaling when you only need to replicate a subset of tables. It reduces network bandwidth, storage requirements on the replica, and replication lag for tables that change frequently.

Multi-Region Data Distribution

For applications with global users, you can distribute data to regional servers to reduce read latency:

-- On global primary, create region-specific publications
CREATE PUBLICATION us_data FOR TABLE orders WHERE (region = 'US'), customers WHERE (region = 'US');
CREATE PUBLICATION eu_data FOR TABLE orders WHERE (region = 'EU'), customers WHERE (region = 'EU');
 
-- On US regional server
CREATE SUBSCRIPTION us_sub
    CONNECTION 'host=global-primary dbname=mydb user=replicator'
    PUBLICATION us_data;
 
-- On EU regional server
CREATE SUBSCRIPTION eu_sub
    CONNECTION 'host=global-primary dbname=mydb user=replicator'
    PUBLICATION eu_data;

Data Warehouse Feeding

Logical replication can feed a data warehouse with near-real-time data:

-- On the OLTP primary
CREATE PUBLICATION warehouse_pub FOR TABLE orders, line_items, payments;
 
-- On the data warehouse subscriber
CREATE SUBSCRIPTION warehouse_sub
    CONNECTION 'host=oltp-primary dbname=mydb user=replicator'
    PUBLICATION warehouse_pub;
 
-- Add analytical columns on the subscriber side only
ALTER TABLE orders ADD COLUMN computed_margin NUMERIC;
-- Triggers or scheduled jobs can populate these without impacting the OLTP primary

Blue-Green Deployment Pattern

Logical replication enables blue-green database deployments with minimal downtime:

Set up the green (new) database as a subscriber to the blue (current) database
Wait for initial sync to complete and replication lag to reach zero
Briefly pause writes on the blue database (application-level)
Verify green is fully caught up
Switch application traffic to the green database
Green becomes the new primary; blue can be decommissioned

Step-by-Step Implementation

Server Configuration

Publisher postgresql.conf:

wal_level = logical
max_replication_slots = 10
max_wal_senders = 10
wal_keep_size = '1GB'           # Keep WAL for subscribers that fall behind
max_slot_wal_keep_size = '10GB' # Prevent disk exhaustion from inactive slots

Publisher pg_hba.conf:

# TYPE  DATABASE  USER       ADDRESS           METHOD
host    mydb      replicator 192.168.1.0/24    scram-sha-256

Subscriber postgresql.conf:

max_logical_replication_workers = 4
max_sync_workers_per_subscription = 2

Creating Publications

-- Create a publication for all tables
CREATE PUBLICATION all_tables_pub FOR ALL TABLES;
 
-- Create a publication for specific tables with specific operations
CREATE PUBLICATION orders_pub FOR TABLE orders WITH (publish = 'insert, update');
 
-- Create a publication with row filtering (PG 15+)
CREATE PUBLICATION active_orders_pub
    FOR TABLE orders WHERE (status NOT IN ('cancelled', 'archived'));
 
-- Create a publication with column filtering (PG 15+)
CREATE PUBLICATION minimal_orders_pub
    FOR TABLE orders (id, customer_id, status, total_cents);

Creating Subscriptions

-- Create the subscription (disabled initially for verification)
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=publisher-host dbname=mydb user=replicator password=secret'
    PUBLICATION orders_pub
    WITH (enabled = false);
 
-- Verify the subscription was created
SELECT * FROM pg_subscription;
 
-- Enable the subscription to start replication
ALTER SUBSCRIPTION my_sub ENABLE;

Managing Subscriptions

-- Disable a subscription (stops replication)
ALTER SUBSCRIPTION my_sub DISABLE;
 
-- Refresh publication table list (if tables were added to the publication)
ALTER SUBSCRIPTION my_sub REFRESH PUBLICATION;
 
-- Change the connection string (e.g., after failover)
ALTER SUBSCRIPTION my_sub CONNECTION 'host=new-publisher dbname=mydb user=replicator';
 
-- Drop a subscription (stops replication and removes the subscription)
DROP SUBSCRIPTION my_sub;

Initial Data Load

When a subscription is created, PostgreSQL automatically begins the initial data copy. For large tables, you may want to control when this happens:

-- Disable the subscription to prevent immediate sync
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=publisher dbname=mydb user=replicator'
    PUBLICATION my_pub
    WITH (enabled = false, copy_data = false);
 
-- Manually copy data using pg_dump/pg_restore, then enable
-- (This gives you control over when the bulk data transfer happens)
ALTER SUBSCRIPTION my_sub ENABLE;

Real-World Use Cases

Use Case 1: Major Version Upgrade with Minimal Downtime

Upgrading PostgreSQL from version 14 to 16 using logical replication:

-- On PG 14 (publisher)
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;
 
-- On PG 16 (subscriber)
-- 1. Dump schema only from PG 14
-- pg_dump --schema-only -d mydb > schema.sql
 
-- 2. Load schema into PG 16
-- psql -d mydb < schema.sql
 
-- 3. Create subscription
CREATE SUBSCRIPTION upgrade_sub
    CONNECTION 'host=pg14-server dbname=mydb user=replicator'
    PUBLICATION upgrade_pub;
 
-- 4. Wait for sync to complete
SELECT * FROM pg_subscription_rel WHERE srsubstate != 'r';
 
-- 5. Switch application (brief downtime for connection string change)
-- 6. Drop subscription on PG 16, drop publication on PG 14

Use Case 2: Real-Time Analytics with Separate OLAP Server

-- On OLTP primary
CREATE PUBLICATION analytics_pub FOR TABLE events, user_sessions, transactions;
 
-- On OLAP subscriber
CREATE SUBSCRIPTION analytics_sub
    CONNECTION 'host=oltp-primary dbname=mydb user=replicator'
    PUBLICATION analytics_pub;
 
-- Create materialized views on the OLAP server
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT
    date_trunc('day', created_at) AS day,
    SUM(total_cents) / 100.0 AS revenue,
    COUNT(*) AS order_count
FROM transactions
WHERE status = 'completed'
GROUP BY 1;
 
-- Refresh on schedule (no impact on OLTP)
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;

Use Case 3: Microservice Data Distribution

Different microservices can subscribe to different publications containing only the tables they need:

-- On the central database
CREATE PUBLICATION orders_service FOR TABLE orders, line_items, payments;
CREATE PUBLICATION inventory_service FOR TABLE products, stock_levels, warehouses;
CREATE PUBLICATION user_service FOR TABLE users, sessions, preferences;
 
-- On the orders microservice database
CREATE SUBSCRIPTION orders_sub
    CONNECTION 'host=central-db dbname=mydb user=replicator'
    PUBLICATION orders_service;
 
-- On the inventory microservice database
CREATE SUBSCRIPTION inventory_sub
    CONNECTION 'host=central-db dbname=mydb user=replicator'
    PUBLICATION inventory_service;

Best Practices for Production

Always set max_slot_wal_keep_size: This prevents a single inactive replication slot from filling your disk with unreleased WAL files. Set it to a value that balances safety with the expected maximum subscriber downtime.
Use SCRAM-SHA-256 authentication: Replication connections carry sensitive data. Always use SCRAM-SHA-256 (not MD5 or trust) for replication user authentication.
Monitor pg_stat_subscription actively: Check last_msg_send_time, last_msg_receipt_time, and latest_end_lsn to detect replication issues early.
Create indexes on subscriber tables independently: Subscriber tables can have additional indexes that do not exist on the publisher. This is useful for analytics workloads that need different index strategies.
Set synchronous_commit = 'remote_apply' cautiously: This ensures changes are applied on the subscriber before the publisher transaction commits, but it adds latency to every write. Use it only when strong consistency is required.
Test with pg_bench before production: Simulate your expected write load and measure replication lag under stress. This helps you right-size the subscriber and tune WAL settings.
Document your replication topology: As your replication setup grows more complex, maintain documentation of which publications exist, which subscribers consume them, and what conflict resolution strategy is in place.
Plan for sequence management: Sequences are not replicated. Either use different sequence ranges on each server, use UUIDs, or use a centralized ID generation service.

Common Pitfalls and Solutions

Pitfall	Impact	Solution
Replication slot grows unbounded	Disk space exhaustion on publisher	Set `max_slot_wal_keep_size`; monitor `pg_replication_slots`
Table without primary key	UPDATE/DELETE fail on subscriber	Add a primary key or set `REPLICA IDENTITY FULL`
DDL changes not replicated	Schema drift between publisher and subscriber	Apply DDL changes manually on both sides; use migration tools
Sequences produce duplicate IDs	Data integrity issues on merged datasets	Use different sequence ranges, UUIDs, or centralized ID generation
Subscriber writes conflict with replicated data	Errors or silent data loss	Ensure subscribers are read-only or implement conflict resolution
Large initial sync impacts publisher performance	Production slowdown during setup	Use `copy_data = false` and manual pg_dump; or schedule during low traffic
`TRUNCATE` not published by default	Data drift if TRUNCATE is used	Include `truncate` in publication: `WITH (publish = 'insert, update, delete, truncate')`

Performance Optimization

Tuning Apply Workers

-- Increase parallel apply workers for high-throughput workloads
ALTER SYSTEM SET max_logical_replication_workers = 8;
ALTER SYSTEM SET max_sync_workers_per_subscription = 4;
SELECT pg_reload_conf();

Reducing WAL Overhead

-- Use column lists to reduce the amount of data replicated (PG 15+)
CREATE PUBLICATION lean_pub FOR TABLE orders (id, status, total_cents);
 
-- Use row filtering to reduce the number of rows replicated (PG 15+)
CREATE PUBLICATION recent_pub FOR TABLE orders WHERE (created_at > '2024-01-01');

Monitoring Queries

-- Comprehensive replication monitoring
SELECT
    s.subname AS subscription_name,
    c.client_addr AS subscriber_ip,
    w.state,
    w.sent_lsn,
    w.write_lsn,
    w.flush_lsn,
    w.replay_lsn,
    pg_size_pretty(pg_wal_lsn_diff(w.sent_lsn, w.replay_lsn)) AS lag_size,
    w.reply_time AS last_reply
FROM pg_stat_subscription s
JOIN pg_stat_replication w ON s.subname = w.application_name
LEFT JOIN pg_stat_activity c ON w.pid = c.pid;
 
-- Check per-table sync status
SELECT
    s.subname,
    c.relname,
    CASE sr.srsubstate
        WHEN 'i' THEN 'init'
        WHEN 'd' THEN 'data copy'
        WHEN 's' THEN 'synchronized'
        WHEN 'r' THEN 'ready'
        WHEN 'e' THEN 'catchup'
    END AS state
FROM pg_subscription_rel sr
JOIN pg_subscription s ON sr.srsubid = s.oid
JOIN pg_class c ON sr.srrelid = c.oid;

Comparison with Alternatives

Feature	PostgreSQL Logical Replication	Debezium CDC	AWS DMS	pglogical Extension
Built-in	Yes (PG 10+)	No (Kafka Connect)	No (AWS service)	No (extension)
Selective replication	Yes	Yes	Yes	Yes
Row filtering	PG 15+	Yes	Yes	Yes
Column filtering	PG 15+	Yes	Yes	No
Cross-platform target	No (PG to PG only)	Yes (any Kafka consumer)	Yes (multiple targets)	PG only
Conflict resolution	PG 15+	Application-level	Built-in	Yes
DDL replication	No	Partial	Yes	No
Operational complexity	Low	High (Kafka ecosystem)	Medium	Low
Cost	Free (built-in)	Free + infrastructure	Per-DU pricing	Free (extension)

Advanced Patterns

Bidirectional Replication with Conflict Detection

For active-active setups where both sites can accept writes:

-- On server A
CREATE PUBLICATION a_to_b FOR TABLE shared_data;
CREATE SUBSCRIPTION a_from_b
    CONNECTION 'host=server-b dbname=mydb user=replicator'
    PUBLICATION b_to_a
    WITH (origin = 'none');
 
-- On server B
CREATE PUBLICATION b_to_a FOR TABLE shared_data;
CREATE SUBSCRIPTION b_from_a
    CONNECTION 'host=server-a dbname=mydb user=replicator'
    PUBLICATION a_to_b
    WITH (origin = 'none');

Add conflict detection using triggers:

CREATE OR REPLACE FUNCTION detect_conflict()
RETURNS TRIGGER AS $$
BEGIN
    IF NEW.updated_at < OLD.updated_at THEN
        INSERT INTO conflict_log (table_name, row_id, local_data, remote_data)
        VALUES (TG_TABLE_NAME, OLD.id, row_to_json(OLD), row_to_json(NEW));
        -- Choose resolution strategy: last-write-wins or manual
        RETURN NEW; -- or RETURN OLD to reject the remote change
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Testing Strategies

Verifying Replication Completeness

-- Compare row counts between publisher and subscriber
-- (Run on publisher)
SELECT 'orders' AS table_name, COUNT(*) AS pub_count FROM orders;
 
-- (Run on subscriber)
SELECT 'orders' AS table_name, COUNT(*) AS sub_count FROM orders;
 
-- Compare checksums for critical tables
-- (Run on both)
SELECT md5(string_agg(row_hash, '')) AS table_hash
FROM (SELECT md5(row(t.*)::text) AS row_hash FROM orders t ORDER BY id) sub;

Load Testing Replication

-- Generate test data on publisher
INSERT INTO orders (customer_id, total_cents, status)
SELECT
    (random() * 1000)::int,
    (random() * 10000)::int,
    CASE WHEN random() > 0.5 THEN 'pending' ELSE 'completed' END
FROM generate_series(1, 100000);
 
-- Monitor lag on publisher during load
SELECT pg_size_pretty(pg_wal_lsn_diff(sent_lsn, replay_lsn)) AS lag
FROM pg_stat_replication;

Future Outlook

PostgreSQL continues to enhance logical replication with each release. Key upcoming improvements include:

Built-in logical replication for partitioned tables (improving in PG 17+)
Enhanced conflict resolution for multi-writer topologies
Improved DDL replication to reduce manual schema management
Better integration with external CDC tools via improved output plugins

The logical replication story in PostgreSQL is moving toward becoming a complete solution for data distribution, not just a database-to-database copy mechanism. As the feature matures, expect it to reduce the need for external CDC tools in many use cases.

Conclusion

PostgreSQL logical replication is a mature, production-ready feature for real-time data distribution. Whether you need read replicas, zero-downtime migrations, multi-region data distribution, or data warehouse feeding, logical replication provides a built-in, reliable solution. The key takeaways are:

Configure wal_level = logical and manage replication slots proactively
Use row filtering and column lists (PG 15+) to reduce replication overhead
Always set max_slot_wal_keep_size to prevent disk exhaustion
Test initial sync timing and replication lag under production-like load
Document your replication topology and conflict resolution strategy
Monitor pg_stat_subscription and pg_stat_replication continuously
Plan for sequence management across replicated databases
Use the blue-green deployment pattern for major version upgrades

Start with a simple publisher-subscriber setup in development, validate your failover procedures in staging, and deploy with confidence. The PostgreSQL documentation on logical replication, combined with the monitoring queries in this guide, gives you everything you need to succeed in production.

Minh Vo

Slaying code & making it lit fr fr 🔥 tagline