PostgreSQL Performance Tuning: Indexing Strategies

Introduction

Indexes are the single most impactful tool for improving query performance in PostgreSQL. A well-placed index can turn a query that scans millions of rows into one that returns results in microseconds. But indexes are not free—they consume disk space, slow down writes, and require maintenance. The art of indexing is knowing when an index helps, when it hurts, and which type of index is right for each workload.

PostgreSQL supports a rich set of index types beyond the standard B-tree: hash indexes for equality lookups, GIN (Generalized Inverted Index) for full-text search and array containment, GiST (Generalized Search Tree) for geometric and range data, BRIN (Block Range INdex) for naturally ordered large tables, and SP-GiST for partitioned search spaces. Each type has specific strengths, and choosing the wrong one can leave performance on the table or even make things worse.

This guide covers everything you need to master PostgreSQL indexing. We will start with B-tree fundamentals—the index type that handles the majority of workloads—then explore each specialized index type with practical examples. You will learn how to read and interpret EXPLAIN ANALYZE output to verify your indexes are being used, how to identify missing indexes, how to detect and remove unused indexes, and how to tune index parameters for production workloads.

Understanding Indexes: Core Concepts

How PostgreSQL Uses Indexes

When PostgreSQL receives a query, the query planner evaluates multiple execution plans and chooses the one with the lowest estimated cost. The planner considers sequential scans (reading the entire table), index scans (using an index to find specific rows), and bitmap scans (using an index to build a bitmap of matching pages, then fetching those pages).

An index scan is most beneficial when the query returns a small fraction of the table's rows. The planner uses statistics about the data distribution (stored in pg_statistics) to estimate how many rows will match. If the estimated selectivity is low (few rows match), the planner will prefer an index scan. If the selectivity is high (most rows match), a sequential scan is usually cheaper because it avoids the overhead of jumping between index and table pages.

This means that an index on a low-cardinality column (like a boolean or a status field with only 3 values) is often less useful than an index on a high-cardinality column (like an email address or a timestamp). However, even low-cardinality indexes can be valuable when combined with other columns in a composite index.

Index Types Overview

PostgreSQL offers six built-in index types, each optimized for different access patterns:

Index Type	Best For	Typical Use Case
B-tree	Equality, range, sorting	Most OLTP queries; `<`, `>`, `=`, `BETWEEN`, `ORDER BY`
Hash	Equality only	High-cardinality equality lookups; faster than B-tree for `=` only
GIN	Containment, full-text search	`@>`, `<@`, `?`, `?
GiST	Geometric, range, nearest-neighbor	PostGIS spatial queries; range types; `KNN` with `<->` operator
BRIN	Range queries on naturally ordered data	Time-series data; auto-incrementing IDs on very large tables
SP-GiST	Partitioned search spaces	Phone numbers (tries); IP addresses (quadtrees); radishes

B-Tree Deep Dive

B-tree is the default index type in PostgreSQL and handles the vast majority of indexing needs. A B-tree maintains a balanced tree structure where leaf nodes contain index entries sorted by the indexed column(s). This structure supports efficient equality lookups (O(log n)), range scans, and sorting.

-- Create a basic B-tree index (type is implicit)
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
 
-- Explicitly specify B-tree type
CREATE INDEX idx_orders_customer_id ON orders USING btree (customer_id);
 
-- Multi-column (composite) B-tree index
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);

A composite B-tree index can satisfy queries that filter on the leftmost column(s) of the index. An index on (customer_id, status) can be used by queries filtering on customer_id alone, or on customer_id AND status, but NOT by queries filtering on status alone. This is the "leftmost prefix" rule.

Architecture and Design Patterns

Covering Indexes

A covering index includes additional columns beyond those used for filtering, so the index itself contains all the data needed to satisfy the query. This allows PostgreSQL to answer the query using only the index, without accessing the table at all—an operation called an index-only scan:

-- Query that needs customer_id for filtering and total_cents for the result
SELECT customer_id, total_cents FROM orders WHERE status = 'completed';
 
-- Covering index that includes total_cents
CREATE INDEX idx_orders_status_covering ON orders (status) INCLUDE (total_cents);

The INCLUDE clause (PostgreSQL 11+) adds non-key columns to the leaf level of the B-tree. These columns are not used for ordering or searching but are stored in the index for retrieval. This is more space-efficient than adding the columns to the key portion of the index.

Partial Indexes

A partial index indexes only a subset of rows that match a predicate. This is powerful when you frequently query a small, well-defined subset of a large table:

-- Only index active orders (assume most orders are completed/cancelled)
CREATE INDEX idx_active_orders ON orders (customer_id)
WHERE status = 'pending' OR status = 'processing';
 
-- Only index recent events
CREATE INDEX idx_recent_events ON events (user_id, event_type)
WHERE created_at > '2024-01-01';

Partial indexes are smaller than full indexes (because they index fewer rows), faster to maintain, and can be more effective for queries that match the partial index predicate. The planner will use a partial index only when the query's WHERE clause is compatible with the index predicate.

Expression Indexes

An expression index indexes the result of an expression or function rather than a raw column value:

-- Index on lowercased email for case-insensitive lookups
CREATE INDEX idx_users_lower_email ON users (LOWER(email));
 
-- Index on date extraction
CREATE INDEX idx_orders_created_date ON orders (DATE(created_at));
 
-- Index on JSONB path
CREATE INDEX idx_users_city ON users ((profile -> 'address' ->> 'city'));

Expression indexes are essential when your queries use functions or transformations on columns. Without an expression index, PostgreSQL must evaluate the function for every row in the table, preventing index usage.

Multi-Column Indexes and Column Order

The order of columns in a composite index matters significantly. PostgreSQL can use a composite index for:

Queries filtering on all indexed columns
Queries filtering on a leftmost prefix of the indexed columns
Queries that use the index for sorting after filtering on leftmost columns

-- Index on (customer_id, created_at)
CREATE INDEX idx_orders_cust_date ON orders (customer_id, created_at);
 
-- This query CAN use the index:
SELECT * FROM orders WHERE customer_id = 42 ORDER BY created_at DESC;
 
-- This query CANNOT use the index (filters on created_at only):
SELECT * FROM orders WHERE created_at > '2024-01-01';

Rule of thumb: place the most selective (highest cardinality) column first, unless your query patterns consistently filter on a specific column first.

Step-by-Step Implementation

Identifying Missing Indexes

The first step in performance tuning is finding queries that would benefit from indexes. Use pg_stat_statements to identify slow queries:

-- Enable pg_stat_statements (requires restart or shared_preload_libraries)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
 
-- Find queries with high total execution time
SELECT
    query,
    calls,
    round(total_exec_time::numeric, 2) AS total_ms,
    round(mean_exec_time::numeric, 2) AS avg_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Then use EXPLAIN ANALYZE on the slow queries to see if they are doing sequential scans:

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';
 
-- Output might show:
-- Seq Scan on orders  (cost=0.00..1234.56 rows=10 width=48) (actual time=0.02..45.67 rows=8 loops=1)
--   Filter: ((customer_id = 42) AND (status = 'pending'))
--   Rows Removed by Filter: 99992
-- Planning Time: 0.100 ms
-- Execution Time: 45.800 ms

The Seq Scan indicates no index is being used. The Rows Removed by Filter shows how many rows were scanned unnecessarily. This is a clear candidate for an index.

Creating the Right Index

Based on the query pattern above, create a composite index:

CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);
 
-- Verify the index is used
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';
 
-- Output should now show:
-- Index Scan using idx_orders_customer_status on orders  (cost=0.42..8.44 rows=10 width=48) (actual time=0.01..0.03 rows=8 loops=1)
--   Index Cond: ((customer_id = 42) AND (status = 'pending'))
-- Planning Time: 0.150 ms
-- Execution Time: 0.050 ms

The execution time dropped from 45ms to 0.05ms—a 900x improvement.

Finding Unused Indexes

Indexes that are never used by the query planner waste disk space and slow down writes. Find them with:

SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS times_used
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelid NOT IN (
    SELECT conindid FROM pg_constraint WHERE contype IN ('p', 'u')
)
ORDER BY pg_relation_size(indexrelid) DESC;

This query shows indexes that have never been scanned, excluding primary keys and unique constraints (which serve a data integrity purpose beyond query performance). Before dropping an unused index, verify it is not needed by periodic batch jobs or reports that may not have run since the last statistics reset.

Creating Indexes Without Blocking Writes

In production, CREATE INDEX acquires an exclusive lock on the table, blocking all writes for the duration of index creation. Use CONCURRENTLY to avoid this:

-- This blocks writes until the index is built
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
 
-- This builds the index without blocking writes
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);

CREATE INDEX CONCURRENTLY takes longer and requires two passes over the table, but it allows reads and writes to continue during index creation. It cannot be run inside a transaction block. If the concurrent build fails, you may be left with an invalid index—check with:

SELECT indexrelname, indisvalid, indisready
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_orders_customer_id';

Real-World Use Cases

Use Case 1: Optimizing E-Commerce Order Lookups

An e-commerce platform queries orders by customer, status, and date range:

-- Common query pattern
SELECT * FROM orders
WHERE customer_id = 42
AND status IN ('pending', 'processing')
AND created_at > '2024-01-01'
ORDER BY created_at DESC
LIMIT 20;
 
-- Optimal index
CREATE INDEX idx_orders_customer_status_date ON orders (customer_id, status, created_at DESC);

The descending sort on created_at in the index matches the ORDER BY clause, eliminating the need for a sort step in the query plan.

Use Case 2: Full-Text Search with GIN Index

For searching articles by content:

-- Add a tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector;
 
-- Populate it
UPDATE articles SET search_vector = to_tsvector('english', title || ' ' || body);
 
-- Create GIN index
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
 
-- Query using the index
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, plainto_tsquery('english', 'postgresql indexing') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

Use Case 3: Time-Series Data with BRIN Index

For a table with billions of rows where data is naturally ordered by time:

-- BRIN index is tiny compared to B-tree for ordered data
CREATE INDEX idx_events_time_brin ON events USING BRIN (created_at)
WITH (pages_per_range = 32);
 
-- Query
SELECT * FROM events WHERE created_at BETWEEN '2024-01-01' AND '2024-01-02';

A BRIN index for a billion-row table might be only a few MB, while an equivalent B-tree index could be tens of GB. The tradeoff is that BRIN indexes are less precise—they store min/max values per block range, so the planner must still scan some blocks that do not contain matching rows.

Best Practices for Production

Index for your query patterns, not your data model: Create indexes based on the queries your application actually runs, not based on what columns exist. Use pg_stat_statements to identify the most impactful queries.
Use composite indexes instead of multiple single-column indexes: A single composite index on (a, b) is more effective than separate indexes on a and b for queries filtering on both columns.
Consider covering indexes for hot queries: If a query only needs a few columns, use INCLUDE to create a covering index that supports index-only scans.
Use partial indexes for skewed data distributions: If you frequently query a small subset of rows (e.g., active users, pending orders), a partial index is smaller and faster.
Use CONCURRENTLY for production index creation: Never create indexes on large production tables without CONCURRENTLY unless you can tolerate write downtime.
Monitor and remove unused indexes: Unused indexes waste disk space and slow down every INSERT, UPDATE, and DELETE. Check pg_stat_user_indexes quarterly.
VACUUM regularly: Index bloat from dead tuples degrades performance. Ensure autovacuum is properly tuned, and consider manual VACUUM on heavily updated tables.
Use EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) for detailed plans: The BUFFERS option shows how many pages were read from cache vs disk, helping you diagnose I/O bottlenecks.

Common Pitfalls and Solutions

Pitfall	Impact	Solution
Creating too many indexes	Slow writes; disk bloat; confusing planner	Audit indexes regularly; drop unused ones
Not using CONCURRENTLY	Write blocking during index creation	Always use `CREATE INDEX CONCURRENTLY` in production
Wrong column order in composite index	Index not used by queries	Put most selective or most-queried column first
Index on low-cardinality column alone	Index rarely chosen by planner	Use partial index or combine with other columns
Forgetting expression index for function queries	Full table scan on every function-based query	Create expression index matching the function
Not analyzing after bulk load	Planner statistics stale; bad plans	Run `ANALYZE` after large data loads
Relying on pg_stat_statements without extension	Cannot identify slow queries	Enable `pg_stat_statements` in `shared_preload_libraries`

Performance Optimization

Index Maintenance

-- Check index bloat
SELECT
    indexrelname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS scans
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;
 
-- Reindex a bloated index (blocking)
REINDEX INDEX idx_orders_customer_id;
 
-- Reindex concurrently (PG 12+)
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;
 
-- Check index usage statistics
SELECT
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

Tuning Planner Statistics

-- Increase statistics target for a specific column
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
 
-- Analyze to collect new statistics
ANALYZE orders;
 
-- Check the statistics
SELECT
    attname,
    n_distinct,
    most_common_vals,
    most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'customer_id';

Work Memory for Sorting

When PostgreSQL uses an index that does not match the sort order of the query, it must sort the results in memory. The work_mem setting controls how much memory is available for this:

-- Increase work_mem for complex sort operations
SET work_mem = '256MB';
 
-- Check if sorts are spilling to disk
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42 ORDER BY total_cents DESC;
-- Look for "Sort Method: external merge" which indicates disk spill

Comparison with Alternatives

Feature	B-tree	Hash	GIN	GiST	BRIN	SP-GiST
Equality (`=`)	Excellent	Excellent	Good	Good	Good	Good
Range (`<`, `>`, `BETWEEN`)	Excellent	No support	No support	Good	Good	No support
Sort (`ORDER BY`)	Yes	No	No	No	No	No
Full-text search	No	No	Excellent	Fair	No	No
Array containment	No	No	Excellent	No	No	No
Geometric/spatial	No	No	No	Excellent	No	Fair
Size on disk	Medium	Small	Large	Medium	Tiny	Medium
Write overhead	Low	Low	High	Medium	Very Low	Low

Advanced Patterns

Partial Indexes with Expression

Combine partial and expression indexes for maximum precision:

-- Index only recent active users with lowercased email lookup
CREATE INDEX idx_active_recent_users ON users (LOWER(email))
WHERE status = 'active' AND created_at > '2023-01-01';

Unique Indexes for Data Integrity

Unique indexes serve double duty: they enforce uniqueness and provide fast lookups:

-- Unique index on lowercased email (case-insensitive uniqueness)
CREATE UNIQUE INDEX idx_users_email_unique ON users (LOWER(email));

Index-Only Scans Verification

-- Force an index-only scan to verify the covering index works
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT customer_id, status FROM orders WHERE customer_id = 42;
SET enable_seqscan = on;
 
-- Look for "Index Only Scan" in the output
-- Check "Heap Fetches" — if high, run VACUUM to update visibility map

Testing Strategies

Benchmarking Index Impact

-- Disable the index to measure baseline performance
DROP INDEX CONCURRENTLY idx_orders_customer_status;
 
-- Run benchmark query
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';
 
-- Recreate the index
CREATE INDEX CONCURRENTLY idx_orders_customer_status ON orders (customer_id, status);
 
-- Run benchmark query again and compare
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';

Load Testing with pgbench

# Initialize pgbench
pgbench -i -s 50 mydb
 
# Run read-heavy workload without custom indexes
pgbench -c 10 -T 300 -S mydb
 
# Create custom indexes, then re-run
pgbench -c 10 -T 300 -S mydb

Future Outlook

PostgreSQL continues to improve index capabilities. Recent improvements include B-tree deduplication (PG 13, reduces index size for duplicate values), improved BRIN parallel build (PG 14), and faster index-only scans through visibility map improvements. Future releases are expected to add better support for multi-dimensional indexes and improved statistics for complex data distributions.

The trend toward storing more diverse data types (JSONB, vectors for AI embeddings, geospatial) in PostgreSQL makes specialized index types increasingly important. The pgvector extension, for example, uses IVFFlat and HNSW indexes for vector similarity search, bringing PostgreSQL into the AI/ML infrastructure stack.

Conclusion

Effective indexing is the foundation of PostgreSQL performance tuning. The right index can improve query performance by orders of magnitude, while the wrong index (or no index) can bring a production system to its knees. The key takeaways are:

Default to B-tree indexes for most workloads—they handle equality, range, and sorting
Use composite indexes with the most selective column first
Use covering indexes (INCLUDE) for index-only scans on hot queries
Use partial indexes when you frequently query a subset of rows
Use expression indexes when queries use functions on columns
Use GIN indexes for full-text search, JSONB containment, and array operations
Use BRIN indexes for naturally ordered data on very large tables
Always create indexes concurrently in production
Monitor and remove unused indexes quarterly
Use EXPLAIN (ANALYZE, BUFFERS) to verify index usage and diagnose performance

Start by enabling pg_stat_statements, identifying your slowest queries, and creating targeted indexes. Use EXPLAIN ANALYZE to verify each index is being used. The investment in proper indexing pays dividends across every aspect of your application's performance.

Minh Vo

Slaying code & making it lit fr fr 🔥 tagline