Introduction
Indexes are the single most impactful tool for improving query performance in PostgreSQL. A well-placed index can turn a query that scans millions of rows into one that returns results in microseconds. But indexes are not free—they consume disk space, slow down writes, and require maintenance. The art of indexing is knowing when an index helps, when it hurts, and which type of index is right for each workload.
PostgreSQL supports a rich set of index types beyond the standard B-tree: hash indexes for equality lookups, GIN (Generalized Inverted Index) for full-text search and array containment, GiST (Generalized Search Tree) for geometric and range data, BRIN (Block Range INdex) for naturally ordered large tables, and SP-GiST for partitioned search spaces. Each type has specific strengths, and choosing the wrong one can leave performance on the table or even make things worse.
This guide covers everything you need to master PostgreSQL indexing. We will start with B-tree fundamentals—the index type that handles the majority of workloads—then explore each specialized index type with practical examples. You will learn how to read and interpret EXPLAIN ANALYZE output to verify your indexes are being used, how to identify missing indexes, how to detect and remove unused indexes, and how to tune index parameters for production workloads.
Understanding Indexes: Core Concepts
How PostgreSQL Uses Indexes
When PostgreSQL receives a query, the query planner evaluates multiple execution plans and chooses the one with the lowest estimated cost. The planner considers sequential scans (reading the entire table), index scans (using an index to find specific rows), and bitmap scans (using an index to build a bitmap of matching pages, then fetching those pages).
An index scan is most beneficial when the query returns a small fraction of the table's rows. The planner uses statistics about the data distribution (stored in pg_statistics) to estimate how many rows will match. If the estimated selectivity is low (few rows match), the planner will prefer an index scan. If the selectivity is high (most rows match), a sequential scan is usually cheaper because it avoids the overhead of jumping between index and table pages.
This means that an index on a low-cardinality column (like a boolean or a status field with only 3 values) is often less useful than an index on a high-cardinality column (like an email address or a timestamp). However, even low-cardinality indexes can be valuable when combined with other columns in a composite index.
Index Types Overview
PostgreSQL offers six built-in index types, each optimized for different access patterns:
| Index Type | Best For | Typical Use Case |
|---|---|---|
| B-tree | Equality, range, sorting | Most OLTP queries; <, >, =, BETWEEN, ORDER BY |
| Hash | Equality only | High-cardinality equality lookups; faster than B-tree for = only |
| GIN | Containment, full-text search | @>, <@, ?, `? |
| GiST | Geometric, range, nearest-neighbor | PostGIS spatial queries; range types; KNN with <-> operator |
| BRIN | Range queries on naturally ordered data | Time-series data; auto-incrementing IDs on very large tables |
| SP-GiST | Partitioned search spaces | Phone numbers (tries); IP addresses (quadtrees); radishes |
B-Tree Deep Dive
B-tree is the default index type in PostgreSQL and handles the vast majority of indexing needs. A B-tree maintains a balanced tree structure where leaf nodes contain index entries sorted by the indexed column(s). This structure supports efficient equality lookups (O(log n)), range scans, and sorting.
-- Create a basic B-tree index (type is implicit)
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
-- Explicitly specify B-tree type
CREATE INDEX idx_orders_customer_id ON orders USING btree (customer_id);
-- Multi-column (composite) B-tree index
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);A composite B-tree index can satisfy queries that filter on the leftmost column(s) of the index. An index on (customer_id, status) can be used by queries filtering on customer_id alone, or on customer_id AND status, but NOT by queries filtering on status alone. This is the "leftmost prefix" rule.
Architecture and Design Patterns
Covering Indexes
A covering index includes additional columns beyond those used for filtering, so the index itself contains all the data needed to satisfy the query. This allows PostgreSQL to answer the query using only the index, without accessing the table at all—an operation called an index-only scan:
-- Query that needs customer_id for filtering and total_cents for the result
SELECT customer_id, total_cents FROM orders WHERE status = 'completed';
-- Covering index that includes total_cents
CREATE INDEX idx_orders_status_covering ON orders (status) INCLUDE (total_cents);The INCLUDE clause (PostgreSQL 11+) adds non-key columns to the leaf level of the B-tree. These columns are not used for ordering or searching but are stored in the index for retrieval. This is more space-efficient than adding the columns to the key portion of the index.
Partial Indexes
A partial index indexes only a subset of rows that match a predicate. This is powerful when you frequently query a small, well-defined subset of a large table:
-- Only index active orders (assume most orders are completed/cancelled)
CREATE INDEX idx_active_orders ON orders (customer_id)
WHERE status = 'pending' OR status = 'processing';
-- Only index recent events
CREATE INDEX idx_recent_events ON events (user_id, event_type)
WHERE created_at > '2024-01-01';Partial indexes are smaller than full indexes (because they index fewer rows), faster to maintain, and can be more effective for queries that match the partial index predicate. The planner will use a partial index only when the query's WHERE clause is compatible with the index predicate.
Expression Indexes
An expression index indexes the result of an expression or function rather than a raw column value:
-- Index on lowercased email for case-insensitive lookups
CREATE INDEX idx_users_lower_email ON users (LOWER(email));
-- Index on date extraction
CREATE INDEX idx_orders_created_date ON orders (DATE(created_at));
-- Index on JSONB path
CREATE INDEX idx_users_city ON users ((profile -> 'address' ->> 'city'));Expression indexes are essential when your queries use functions or transformations on columns. Without an expression index, PostgreSQL must evaluate the function for every row in the table, preventing index usage.
Multi-Column Indexes and Column Order
The order of columns in a composite index matters significantly. PostgreSQL can use a composite index for:
- Queries filtering on all indexed columns
- Queries filtering on a leftmost prefix of the indexed columns
- Queries that use the index for sorting after filtering on leftmost columns
-- Index on (customer_id, created_at)
CREATE INDEX idx_orders_cust_date ON orders (customer_id, created_at);
-- This query CAN use the index:
SELECT * FROM orders WHERE customer_id = 42 ORDER BY created_at DESC;
-- This query CANNOT use the index (filters on created_at only):
SELECT * FROM orders WHERE created_at > '2024-01-01';Rule of thumb: place the most selective (highest cardinality) column first, unless your query patterns consistently filter on a specific column first.
Step-by-Step Implementation
Identifying Missing Indexes
The first step in performance tuning is finding queries that would benefit from indexes. Use pg_stat_statements to identify slow queries:
-- Enable pg_stat_statements (requires restart or shared_preload_libraries)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Find queries with high total execution time
SELECT
query,
calls,
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 2) AS avg_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;Then use EXPLAIN ANALYZE on the slow queries to see if they are doing sequential scans:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';
-- Output might show:
-- Seq Scan on orders (cost=0.00..1234.56 rows=10 width=48) (actual time=0.02..45.67 rows=8 loops=1)
-- Filter: ((customer_id = 42) AND (status = 'pending'))
-- Rows Removed by Filter: 99992
-- Planning Time: 0.100 ms
-- Execution Time: 45.800 msThe Seq Scan indicates no index is being used. The Rows Removed by Filter shows how many rows were scanned unnecessarily. This is a clear candidate for an index.
Creating the Right Index
Based on the query pattern above, create a composite index:
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);
-- Verify the index is used
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';
-- Output should now show:
-- Index Scan using idx_orders_customer_status on orders (cost=0.42..8.44 rows=10 width=48) (actual time=0.01..0.03 rows=8 loops=1)
-- Index Cond: ((customer_id = 42) AND (status = 'pending'))
-- Planning Time: 0.150 ms
-- Execution Time: 0.050 msThe execution time dropped from 45ms to 0.05ms—a 900x improvement.
Finding Unused Indexes
Indexes that are never used by the query planner waste disk space and slow down writes. Find them with:
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan AS times_used
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelid NOT IN (
SELECT conindid FROM pg_constraint WHERE contype IN ('p', 'u')
)
ORDER BY pg_relation_size(indexrelid) DESC;This query shows indexes that have never been scanned, excluding primary keys and unique constraints (which serve a data integrity purpose beyond query performance). Before dropping an unused index, verify it is not needed by periodic batch jobs or reports that may not have run since the last statistics reset.
Creating Indexes Without Blocking Writes
In production, CREATE INDEX acquires an exclusive lock on the table, blocking all writes for the duration of index creation. Use CONCURRENTLY to avoid this:
-- This blocks writes until the index is built
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
-- This builds the index without blocking writes
CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);CREATE INDEX CONCURRENTLY takes longer and requires two passes over the table, but it allows reads and writes to continue during index creation. It cannot be run inside a transaction block. If the concurrent build fails, you may be left with an invalid index—check with:
SELECT indexrelname, indisvalid, indisready
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_orders_customer_id';Real-World Use Cases
Use Case 1: Optimizing E-Commerce Order Lookups
An e-commerce platform queries orders by customer, status, and date range:
-- Common query pattern
SELECT * FROM orders
WHERE customer_id = 42
AND status IN ('pending', 'processing')
AND created_at > '2024-01-01'
ORDER BY created_at DESC
LIMIT 20;
-- Optimal index
CREATE INDEX idx_orders_customer_status_date ON orders (customer_id, status, created_at DESC);The descending sort on created_at in the index matches the ORDER BY clause, eliminating the need for a sort step in the query plan.
Use Case 2: Full-Text Search with GIN Index
For searching articles by content:
-- Add a tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector;
-- Populate it
UPDATE articles SET search_vector = to_tsvector('english', title || ' ' || body);
-- Create GIN index
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
-- Query using the index
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, plainto_tsquery('english', 'postgresql indexing') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;Use Case 3: Time-Series Data with BRIN Index
For a table with billions of rows where data is naturally ordered by time:
-- BRIN index is tiny compared to B-tree for ordered data
CREATE INDEX idx_events_time_brin ON events USING BRIN (created_at)
WITH (pages_per_range = 32);
-- Query
SELECT * FROM events WHERE created_at BETWEEN '2024-01-01' AND '2024-01-02';A BRIN index for a billion-row table might be only a few MB, while an equivalent B-tree index could be tens of GB. The tradeoff is that BRIN indexes are less precise—they store min/max values per block range, so the planner must still scan some blocks that do not contain matching rows.
Best Practices for Production
-
Index for your query patterns, not your data model: Create indexes based on the queries your application actually runs, not based on what columns exist. Use
pg_stat_statementsto identify the most impactful queries. -
Use composite indexes instead of multiple single-column indexes: A single composite index on
(a, b)is more effective than separate indexes onaandbfor queries filtering on both columns. -
Consider covering indexes for hot queries: If a query only needs a few columns, use
INCLUDEto create a covering index that supports index-only scans. -
Use partial indexes for skewed data distributions: If you frequently query a small subset of rows (e.g., active users, pending orders), a partial index is smaller and faster.
-
Use
CONCURRENTLYfor production index creation: Never create indexes on large production tables withoutCONCURRENTLYunless you can tolerate write downtime. -
Monitor and remove unused indexes: Unused indexes waste disk space and slow down every INSERT, UPDATE, and DELETE. Check
pg_stat_user_indexesquarterly. -
VACUUM regularly: Index bloat from dead tuples degrades performance. Ensure autovacuum is properly tuned, and consider manual
VACUUMon heavily updated tables. -
Use
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)for detailed plans: TheBUFFERSoption shows how many pages were read from cache vs disk, helping you diagnose I/O bottlenecks.
Common Pitfalls and Solutions
| Pitfall | Impact | Solution |
|---|---|---|
| Creating too many indexes | Slow writes; disk bloat; confusing planner | Audit indexes regularly; drop unused ones |
| Not using CONCURRENTLY | Write blocking during index creation | Always use CREATE INDEX CONCURRENTLY in production |
| Wrong column order in composite index | Index not used by queries | Put most selective or most-queried column first |
| Index on low-cardinality column alone | Index rarely chosen by planner | Use partial index or combine with other columns |
| Forgetting expression index for function queries | Full table scan on every function-based query | Create expression index matching the function |
| Not analyzing after bulk load | Planner statistics stale; bad plans | Run ANALYZE after large data loads |
| Relying on pg_stat_statements without extension | Cannot identify slow queries | Enable pg_stat_statements in shared_preload_libraries |
Performance Optimization
Index Maintenance
-- Check index bloat
SELECT
indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan AS scans
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;
-- Reindex a bloated index (blocking)
REINDEX INDEX idx_orders_customer_id;
-- Reindex concurrently (PG 12+)
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;
-- Check index usage statistics
SELECT
relname AS table_name,
indexrelname AS index_name,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;Tuning Planner Statistics
-- Increase statistics target for a specific column
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
-- Analyze to collect new statistics
ANALYZE orders;
-- Check the statistics
SELECT
attname,
n_distinct,
most_common_vals,
most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'customer_id';Work Memory for Sorting
When PostgreSQL uses an index that does not match the sort order of the query, it must sort the results in memory. The work_mem setting controls how much memory is available for this:
-- Increase work_mem for complex sort operations
SET work_mem = '256MB';
-- Check if sorts are spilling to disk
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42 ORDER BY total_cents DESC;
-- Look for "Sort Method: external merge" which indicates disk spillComparison with Alternatives
| Feature | B-tree | Hash | GIN | GiST | BRIN | SP-GiST |
|---|---|---|---|---|---|---|
Equality (=) | Excellent | Excellent | Good | Good | Good | Good |
Range (<, >, BETWEEN) | Excellent | No support | No support | Good | Good | No support |
Sort (ORDER BY) | Yes | No | No | No | No | No |
| Full-text search | No | No | Excellent | Fair | No | No |
| Array containment | No | No | Excellent | No | No | No |
| Geometric/spatial | No | No | No | Excellent | No | Fair |
| Size on disk | Medium | Small | Large | Medium | Tiny | Medium |
| Write overhead | Low | Low | High | Medium | Very Low | Low |
Advanced Patterns
Partial Indexes with Expression
Combine partial and expression indexes for maximum precision:
-- Index only recent active users with lowercased email lookup
CREATE INDEX idx_active_recent_users ON users (LOWER(email))
WHERE status = 'active' AND created_at > '2023-01-01';Unique Indexes for Data Integrity
Unique indexes serve double duty: they enforce uniqueness and provide fast lookups:
-- Unique index on lowercased email (case-insensitive uniqueness)
CREATE UNIQUE INDEX idx_users_email_unique ON users (LOWER(email));Index-Only Scans Verification
-- Force an index-only scan to verify the covering index works
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT customer_id, status FROM orders WHERE customer_id = 42;
SET enable_seqscan = on;
-- Look for "Index Only Scan" in the output
-- Check "Heap Fetches" — if high, run VACUUM to update visibility mapTesting Strategies
Benchmarking Index Impact
-- Disable the index to measure baseline performance
DROP INDEX CONCURRENTLY idx_orders_customer_status;
-- Run benchmark query
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';
-- Recreate the index
CREATE INDEX CONCURRENTLY idx_orders_customer_status ON orders (customer_id, status);
-- Run benchmark query again and compare
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';Load Testing with pgbench
# Initialize pgbench
pgbench -i -s 50 mydb
# Run read-heavy workload without custom indexes
pgbench -c 10 -T 300 -S mydb
# Create custom indexes, then re-run
pgbench -c 10 -T 300 -S mydbFuture Outlook
PostgreSQL continues to improve index capabilities. Recent improvements include B-tree deduplication (PG 13, reduces index size for duplicate values), improved BRIN parallel build (PG 14), and faster index-only scans through visibility map improvements. Future releases are expected to add better support for multi-dimensional indexes and improved statistics for complex data distributions.
The trend toward storing more diverse data types (JSONB, vectors for AI embeddings, geospatial) in PostgreSQL makes specialized index types increasingly important. The pgvector extension, for example, uses IVFFlat and HNSW indexes for vector similarity search, bringing PostgreSQL into the AI/ML infrastructure stack.
Conclusion
Effective indexing is the foundation of PostgreSQL performance tuning. The right index can improve query performance by orders of magnitude, while the wrong index (or no index) can bring a production system to its knees. The key takeaways are:
- Default to B-tree indexes for most workloads—they handle equality, range, and sorting
- Use composite indexes with the most selective column first
- Use covering indexes (
INCLUDE) for index-only scans on hot queries - Use partial indexes when you frequently query a subset of rows
- Use expression indexes when queries use functions on columns
- Use GIN indexes for full-text search, JSONB containment, and array operations
- Use BRIN indexes for naturally ordered data on very large tables
- Always create indexes concurrently in production
- Monitor and remove unused indexes quarterly
- Use
EXPLAIN (ANALYZE, BUFFERS)to verify index usage and diagnose performance
Start by enabling pg_stat_statements, identifying your slowest queries, and creating targeted indexes. Use EXPLAIN ANALYZE to verify each index is being used. The investment in proper indexing pays dividends across every aspect of your application's performance.