MinhVo

Minh Vo

rss feed

Slaying code & making it lit fr fr 🔥 tagline

Hey there 👋 I'm an AI Engineer with 7 years of experience building scalable web and mobile applications. Currently at Neurond AI (May 2025 — present), architecting an Enterprise AI Assistant Platform with multi-tenant RAG on pgvector, multi-provider LLM orchestration, and Azure-native infrastructure. Previously spent 5+ years at SNAPTEC (Sep 2019 — Apr 2025), leading SaaS themes, admin dashboards, and e-commerce platforms — earned the Hero of the Year award in 2021. I specialize in TypeScript, React, Next.js, and AI-Native engineering with Claude Code and Cursor.bio

Back to blogs

Database Indexing Strategies: B-Tree, Hash, and GIN

Deep dive into database indexes: types, use cases, composite indexes, and EXPLAIN ANALYZE.

DatabaseIndexingPerformancePostgreSQL

By MinhVo

Introduction

The difference between a query that takes 50 milliseconds and one that takes 50 seconds often comes down to a single decision: which index type you chose. Database indexes are not one-size-fits-all. B-tree indexes excel at range queries and sorting, Hash indexes provide the fastest equality lookups, GIN indexes power full-text search and JSONB containment queries, and GiST indexes handle geometric and range type operations. Choosing the wrong index type for your workload means leaving performance on the table—or worse, creating indexes that the query optimizer never uses.

This guide provides a deep technical exploration of the three most impactful index types in PostgreSQL: B-tree, Hash, and GIN. You'll learn how each type works internally, when to use each one, how to design effective composite indexes, and how to use EXPLAIN ANALYZE to validate that your indexes are actually being used. By the end, you'll have a mental framework for choosing the right index type for any query pattern you encounter.

Database index data structures

Understanding Index Types: Core Concepts

B-Tree Indexes: The Workhorse

B-tree (balanced tree) is PostgreSQL's default index type and handles the vast majority of indexing needs. A B-tree index maintains a sorted, balanced tree structure where leaf pages contain index entries sorted by key value, and internal pages contain pivot keys that guide the search traversal.

Internally, each B-tree page (typically 8 KB in PostgreSQL) holds multiple index entries. An internal page might hold hundreds of pivot keys, meaning a three-level B-tree can index tens of millions of rows. The depth grows logarithmically: a four-level B-tree handles billions of entries.

B-tree supports all comparison operators (<, <=, =, >=, >) and can satisfy ORDER BY clauses without an additional sort step. It also supports pattern matching with LIKE 'prefix%' (left-anchored patterns) because the sorted order preserves prefix relationships.

-- B-tree handles range queries efficiently
SELECT * FROM orders WHERE created_at BETWEEN '2024-01-01' AND '2024-06-30';
 
-- B-tree handles ORDER BY without extra sort
SELECT * FROM users ORDER BY created_at DESC LIMIT 10;
 
-- B-tree handles left-anchored LIKE
SELECT * FROM products WHERE name LIKE 'Widget%';

Hash Indexes: Pure Equality Optimization

Hash indexes use a hash table structure optimized exclusively for equality comparisons (=). They compute a hash of the indexed value and store entries in hash buckets. Lookups compute the hash, jump directly to the bucket, and scan the bucket for the exact match.

Hash indexes have two key advantages over B-tree for equality: they're smaller (no ordering overhead), and lookups have O(1) average complexity instead of O(log n). For high-cardinality columns used exclusively in equality predicates—like UUIDs, email addresses, or API keys—hash indexes can be 20–30% faster than B-tree.

However, hash indexes have severe limitations. They don't support range queries, sorting, LIKE patterns, IS NULL, or multi-column indexing. Before PostgreSQL 10, hash indexes weren't WAL-logged, making them unsuitable for production. Since PostgreSQL 10, they're fully crash-safe and replicated, but they still can't be used as covering indexes or with the INCLUDE clause.

-- Hash index: perfect for equality-only lookups
CREATE INDEX idx_users_email_hash ON users USING hash (email);
 
-- This uses the hash index
SELECT * FROM users WHERE email = 'user@example.com';
 
-- This CANNOT use the hash index (range query)
SELECT * FROM users WHERE email >= 'a' AND email <= 'b';

GIN Indexes: The Inverted Index

GIN (Generalized Inverted Index) is designed for composite values where you need to search for elements within the value. It inverts the mapping: instead of "row → values," GIN stores "value → set of rows." This makes it ideal for full-text search, array containment, JSONB key/value queries, and trigram text matching.

GIN indexes have two layers. The first layer maps each unique element to a posting list (set of row IDs containing that element). The second layer uses a B-tree-like structure over the elements for fast lookup. When you query "find rows where the tags array contains 'postgresql'," GIN looks up 'postgresql' in the element tree, retrieves the posting list, and returns the matching rows directly.

The trade-off is write performance. GIN indexes must update posting lists for every insert, update, or delete affecting indexed elements. To mitigate this, GIN supports "fast update" mode (enabled by default), which buffers pending updates in a pending list and merges them into the main structure during subsequent searches or when the pending list reaches a threshold.

-- GIN for JSONB containment queries
CREATE INDEX idx_data_tags ON json_data USING gin (tags);
 
-- Uses the GIN index for "contains" operator
SELECT * FROM json_data WHERE tags @> '{"category": "electronics"}';
 
-- GIN for full-text search
CREATE INDEX idx_articles_fts ON articles USING gin (to_tsvector('english', content));
 
-- Uses the GIN index for text search
SELECT * FROM articles WHERE to_tsvector('english', content) @@ to_tsquery('english', 'postgresql & indexing');

Index performance comparison chart

Architecture and Design Patterns

Composite Index Design

Composite indexes include multiple columns and follow strict rules about which queries can use them. Understanding these rules is critical for designing effective indexes.

The leftmost prefix rule states that a composite index on (a, b, c) can be used by queries that filter on a alone, (a, b), or (a, b, c), but not b alone or (b, c). The index sorts data first by a, then by b within each a value, then by c within each (a, b) pair. Queries must use the leading columns to benefit from the sorted order.

-- Index: (tenant_id, created_at DESC)
CREATE INDEX idx_events_tenant_time ON events (tenant_id, created_at DESC);
 
-- ✅ Uses index: filters on leftmost column
SELECT * FROM events WHERE tenant_id = 42 ORDER BY created_at DESC;
 
-- ❌ Cannot use index: skips leftmost column
SELECT * FROM events WHERE created_at > '2024-01-01';

The ESR Rule for Optimal Column Ordering

The ESR (Equality, Sort, Range) rule provides a systematic approach to column ordering:

Equality columns come first because they allow PostgreSQL to seek directly to matching rows. Sort columns come next to enable index-ordered results without a separate sort step. Range columns come last because they filter within the already-positioned and sorted subset.

-- Query pattern
SELECT * FROM orders
WHERE status = 'completed'          -- Equality
  AND region = 'us-east-1'          -- Equality
  AND created_at > '2024-01-01'     -- Range
ORDER BY created_at DESC;           -- Sort
 
-- Optimal index following ESR
CREATE INDEX idx_orders_optimized
  ON orders (status, region, created_at DESC);
-- Equality ↑    ↑ Sort  ↑ Range

Partial Indexes for Selective Filtering

Partial indexes include a WHERE clause at creation time, indexing only rows that match the predicate. This creates dramatically smaller indexes when your queries consistently filter for a subset of data.

-- Only 2% of orders are "pending" — most queries filter for these
CREATE INDEX idx_orders_pending
  ON orders (customer_id, created_at)
  WHERE status = 'pending';
 
-- This query uses the partial index
SELECT * FROM orders WHERE status = 'pending' AND customer_id = 12345;

The optimizer can only use a partial index when the query's WHERE clause logically implies the index's WHERE clause. If the query filters status = 'pending', the partial index applies. If the query filters status IN ('pending', 'processing'), the optimizer cannot use this partial index.

GIN Operator Classes

GIN indexes support different operator classes that determine which operations are accelerated:

-- JSONB: containment and existence operators
CREATE INDEX idx_jsonb_gin ON table USING gin (jsonb_column);
CREATE INDEX idx_jsonb_path ON table USING gin (jsonb_column jsonb_path_ops);
 
-- Arrays: overlap and containment
CREATE INDEX idx_array_gin ON table USING gin (array_column);
 
-- Trigram: LIKE and ILIKE patterns
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_text_trgm ON table USING gin (text_column gin_trgm_ops);
 
-- Full-text search
CREATE INDEX idx_fts ON table USING gin (to_tsvector('english', content));

The jsonb_path_ops operator class is smaller and faster than the default jsonb_ops but only supports @> (containment) and @?/@@ (path existence/query) operators. If you don't need the broader operator set, jsonb_path_ops is the better choice.

Step-by-Step Implementation

EXPLAIN ANALYZE Fundamentals

EXPLAIN ANALYZE is your primary tool for understanding query execution. It shows the actual execution plan, timing, and row counts:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, u.name
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'completed'
  AND o.created_at > '2024-01-01'
ORDER BY o.created_at DESC
LIMIT 20;

Key things to look for in the output:

Limit  (cost=0.71..28.45 rows=20 width=48) (actual time=0.025..0.089 rows=20 loops=1)
  ->  Index Scan Backward using idx_orders_status_created on orders o  (cost=0.71..138.92 rows=100 width=48) (actual time=0.024..0.085 rows=20 loops=1)
        Index Cond: ((status = 'completed') AND (created_at > '2024-01-01'::date))
        Buffers: shared hit=23
Planning Time: 0.125 ms
Execution Time: 0.105 ms

Critical metrics: actual time shows real execution time per row, rows should be close to the estimate, and Buffers: shared hit indicates pages read from cache. High shared read counts indicate cache misses and potential I/O bottlenecks.

Creating Effective B-Tree Indexes

-- Step 1: Identify the query pattern
EXPLAIN ANALYZE
SELECT * FROM events
WHERE tenant_id = 42
  AND event_type = 'click'
  AND created_at > NOW() - INTERVAL '7 days'
ORDER BY created_at DESC
LIMIT 50;
 
-- Step 2: Design the index following ESR
CREATE INDEX CONCURRENTLY idx_events_tenant_type_time
  ON events (tenant_id, event_type, created_at DESC);
 
-- Step 3: Verify the index is used
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events
WHERE tenant_id = 42
  AND event_type = 'click'
  AND created_at > NOW() - INTERVAL '7 days'
ORDER BY created_at DESC
LIMIT 50;
-- Should show "Index Scan using idx_events_tenant_type_time"
-- with no additional Sort step

Creating and Validating Hash Indexes

-- Create hash index for UUID equality lookups
CREATE INDEX CONCURRENTLY idx_sessions_token_hash
  ON sessions USING hash (session_token);
 
-- Verify it's used
EXPLAIN ANALYZE
SELECT * FROM sessions WHERE session_token = 'abc-123-def';
-- Should show "Index Scan using idx_sessions_token_hash"
 
-- Compare with B-tree for the same query
CREATE INDEX CONCURRENTLY idx_sessions_token_btree
  ON sessions (session_token);
 
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM sessions WHERE session_token = 'abc-123-def';
-- Compare execution time and buffer reads with hash index version

Implementing GIN Indexes for JSONB

-- JSONB containment queries
CREATE INDEX CONCURRENTLY idx_products_attrs_gin
  ON products USING gin (attributes);
 
-- Query: find products with specific attributes
EXPLAIN ANALYZE
SELECT * FROM products
WHERE attributes @> '{"color": "red", "size": "large"}';
-- Should show "Bitmap Index Scan using idx_products_attrs_gin"
 
-- For path-specific queries, use jsonb_path_ops
CREATE INDEX CONCURRENTLY idx_products_attrs_path
  ON products USING gin (attributes jsonb_path_ops);
 
-- Path queries are faster with jsonb_path_ops
EXPLAIN ANALYZE
SELECT * FROM products
WHERE attributes @> '{"color": "red"}';
-- Create full-text search index
CREATE INDEX CONCURRENTLY idx_articles_fts
  ON articles USING gin (to_tsvector('english', title || ' ' || content));
 
-- Search query
EXPLAIN ANALYZE
SELECT id, title, ts_rank(to_tsvector('english', title || ' ' || content), query) AS rank
FROM articles,
     to_tsquery('english', 'postgresql & indexing') query
WHERE to_tsvector('english', title || ' ' || content) @@ query
ORDER BY rank DESC
LIMIT 10;

Real-World Use Cases

Use Case 1: API Key Lookup with Hash Index

An API gateway authenticating 50,000 requests per second needed to look up API keys stored as UUIDs. The B-tree index on the api_key column was 2.3 GB for 50 million keys. Switching to a hash index reduced the index to 1.4 GB and improved average lookup time from 0.15ms to 0.11ms—a 27% improvement at this scale. The savings came from the hash index's simpler structure, which required fewer page reads per lookup.

-- Migration from B-tree to hash
CREATE INDEX CONCURRENTLY idx_api_keys_hash ON api_keys USING hash (key);
-- After verifying performance
DROP INDEX CONCURRENTLY idx_api_keys_btree;

Use Case 2: Product Search with GIN Trigram Index

An e-commerce platform needed fuzzy product name matching. Users searched for "iphone" and expected to find "Apple iPhone 15 Pro Max." Standard B-tree indexes with LIKE '%iphone%' required sequential scans because the pattern isn't left-anchored.

CREATE EXTENSION pg_trgm;
CREATE INDEX idx_products_name_trgm ON products USING gin (name gin_trgm_ops);
 
-- Now supports fuzzy matching efficiently
SELECT name, similarity(name, 'iphone') AS score
FROM products
WHERE name % 'iphone'
ORDER BY score DESC
LIMIT 10;

This GIN trigram index enabled sub-50ms fuzzy searches across 2 million products, replacing a dedicated Elasticsearch cluster that cost $500/month in infrastructure.

Use Case 3: Event Filtering with Composite B-Tree Index

A real-time analytics dashboard queried events filtered by tenant, event type, and time range. The original indexes were individual B-tree indexes on each column. PostgreSQL could only use one index per scan (or bitmap AND/OR multiple indexes), which was slow for this three-column filter pattern.

A single composite index following the ESR rule replaced three individual indexes:

-- Before: three separate indexes (slow bitmap AND)
CREATE INDEX idx_events_tenant ON events (tenant_id);
CREATE INDEX idx_events_type ON events (event_type);
CREATE INDEX idx_events_time ON events (created_at);
 
-- After: single composite index (fast index scan)
CREATE INDEX idx_events_composite
  ON events (tenant_id, event_type, created_at DESC);

The composite index reduced query time from 120ms to 2ms and reduced total index storage by 40%.

Best Practices for Production

  1. Default to B-tree unless you have a specific reason not to: B-tree handles the broadest set of query patterns. Only deviate when you can prove an alternative type is better for your workload.

  2. Use Hash indexes for pure equality lookups on high-cardinality columns: UUIDs, session tokens, and API keys benefit from hash's O(1) average lookup. Only use hash if you're on PostgreSQL 10+ and never need range queries on the column.

  3. Use GIN for JSONB, arrays, and full-text search: GIN's inverted structure is purpose-built for containment and element-existence queries. Choose jsonb_path_ops over the default operator class for JSONB unless you need the full operator set.

  4. Follow the ESR rule for composite indexes: Place equality columns first, sort columns second, range columns last. This maximizes the index's ability to filter, sort, and limit efficiently.

  5. Always use EXPLAIN (ANALYZE, BUFFERS) to validate: Creating an index doesn't guarantee it will be used. Run your actual queries through EXPLAIN ANALYZE to confirm the optimizer selects your index and that performance improves as expected.

  6. Create indexes concurrently: CREATE INDEX CONCURRENTLY allows reads and writes to continue during index creation. This is essential for production tables with continuous traffic.

  7. Consider partial indexes for filtered queries: If your queries consistently filter for a small subset of rows, a partial index with a WHERE clause can be 10–100x smaller and faster than a full index.

  8. Monitor index size growth: Large indexes consume cache space and increase maintenance overhead. Track index sizes with pg_relation_size() and schedule REINDEX CONCURRENTLY for bloated indexes.

Common Pitfalls and Solutions

PitfallImpactSolution
Using hash index for range queriesQuery falls back to sequential scanUse B-tree for any column that needs range comparisons
GIN index write performanceInserts/updates become slow due to posting list updatesUse gin_pending_list_limit to control batch merge frequency
Wrong composite index column orderIndex cannot be used for common query patternsApply ESR rule: Equality, Sort, Range
Not using CONCURRENTLYTable locked during index creation, blocking production trafficAlways use CREATE INDEX CONCURRENTLY on large tables
Over-indexing with GIN on write-heavy tablesSignificant write amplification and storage growthConsider BRIN for naturally ordered data, limit GIN to read-heavy columns
Assuming jsonb_path_ops covers all JSONB operatorsQueries using ?, `?, ?&` operators fail with wrong operator class

Performance Optimization

Index Size vs Performance Trade-off

Smaller indexes fit in cache more readily. Here's a comparison of index sizes for the same 10 million row table:

-- Measure index sizes
SELECT
  indexname,
  pg_size_pretty(pg_relation_size(indexname::regclass)) AS index_size
FROM pg_indexes
WHERE tablename = 'products';

Typical results for a 10M row products table:

  • B-tree on name: ~350 MB
  • Hash on name: ~220 MB
  • GIN on attributes (JSONB): ~800 MB
  • GIN trgm on name: ~1.2 GB

Index Scan vs Bitmap Scan vs Sequential Scan

Understanding when PostgreSQL chooses each scan type helps you design better indexes:

-- Force different scan types for comparison
SET enable_indexscan = off;
SET enable_bitmapscan = off;
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'completed';
-- Shows sequential scan
 
SET enable_bitmapscan = on;
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'completed';
-- Shows bitmap scan (for wider result sets)
 
SET enable_indexscan = on;
EXPLAIN ANALYZE SELECT * FROM orders WHERE status = 'completed';
-- Shows index scan (for narrow result sets)

GIN Pending List Management

-- Check GIN pending list size
SELECT * FROM gin_metapage_info(get_raw_page('idx_products_attrs', 0));
 
-- Force pending list merge
SET gin_pending_list_limit = 0;
SELECT * FROM products WHERE attributes @> '{"test": true}';  -- Triggers merge
RESET gin_pending_list_limit;
 
-- Disable fast update entirely (better write performance, worse search)
ALTER INDEX idx_products_attrs SET (fastupdate = off);

Comparison with Alternatives

FeatureB-TreeHashGINGiSTBRIN
Equality (=)✅ Fast✅ Fastest
Range (<, >, BETWEEN)
Sorting (ORDER BY)
Pattern matching (LIKE 'prefix%')✅ (trgm)
Full-text search
JSONB containment (@>)
Array operations (&&, @>)
Index sizeMediumSmallLargeMediumTiny
Write overheadLowLowHighMediumVery low

Advanced Patterns

Partial GIN Index

-- GIN index only on active products' attributes
CREATE INDEX idx_active_products_attrs
  ON products USING gin (attributes)
  WHERE status = 'active';

Multi-Column GIN with B-Tree Prefix

-- Combine B-tree prefix with GIN for tenant-isolated JSONB queries
-- This requires a separate approach since GIN doesn't support mixed types
-- Solution: use a composite B-tree index on tenant_id with GIN on attributes
 
-- Query pattern
SELECT * FROM documents
WHERE tenant_id = 42
  AND metadata @> '{"type": "invoice"}';
 
-- Create B-tree on tenant_id, GIN on metadata separately
CREATE INDEX idx_docs_tenant ON documents (tenant_id);
CREATE INDEX idx_docs_metadata ON documents USING gin (metadata);
 
-- PostgreSQL will bitmap AND both indexes

Testing Strategies

-- Benchmark index types against each other
EXPLAIN (ANALYZE, BUFFERS, TIMING, SUMMARY)
SELECT * FROM sessions WHERE session_token = 'test-token';
 
-- Compare B-tree vs Hash performance
CREATE INDEX CONCURRENTLY idx_sessions_token_btree ON sessions (session_token);
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM sessions WHERE session_token = 'test-token';
DROP INDEX CONCURRENTLY idx_sessions_token_btree;
 
CREATE INDEX CONCURRENTLY idx_sessions_token_hash ON sessions USING hash (session_token);
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM sessions WHERE session_token = 'test-token';
 
-- Test index is used by the optimizer
SET enable_seqscan = off;
EXPLAIN SELECT * FROM orders WHERE status = 'completed';
RESET enable_seqscan;

Future Outlook

PostgreSQL's indexing capabilities continue to expand. Recent versions introduced B-tree deduplication (automatically compressing duplicate keys), improved GIN compression, and parallel index builds. Upcoming features include improved BRIN index capabilities, better statistics for JSONB paths, and potential native support for bloom filter indexes for multi-column equality queries.

Cloud providers are adding AI-driven index advisors—Amazon's pg_query_optimizer and Google's AlloyDB AI features analyze query workloads and recommend optimal indexes automatically. These tools lower the barrier to entry but can't replace deep understanding of index internals for edge cases and performance-critical applications.

Conclusion

Choosing the right index type is one of the most impactful database optimization decisions you'll make. B-tree remains the default and correct choice for most workloads, handling range queries, sorting, and equality with excellent all-around performance. Hash indexes provide measurable speedups for pure equality lookups on high-cardinality columns. GIN indexes unlock capabilities that would otherwise require external search infrastructure—full-text search, JSONB queries, and fuzzy text matching.

Key takeaways:

  1. B-tree first: Always start with B-tree. It handles the broadest set of query patterns and is the most battle-tested index type.
  2. Hash for pure equality: When a column is only ever used in = comparisons and you need maximum lookup speed, hash indexes deliver.
  3. GIN for complex data types: JSONB, arrays, and full-text search all benefit from GIN's inverted index structure.
  4. Use EXPLAIN ANALYZE religiously: Never assume an index is being used. Validate with actual execution plans.
  5. Follow ESR for composites: Column ordering in composite indexes determines whether the index can serve your query patterns.

Start by examining your slowest queries with EXPLAIN (ANALYZE, BUFFERS), identify the query patterns that would benefit from alternative index types, and benchmark the improvements with production-sized data. The right index can transform your application's database performance.