Introduction
If you're building AI-powered applications and already use PostgreSQL, pgvector is the most pragmatic way to add vector similarity search to your stack. Rather than introducing a separate vector database like Pinecone or Weaviate, pgvector lets you store and query vector embeddings directly in your existing Postgres database. This means you can combine vector similarity search with your relational data using familiar SQL syntax, ACID transactions, and existing tooling.
pgvector has gained massive adoption since its release, with major cloud providers including AWS RDS, Google Cloud SQL, Azure Database for PostgreSQL, Supabase, and Neon all offering native support. Companies like Notion, HubSpot, and various Y Combinator startups use pgvector in production for semantic search, recommendation systems, and retrieval-augmented generation (RAG) applications.
In this comprehensive guide, we'll cover everything from installation and basic usage to advanced indexing strategies, performance optimization, and production deployment patterns. By the end, you'll be able to implement a production-ready vector search system on top of your existing PostgreSQL database.
Understanding pgvector: Core Concepts
pgvector extends PostgreSQL with a new vector data type and several distance operators that enable efficient similarity search. Understanding how it works under the hood helps you make informed decisions about indexing and query optimization.
The Vector Data Type
pgvector introduces the vector data type that stores arrays of floating-point numbers. Vectors can have up to 2,000 dimensions by default (configurable up to 16,000). Each dimension represents a feature learned by an embedding model—the more dimensions, the more expressive the representation, but the more storage and compute required.
Distance Metrics
pgvector supports three distance metrics:
- L2 (Euclidean) distance (
<->): The straight-line distance between two points. Best for normalized embeddings where magnitude doesn't matter. - Cosine distance (
<=>): Measures the angle between vectors, ignoring magnitude. The most popular choice for text embeddings. - Inner product (
<#>): The negative dot product. Use when you want to maximize similarity and vectors are normalized.
The choice of distance metric depends on your embedding model. Most text embedding models (OpenAI, Cohere, Sentence Transformers) produce normalized vectors, so cosine distance is typically the best choice.
Approximate vs. Exact Search
pgvector supports both exact and approximate nearest neighbor (ANN) search. Exact search scans every row—perfect for small datasets but prohibitively slow for millions of vectors. ANN search uses indexes to dramatically speed up queries at the cost of slightly reduced accuracy (recall). The key insight is that you can tune the tradeoff between speed and accuracy.
Architecture and Design Patterns
Index Types
pgvector supports two index types, each with different tradeoffs:
IVFFlat (Inverted File with Flat quantization):
- Partitions vectors into lists using k-means clustering
- Searches only a subset of lists (specified by
probesparameter) - Best for datasets under 1 million vectors
- Faster to build than HNSW
- Lower recall at the same speed compared to HNSW
HNSW (Hierarchical Navigable Small World):
- Creates a multi-layer graph structure for fast traversal
- Superior recall and query speed compared to IVFFlat
- Slower to build and uses more memory
- Best for datasets over 100K vectors
- The recommended default for most applications
Storage Architecture
Each vector with 1536 dimensions (OpenAI's standard) requires approximately 6KB of storage. For 1 million vectors, that's roughly 6GB just for the vector data, plus index overhead. HNSW indexes add approximately 1.5x the vector storage size, while IVFFlat adds minimal overhead.
Multi-Tenant Patterns
For SaaS applications serving multiple customers, pgvector supports several multi-tenancy patterns:
- Row-level filtering: Add a
tenant_idcolumn and filter queries - Schema-per-tenant: Separate schemas for each tenant
- Table-per-tenant: Separate tables (best performance, most complex)
The row-level approach is simplest and works well for most applications. The key is to include the tenant filter in your HNSW queries for efficient execution.
Step-by-Step Implementation
Installation and Setup
-- Enable the extension (requires superuser or CREATE privilege)
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify installation
SELECT * FROM pg_extension WHERE extname = 'vector';
-- For cloud providers:
-- AWS RDS: Already available, just run CREATE EXTENSION
-- Supabase: Enable in Dashboard > Database > Extensions
-- Neon: CREATE EXTENSION vector;
-- Azure: Available in Flexible ServerCreating Tables with Vectors
-- Documents table with embeddings
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
tenant_id UUID NOT NULL,
title TEXT NOT NULL,
content TEXT NOT NULL,
metadata JSONB DEFAULT '{}',
embedding vector(1536), -- OpenAI text-embedding-3-small dimension
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Create indexes for common query patterns
CREATE INDEX idx_documents_tenant ON documents(tenant_id);
CREATE INDEX idx_documents_metadata ON documents USING gin(metadata);
-- HNSW index for vector similarity search
-- m: max connections per layer (16-64, higher = better recall, more memory)
-- ef_construction: build-time quality (higher = better recall, slower build)
CREATE INDEX idx_documents_embedding ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- For smaller datasets, IVFFlat is faster to build
-- lists: number of clusters (sqrt of rows is a good starting point)
CREATE INDEX idx_documents_embedding_ivf ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);Generating and Storing Embeddings
// Using OpenAI to generate embeddings
import OpenAI from 'openai';
import pg from 'pg';
const openai = new OpenAI();
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
async function storeDocument(title, content, tenantId, metadata = {}) {
// Generate embedding
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: `${title}\n\n${content}`,
});
const embedding = response.data[0].embedding;
// Store in PostgreSQL
const result = await pool.query(
`INSERT INTO documents (tenant_id, title, content, metadata, embedding)
VALUES ($1, $2, $3, $4, $5::vector)
RETURNING id`,
[tenantId, title, content, JSON.stringify(metadata), `[${embedding.join(',')}]`]
);
return result.rows[0].id;
}
// Batch insert for better performance
async function storeDocuments(documents, tenantId) {
const values = [];
const params = [];
let paramIndex = 1;
for (const doc of documents) {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: `${doc.title}\n\n${doc.content}`,
});
const embedding = response.data[0].embedding;
values.push(
`($${paramIndex}, $${paramIndex + 1}, $${paramIndex + 2}, $${paramIndex + 3}, $${paramIndex + 4}::vector)`
);
params.push(tenantId, doc.title, doc.content, JSON.stringify(doc.metadata || {}));
params.push(`[${embedding.join(',')}]`);
paramIndex += 5;
}
await pool.query(
`INSERT INTO documents (tenant_id, title, content, metadata, embedding)
VALUES ${values.join(', ')}`,
params
);
}Querying with Similarity Search
-- Basic similarity search (cosine distance)
SELECT id, title, content,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
-- Multi-tenant similarity search
SELECT id, title, content,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE tenant_id = $2
ORDER BY embedding <=> $1::vector
LIMIT 10;
-- Similarity search with metadata filter
SELECT id, title, content, metadata,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE tenant_id = $2
AND metadata->>'category' = $3
ORDER BY embedding <=> $1::vector
LIMIT 10;
-- Similarity threshold (only return results above a minimum score)
SELECT id, title, content,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE tenant_id = $2
AND 1 - (embedding <=> $1::vector) > 0.8
ORDER BY embedding <=> $1::vector
LIMIT 10;Application-Level Query Service
class VectorSearchService {
constructor(pool, openai) {
this.pool = pool;
this.openai = openai;
}
async search(query, options = {}) {
const {
tenantId,
limit = 10,
threshold = 0.7,
category = null,
} = options;
// Generate query embedding
const response = await this.openai.embeddings.create({
model: 'text-embedding-3-small',
input: query,
});
const queryEmbedding = response.data[0].embedding;
// Build dynamic query
let sql = `
SELECT id, title, content, metadata,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE tenant_id = $2
AND 1 - (embedding <=> $1::vector) > $3
`;
const params = [`[${queryEmbedding.join(',')}]`, tenantId, threshold];
let paramIndex = 4;
if (category) {
sql += ` AND metadata->>'category' = $${paramIndex}`;
params.push(category);
paramIndex++;
}
sql += ` ORDER BY embedding <=> $1::vector LIMIT $${paramIndex}`;
params.push(limit);
const result = await this.pool.query(sql, params);
return result.rows;
}
async ragQuery(question, tenantId, contextLimit = 5) {
// Retrieve relevant documents
const docs = await this.search(question, {
tenantId,
limit: contextLimit,
threshold: 0.7,
});
// Build context from retrieved documents
const context = docs
.map((doc, i) => `[${i + 1}] ${doc.title}: ${doc.content}`)
.join('\n\n');
// Generate answer with context
const completion = await this.openai.chat.completions.create({
model: 'gpt-4',
messages: [
{
role: 'system',
content: `Answer the question based on the following context:\n\n${context}`,
},
{ role: 'user', content: question },
],
});
return {
answer: completion.choices[0].message.content,
sources: docs.map((d) => ({ id: d.id, title: d.title, similarity: d.similarity })),
};
}
}Real-World Use Cases and Case Studies
Use Case 1: Internal Knowledge Base Search
A 500-person company uses pgvector to power their internal wiki search. Documents from Confluence, Google Docs, and Notion are chunked, embedded, and stored in PostgreSQL alongside their original metadata. Employees search using natural language and get relevant results ranked by semantic similarity rather than keyword matching. pgvector's ability to combine vector search with SQL filters (department, date range, document type) makes it ideal for this use case.
Use Case 2: Customer Support RAG
A SaaS company implements a customer support chatbot using pgvector and GPT-4. Support articles, product documentation, and previous ticket resolutions are embedded and stored. When a customer asks a question, the system retrieves the most relevant documentation chunks and uses them as context for GPT-4 to generate accurate, grounded answers. The response includes source links so customers can verify the information.
Use Case 3: E-Commerce Product Recommendations
An online marketplace stores product embeddings generated from descriptions, reviews, and images. When a user views a product, the system queries pgvector for similar products, filtered by category and price range. The combination of vector similarity and relational filtering produces highly relevant recommendations without a separate recommendation engine.
Use Case 4: Content Deduplication
A news aggregation platform uses pgvector to detect duplicate or near-duplicate articles. New articles are embedded and compared against existing articles using cosine similarity. Articles above a 0.95 similarity threshold are flagged as potential duplicates for editorial review. This approach catches duplicates even when the wording differs significantly.
Best Practices for Production
-
Choose HNSW over IVFFlat for most workloads: HNSW provides better recall and query speed. Use IVFFlat only if you need faster index builds on datasets under 1 million vectors.
-
Tune HNSW parameters for your workload: Set
m(connections per layer) between 16-32 andef_constructionbetween 64-200. Higher values improve recall but increase build time and memory. -
Adjust
ef_searchat query time: IncreaseSET hnsw.ef_search = 100for higher recall during queries. The default of 40 may miss relevant results for large datasets. -
Batch your embedding generation: When inserting multiple documents, batch API calls to your embedding provider. This reduces latency and API costs.
-
Use partial indexes for multi-tenancy: Create indexes on subsets of data to reduce index size and improve query speed.
-
Monitor index size and build time: HNSW indexes can be 1.5x the size of your vector data. Plan storage accordingly.
-
Vacuum and analyze regularly: Like all PostgreSQL tables, vector tables benefit from regular maintenance.
-
Use connection pooling: Vector queries are fast but connection setup is slow. Use PgBouncer or application-level pooling.
Common Pitfalls and Solutions
| Pitfall | Impact | Solution |
|---|---|---|
| Not creating an index | Full table scan on every query, O(n) complexity | Always create HNSW or IVFFlat index for datasets > 10K rows |
| Wrong distance metric | Incorrect similarity rankings | Match metric to embedding model (cosine for text, L2 for images) |
| Storing embeddings inline with large text | Bloated table, slow sequential scans | Consider separate table for embeddings or TOAST compression |
| Not setting ef_search | Low recall with HNSW index | SET hnsw.ef_search = 100 (or higher) before queries |
| Ignoring index build time | HNSW on 1M vectors takes hours | Build indexes during low-traffic periods, use CONCURRENTLY |
| Embedding dimension mismatch | Insert/query errors | Validate vector dimensions match index definition |
Performance Optimization
// Connection pooling configuration
import pg from 'pg';
const pool = new pg.Pool({
connectionString: process.env.DATABASE_URL,
max: 20,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000,
});
// Optimize PostgreSQL settings for vector workloads
// postgresql.conf:
// shared_buffers = 4GB
// effective_cache_size = 12GB
// maintenance_work_mem = 1GB
// work_mem = 256MB
// Query-time tuning
async function optimizedSearch(queryEmbedding, tenantId) {
// Set higher ef_search for better recall
await pool.query("SET hnsw.ef_search = 100");
const result = await pool.query(
`SELECT id, title, 1 - (embedding <=> $1::vector) AS score
FROM documents
WHERE tenant_id = $2
ORDER BY embedding <=> $1::vector
LIMIT 10`,
[`[${queryEmbedding.join(',')}]`, tenantId]
);
return result.rows;
}
// Parallel queries for large result sets
async function parallelSearch(queryEmbedding, categories) {
const promises = categories.map(category =>
pool.query(
`SELECT id, title, 1 - (embedding <=> $1::vector) AS score
FROM documents
WHERE metadata->>'category' = $2
ORDER BY embedding <=> $1::vector
LIMIT 5`,
[`[${queryEmbedding.join(',')}]`, category]
)
);
const results = await Promise.all(promises);
return results.flatMap(r => r.rows)
.sort((a, b) => b.score - a.score)
.slice(0, 10);
}Comparison with Alternatives
| Feature | pgvector | Pinecone | Weaviate | Qdrant | Milvus |
|---|---|---|---|---|---|
| Deployment | PostgreSQL extension | Managed | Self-hosted/managed | Self-hosted/managed | Self-hosted/managed |
| SQL Support | Full PostgreSQL SQL | API only | GraphQL | REST/gRPC | REST/gRPC |
| ACID Compliance | Yes | No | No | No | No |
| Relational Data | Native joins | Metadata only | Limited | Limited | Limited |
| Max Vectors | Billions (with tuning) | Billions | Billions | Billions | Billions |
| Hybrid Search | SQL + pg_trgm | No | Yes | Yes | Yes |
| Operational Overhead | None (existing PG) | None (managed) | Medium | Medium | High |
| Cost | Existing PG cost | Per-query or pod | Free or cloud | Free or cloud | Free or cloud |
| Best For | PG users, < 10M vectors | Simple managed | AI-native features | Performance-critical | Large-scale ML |
Advanced Patterns
Chunking Strategy for RAG
class DocumentChunker {
constructor(options = {}) {
this.chunkSize = options.chunkSize || 500;
this.chunkOverlap = options.chunkOverlap || 50;
}
chunk(text, metadata = {}) {
const sentences = text.split(/(?<=[.!?])\s+/);
const chunks = [];
let currentChunk = '';
let currentSize = 0;
for (const sentence of sentences) {
if (currentSize + sentence.length > this.chunkSize && currentChunk) {
chunks.push({
content: currentChunk.trim(),
metadata: { ...metadata, chunkIndex: chunks.length },
});
// Keep overlap
const words = currentChunk.split(' ');
const overlapWords = words.slice(-Math.floor(this.chunkOverlap / 5));
currentChunk = overlapWords.join(' ') + ' ' + sentence;
currentSize = currentChunk.length;
} else {
currentChunk += (currentChunk ? ' ' : '') + sentence;
currentSize += sentence.length;
}
}
if (currentChunk.trim()) {
chunks.push({
content: currentChunk.trim(),
metadata: { ...metadata, chunkIndex: chunks.length },
});
}
return chunks;
}
}Incremental Index Updates
-- Efficiently update embeddings for modified documents
WITH updated AS (
SELECT id, $1::vector AS new_embedding
FROM documents
WHERE id = ANY($2)
)
UPDATE documents d
SET embedding = u.new_embedding,
updated_at = NOW()
FROM updated u
WHERE d.id = u.id;Testing Strategies
describe('VectorSearchService', () => {
let mockPool;
let mockOpenAI;
let service;
beforeEach(() => {
mockPool = {
query: jest.fn(),
};
mockOpenAI = {
embeddings: {
create: jest.fn().mockResolvedValue({
data: [{ embedding: new Array(1536).fill(0.1) }],
}),
},
};
service = new VectorSearchService(mockPool, mockOpenAI);
});
test('should search with correct embedding', async () => {
mockPool.query.mockResolvedValue({
rows: [{ id: 1, title: 'Test', similarity: 0.95 }],
});
const results = await service.search('test query', { tenantId: 'tenant-1' });
expect(mockOpenAI.embeddings.create).toHaveBeenCalledWith({
model: 'text-embedding-3-small',
input: 'test query',
});
expect(results).toHaveLength(1);
expect(results[0].similarity).toBe(0.95);
});
test('should apply category filter', async () => {
mockPool.query.mockResolvedValue({ rows: [] });
await service.search('test', { tenantId: 't1', category: 'docs' });
expect(mockPool.query).toHaveBeenCalledWith(
expect.stringContaining("metadata->>'category'"),
expect.arrayContaining(['docs'])
);
});
});Future Outlook
pgvector continues to evolve with the PostgreSQL ecosystem. Upcoming features include:
- Improved HNSW performance: Ongoing optimizations to reduce memory usage and improve build times
- Sparse vector support: For hybrid dense/sparse search combining BM25 and semantic search
- Streaming indexing: Build indexes incrementally as data arrives
- Better parallel query execution: Leverage PostgreSQL's parallel query infrastructure
The trend toward convergence—traditional databases adding vector capabilities while vector databases add relational features—suggests that pgvector's approach of building on PostgreSQL's foundation is well-positioned for the future.
Conclusion
pgvector brings powerful vector similarity search to PostgreSQL without requiring a separate database. Its native SQL integration, ACID compliance, and zero additional infrastructure make it the ideal choice for teams already using PostgreSQL.
Key takeaways:
- Use HNSW indexes for most workloads—superior recall and speed
- Tune
ef_searchat query time to balance recall and performance - Combine vector search with SQL filters for precise, relevant results
- Plan for storage—each 1536-dim vector needs ~6KB plus index overhead
- pgvector shines for RAG applications where you need vector + relational data
For teams that want vector search without operational complexity, pgvector is the clear winner. Start with a simple implementation, measure performance, and optimize as your dataset grows.