Vector Search in PostgreSQL with pgvector

Introduction

If you're building AI-powered applications and already use PostgreSQL, pgvector is the most pragmatic way to add vector similarity search to your stack. Rather than introducing a separate vector database like Pinecone or Weaviate, pgvector lets you store and query vector embeddings directly in your existing Postgres database. This means you can combine vector similarity search with your relational data using familiar SQL syntax, ACID transactions, and existing tooling.

pgvector has gained massive adoption since its release, with major cloud providers including AWS RDS, Google Cloud SQL, Azure Database for PostgreSQL, Supabase, and Neon all offering native support. Companies like Notion, HubSpot, and various Y Combinator startups use pgvector in production for semantic search, recommendation systems, and retrieval-augmented generation (RAG) applications.

In this comprehensive guide, we'll cover everything from installation and basic usage to advanced indexing strategies, performance optimization, and production deployment patterns. By the end, you'll be able to implement a production-ready vector search system on top of your existing PostgreSQL database.

Understanding pgvector: Core Concepts

pgvector extends PostgreSQL with a new vector data type and several distance operators that enable efficient similarity search. Understanding how it works under the hood helps you make informed decisions about indexing and query optimization.

The Vector Data Type

pgvector introduces the vector data type that stores arrays of floating-point numbers. Vectors can have up to 2,000 dimensions by default (configurable up to 16,000). Each dimension represents a feature learned by an embedding model—the more dimensions, the more expressive the representation, but the more storage and compute required.

Distance Metrics

pgvector supports three distance metrics:

L2 (Euclidean) distance (<->): The straight-line distance between two points. Best for normalized embeddings where magnitude doesn't matter.
Cosine distance (<=>): Measures the angle between vectors, ignoring magnitude. The most popular choice for text embeddings.
Inner product (<#>): The negative dot product. Use when you want to maximize similarity and vectors are normalized.

The choice of distance metric depends on your embedding model. Most text embedding models (OpenAI, Cohere, Sentence Transformers) produce normalized vectors, so cosine distance is typically the best choice.

Approximate vs. Exact Search

pgvector supports both exact and approximate nearest neighbor (ANN) search. Exact search scans every row—perfect for small datasets but prohibitively slow for millions of vectors. ANN search uses indexes to dramatically speed up queries at the cost of slightly reduced accuracy (recall). The key insight is that you can tune the tradeoff between speed and accuracy.

Architecture and Design Patterns

Index Types

pgvector supports two index types, each with different tradeoffs:

IVFFlat (Inverted File with Flat quantization):

Partitions vectors into lists using k-means clustering
Searches only a subset of lists (specified by probes parameter)
Best for datasets under 1 million vectors
Faster to build than HNSW
Lower recall at the same speed compared to HNSW

HNSW (Hierarchical Navigable Small World):

Creates a multi-layer graph structure for fast traversal
Superior recall and query speed compared to IVFFlat
Slower to build and uses more memory
Best for datasets over 100K vectors
The recommended default for most applications

Storage Architecture

Each vector with 1536 dimensions (OpenAI's standard) requires approximately 6KB of storage. For 1 million vectors, that's roughly 6GB just for the vector data, plus index overhead. HNSW indexes add approximately 1.5x the vector storage size, while IVFFlat adds minimal overhead.

Multi-Tenant Patterns

For SaaS applications serving multiple customers, pgvector supports several multi-tenancy patterns:

Row-level filtering: Add a tenant_id column and filter queries
Schema-per-tenant: Separate schemas for each tenant
Table-per-tenant: Separate tables (best performance, most complex)

The row-level approach is simplest and works well for most applications. The key is to include the tenant filter in your HNSW queries for efficient execution.

Step-by-Step Implementation

Installation and Setup

-- Enable the extension (requires superuser or CREATE privilege)
CREATE EXTENSION IF NOT EXISTS vector;
 
-- Verify installation
SELECT * FROM pg_extension WHERE extname = 'vector';
 
-- For cloud providers:
-- AWS RDS: Already available, just run CREATE EXTENSION
-- Supabase: Enable in Dashboard > Database > Extensions
-- Neon: CREATE EXTENSION vector;
-- Azure: Available in Flexible Server

Creating Tables with Vectors

-- Documents table with embeddings
CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  tenant_id UUID NOT NULL,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  metadata JSONB DEFAULT '{}',
  embedding vector(1536),  -- OpenAI text-embedding-3-small dimension
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);
 
-- Create indexes for common query patterns
CREATE INDEX idx_documents_tenant ON documents(tenant_id);
CREATE INDEX idx_documents_metadata ON documents USING gin(metadata);
 
-- HNSW index for vector similarity search
-- m: max connections per layer (16-64, higher = better recall, more memory)
-- ef_construction: build-time quality (higher = better recall, slower build)
CREATE INDEX idx_documents_embedding ON documents 
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
 
-- For smaller datasets, IVFFlat is faster to build
-- lists: number of clusters (sqrt of rows is a good starting point)
CREATE INDEX idx_documents_embedding_ivf ON documents
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

Generating and Storing Embeddings

// Using OpenAI to generate embeddings
import OpenAI from 'openai';
import pg from 'pg';
 
const openai = new OpenAI();
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
 
async function storeDocument(title, content, tenantId, metadata = {}) {
  // Generate embedding
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: `${title}\n\n${content}`,
  });
  const embedding = response.data[0].embedding;
 
  // Store in PostgreSQL
  const result = await pool.query(
    `INSERT INTO documents (tenant_id, title, content, metadata, embedding)
     VALUES ($1, $2, $3, $4, $5::vector)
     RETURNING id`,
    [tenantId, title, content, JSON.stringify(metadata), `[${embedding.join(',')}]`]
  );
 
  return result.rows[0].id;
}
 
// Batch insert for better performance
async function storeDocuments(documents, tenantId) {
  const values = [];
  const params = [];
  let paramIndex = 1;
 
  for (const doc of documents) {
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: `${doc.title}\n\n${doc.content}`,
    });
    const embedding = response.data[0].embedding;
 
    values.push(
      `($${paramIndex}, $${paramIndex + 1}, $${paramIndex + 2}, $${paramIndex + 3}, $${paramIndex + 4}::vector)`
    );
    params.push(tenantId, doc.title, doc.content, JSON.stringify(doc.metadata || {}));
    params.push(`[${embedding.join(',')}]`);
    paramIndex += 5;
  }
 
  await pool.query(
    `INSERT INTO documents (tenant_id, title, content, metadata, embedding)
     VALUES ${values.join(', ')}`,
    params
  );
}

Querying with Similarity Search

-- Basic similarity search (cosine distance)
SELECT id, title, content,
       1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
 
-- Multi-tenant similarity search
SELECT id, title, content,
       1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE tenant_id = $2
ORDER BY embedding <=> $1::vector
LIMIT 10;
 
-- Similarity search with metadata filter
SELECT id, title, content, metadata,
       1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE tenant_id = $2
  AND metadata->>'category' = $3
ORDER BY embedding <=> $1::vector
LIMIT 10;
 
-- Similarity threshold (only return results above a minimum score)
SELECT id, title, content,
       1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE tenant_id = $2
  AND 1 - (embedding <=> $1::vector) > 0.8
ORDER BY embedding <=> $1::vector
LIMIT 10;

Application-Level Query Service

class VectorSearchService {
  constructor(pool, openai) {
    this.pool = pool;
    this.openai = openai;
  }
 
  async search(query, options = {}) {
    const {
      tenantId,
      limit = 10,
      threshold = 0.7,
      category = null,
    } = options;
 
    // Generate query embedding
    const response = await this.openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: query,
    });
    const queryEmbedding = response.data[0].embedding;
 
    // Build dynamic query
    let sql = `
      SELECT id, title, content, metadata,
             1 - (embedding <=> $1::vector) AS similarity
      FROM documents
      WHERE tenant_id = $2
        AND 1 - (embedding <=> $1::vector) > $3
    `;
    const params = [`[${queryEmbedding.join(',')}]`, tenantId, threshold];
    let paramIndex = 4;
 
    if (category) {
      sql += ` AND metadata->>'category' = $${paramIndex}`;
      params.push(category);
      paramIndex++;
    }
 
    sql += ` ORDER BY embedding <=> $1::vector LIMIT $${paramIndex}`;
    params.push(limit);
 
    const result = await this.pool.query(sql, params);
    return result.rows;
  }
 
  async ragQuery(question, tenantId, contextLimit = 5) {
    // Retrieve relevant documents
    const docs = await this.search(question, {
      tenantId,
      limit: contextLimit,
      threshold: 0.7,
    });
 
    // Build context from retrieved documents
    const context = docs
      .map((doc, i) => `[${i + 1}] ${doc.title}: ${doc.content}`)
      .join('\n\n');
 
    // Generate answer with context
    const completion = await this.openai.chat.completions.create({
      model: 'gpt-4',
      messages: [
        {
          role: 'system',
          content: `Answer the question based on the following context:\n\n${context}`,
        },
        { role: 'user', content: question },
      ],
    });
 
    return {
      answer: completion.choices[0].message.content,
      sources: docs.map((d) => ({ id: d.id, title: d.title, similarity: d.similarity })),
    };
  }
}

Real-World Use Cases and Case Studies

Use Case 1: Internal Knowledge Base Search

A 500-person company uses pgvector to power their internal wiki search. Documents from Confluence, Google Docs, and Notion are chunked, embedded, and stored in PostgreSQL alongside their original metadata. Employees search using natural language and get relevant results ranked by semantic similarity rather than keyword matching. pgvector's ability to combine vector search with SQL filters (department, date range, document type) makes it ideal for this use case.

Use Case 2: Customer Support RAG

A SaaS company implements a customer support chatbot using pgvector and GPT-4. Support articles, product documentation, and previous ticket resolutions are embedded and stored. When a customer asks a question, the system retrieves the most relevant documentation chunks and uses them as context for GPT-4 to generate accurate, grounded answers. The response includes source links so customers can verify the information.

Use Case 3: E-Commerce Product Recommendations

An online marketplace stores product embeddings generated from descriptions, reviews, and images. When a user views a product, the system queries pgvector for similar products, filtered by category and price range. The combination of vector similarity and relational filtering produces highly relevant recommendations without a separate recommendation engine.

Use Case 4: Content Deduplication

A news aggregation platform uses pgvector to detect duplicate or near-duplicate articles. New articles are embedded and compared against existing articles using cosine similarity. Articles above a 0.95 similarity threshold are flagged as potential duplicates for editorial review. This approach catches duplicates even when the wording differs significantly.

Best Practices for Production

Choose HNSW over IVFFlat for most workloads: HNSW provides better recall and query speed. Use IVFFlat only if you need faster index builds on datasets under 1 million vectors.
Tune HNSW parameters for your workload: Set m (connections per layer) between 16-32 and ef_construction between 64-200. Higher values improve recall but increase build time and memory.
Adjust ef_search at query time: Increase SET hnsw.ef_search = 100 for higher recall during queries. The default of 40 may miss relevant results for large datasets.
Batch your embedding generation: When inserting multiple documents, batch API calls to your embedding provider. This reduces latency and API costs.
Use partial indexes for multi-tenancy: Create indexes on subsets of data to reduce index size and improve query speed.
Monitor index size and build time: HNSW indexes can be 1.5x the size of your vector data. Plan storage accordingly.
Vacuum and analyze regularly: Like all PostgreSQL tables, vector tables benefit from regular maintenance.
Use connection pooling: Vector queries are fast but connection setup is slow. Use PgBouncer or application-level pooling.

Common Pitfalls and Solutions

Pitfall	Impact	Solution
Not creating an index	Full table scan on every query, O(n) complexity	Always create HNSW or IVFFlat index for datasets > 10K rows
Wrong distance metric	Incorrect similarity rankings	Match metric to embedding model (cosine for text, L2 for images)
Storing embeddings inline with large text	Bloated table, slow sequential scans	Consider separate table for embeddings or TOAST compression
Not setting ef_search	Low recall with HNSW index	SET hnsw.ef_search = 100 (or higher) before queries
Ignoring index build time	HNSW on 1M vectors takes hours	Build indexes during low-traffic periods, use CONCURRENTLY
Embedding dimension mismatch	Insert/query errors	Validate vector dimensions match index definition

Performance Optimization

// Connection pooling configuration
import pg from 'pg';
 
const pool = new pg.Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});
 
// Optimize PostgreSQL settings for vector workloads
// postgresql.conf:
// shared_buffers = 4GB
// effective_cache_size = 12GB
// maintenance_work_mem = 1GB
// work_mem = 256MB
 
// Query-time tuning
async function optimizedSearch(queryEmbedding, tenantId) {
  // Set higher ef_search for better recall
  await pool.query("SET hnsw.ef_search = 100");
  
  const result = await pool.query(
    `SELECT id, title, 1 - (embedding <=> $1::vector) AS score
     FROM documents
     WHERE tenant_id = $2
     ORDER BY embedding <=> $1::vector
     LIMIT 10`,
    [`[${queryEmbedding.join(',')}]`, tenantId]
  );
  
  return result.rows;
}
 
// Parallel queries for large result sets
async function parallelSearch(queryEmbedding, categories) {
  const promises = categories.map(category =>
    pool.query(
      `SELECT id, title, 1 - (embedding <=> $1::vector) AS score
       FROM documents
       WHERE metadata->>'category' = $2
       ORDER BY embedding <=> $1::vector
       LIMIT 5`,
      [`[${queryEmbedding.join(',')}]`, category]
    )
  );
  
  const results = await Promise.all(promises);
  return results.flatMap(r => r.rows)
    .sort((a, b) => b.score - a.score)
    .slice(0, 10);
}

Comparison with Alternatives

Feature	pgvector	Pinecone	Weaviate	Qdrant	Milvus
Deployment	PostgreSQL extension	Managed	Self-hosted/managed	Self-hosted/managed	Self-hosted/managed
SQL Support	Full PostgreSQL SQL	API only	GraphQL	REST/gRPC	REST/gRPC
ACID Compliance	Yes	No	No	No	No
Relational Data	Native joins	Metadata only	Limited	Limited	Limited
Max Vectors	Billions (with tuning)	Billions	Billions	Billions	Billions
Hybrid Search	SQL + pg_trgm	No	Yes	Yes	Yes
Operational Overhead	None (existing PG)	None (managed)	Medium	Medium	High
Cost	Existing PG cost	Per-query or pod	Free or cloud	Free or cloud	Free or cloud
Best For	PG users, < 10M vectors	Simple managed	AI-native features	Performance-critical	Large-scale ML

Advanced Patterns

Chunking Strategy for RAG

class DocumentChunker {
  constructor(options = {}) {
    this.chunkSize = options.chunkSize || 500;
    this.chunkOverlap = options.chunkOverlap || 50;
  }
 
  chunk(text, metadata = {}) {
    const sentences = text.split(/(?<=[.!?])\s+/);
    const chunks = [];
    let currentChunk = '';
    let currentSize = 0;
 
    for (const sentence of sentences) {
      if (currentSize + sentence.length > this.chunkSize && currentChunk) {
        chunks.push({
          content: currentChunk.trim(),
          metadata: { ...metadata, chunkIndex: chunks.length },
        });
        // Keep overlap
        const words = currentChunk.split(' ');
        const overlapWords = words.slice(-Math.floor(this.chunkOverlap / 5));
        currentChunk = overlapWords.join(' ') + ' ' + sentence;
        currentSize = currentChunk.length;
      } else {
        currentChunk += (currentChunk ? ' ' : '') + sentence;
        currentSize += sentence.length;
      }
    }
 
    if (currentChunk.trim()) {
      chunks.push({
        content: currentChunk.trim(),
        metadata: { ...metadata, chunkIndex: chunks.length },
      });
    }
 
    return chunks;
  }
}

Incremental Index Updates

-- Efficiently update embeddings for modified documents
WITH updated AS (
  SELECT id, $1::vector AS new_embedding
  FROM documents
  WHERE id = ANY($2)
)
UPDATE documents d
SET embedding = u.new_embedding,
    updated_at = NOW()
FROM updated u
WHERE d.id = u.id;

Testing Strategies

describe('VectorSearchService', () => {
  let mockPool;
  let mockOpenAI;
  let service;
 
  beforeEach(() => {
    mockPool = {
      query: jest.fn(),
    };
    mockOpenAI = {
      embeddings: {
        create: jest.fn().mockResolvedValue({
          data: [{ embedding: new Array(1536).fill(0.1) }],
        }),
      },
    };
    service = new VectorSearchService(mockPool, mockOpenAI);
  });
 
  test('should search with correct embedding', async () => {
    mockPool.query.mockResolvedValue({
      rows: [{ id: 1, title: 'Test', similarity: 0.95 }],
    });
 
    const results = await service.search('test query', { tenantId: 'tenant-1' });
 
    expect(mockOpenAI.embeddings.create).toHaveBeenCalledWith({
      model: 'text-embedding-3-small',
      input: 'test query',
    });
    expect(results).toHaveLength(1);
    expect(results[0].similarity).toBe(0.95);
  });
 
  test('should apply category filter', async () => {
    mockPool.query.mockResolvedValue({ rows: [] });
 
    await service.search('test', { tenantId: 't1', category: 'docs' });
 
    expect(mockPool.query).toHaveBeenCalledWith(
      expect.stringContaining("metadata->>'category'"),
      expect.arrayContaining(['docs'])
    );
  });
});

Future Outlook

pgvector continues to evolve with the PostgreSQL ecosystem. Upcoming features include:

Improved HNSW performance: Ongoing optimizations to reduce memory usage and improve build times
Sparse vector support: For hybrid dense/sparse search combining BM25 and semantic search
Streaming indexing: Build indexes incrementally as data arrives
Better parallel query execution: Leverage PostgreSQL's parallel query infrastructure

The trend toward convergence—traditional databases adding vector capabilities while vector databases add relational features—suggests that pgvector's approach of building on PostgreSQL's foundation is well-positioned for the future.

Conclusion

pgvector brings powerful vector similarity search to PostgreSQL without requiring a separate database. Its native SQL integration, ACID compliance, and zero additional infrastructure make it the ideal choice for teams already using PostgreSQL.

Key takeaways:

Use HNSW indexes for most workloads—superior recall and speed
Tune ef_search at query time to balance recall and performance
Combine vector search with SQL filters for precise, relevant results
Plan for storage—each 1536-dim vector needs ~6KB plus index overhead
pgvector shines for RAG applications where you need vector + relational data

For teams that want vector search without operational complexity, pgvector is the clear winner. Start with a simple implementation, measure performance, and optimize as your dataset grows.

Minh Vo

Slaying code & making it lit fr fr 🔥 tagline