MinhVo

Minh Vo

rss feed

Slaying code & making it lit fr fr 🔥 tagline

Hey there 👋 I'm an AI Engineer with 7 years of experience building scalable web and mobile applications. Currently at Neurond AI (May 2025 — present), architecting an Enterprise AI Assistant Platform with multi-tenant RAG on pgvector, multi-provider LLM orchestration, and Azure-native infrastructure. Previously spent 5+ years at SNAPTEC (Sep 2019 — Apr 2025), leading SaaS themes, admin dashboards, and e-commerce platforms — earned the Hero of the Year award in 2021. I specialize in TypeScript, React, Next.js, and AI-Native engineering with Claude Code and Cursor.bio

Back to blogs

Database Normalization: 1NF, 2NF, 3NF, and BCNF

Understand database normalization: reducing redundancy, avoiding anomalies, and practical examples.

DatabaseNormalizationSQLDesign

By MinhVo

Introduction

Database normalization is one of the most fundamental concepts in relational database design, yet it's frequently misunderstood or dismissed as academic theory with little practical value. In reality, normalization is a systematic approach to organizing data that eliminates redundancy, prevents update anomalies, and ensures data integrity. Ignoring normalization principles leads to databases plagued by inconsistent data, wasted storage, and maintenance nightmares.

The normalization process decomposes complex tables into simpler, well-structured relations connected by foreign keys. Each normal form builds on the previous one, progressively eliminating different types of data dependencies and anomalies. Understanding when to normalize and when strategic denormalization makes sense is a hallmark of experienced database design.

This guide walks through the first four normal forms—1NF, 2NF, 3NF, and BCNF—with practical examples, clear explanations of the anomalies each form prevents, and real-world guidance on applying normalization to actual database designs.

Database normalization concepts

Understanding Normalization: The Fundamentals

Why Normalize?

Consider an unnormalized orders table that stores customer information, order details, and product information all in one row. When a customer changes their address, you must update every order they've ever placed. When a product price changes, historical order data becomes inconsistent. When a customer has no orders yet, you can't store their information at all.

These problems—update anomalies, insertion anomalies, and deletion anomalies—are the core issues normalization solves. By decomposing the monolithic table into focused, well-structured tables, each piece of data is stored exactly once, in exactly one place.

Functional Dependencies

The foundation of normalization theory is the concept of functional dependencies. A functional dependency exists when the value of one attribute (or set of attributes) uniquely determines the value of another attribute. For example, in a students table, student_id → student_name means that knowing the student ID uniquely determines the student's name.

Understanding functional dependencies is essential because each normal form is defined in terms of specific types of dependencies. Identifying the functional dependencies in your data is the first step toward proper normalization.

The Normalization Process

Normalization follows a step-by-step process. You start with an unnormalized relation and progressively decompose it into higher normal forms. At each step, you identify violations of the current normal form and split the relation to eliminate them. The process continues until the relation satisfies the desired normal form (typically 3NF or BCNF for most applications).

Normalization process diagram

First Normal Form (1NF)

Definition and Requirements

A relation is in First Normal Form if and only if:

  • All attributes contain atomic (indivisible) values
  • Each row is unique (has a primary key)
  • There are no repeating groups or arrays

1NF eliminates multi-valued attributes and ensures that every cell in the table contains a single value. This is the most basic requirement for a well-structured relational table.

Violations and Fixes

-- VIOLATION: Multi-valued attribute (phones stored as comma-separated list)
CREATE TABLE customers_unnormalized (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    phones VARCHAR(500)  -- "555-0100,555-0101,555-0102"
);
 
-- FIX: Create separate table for phone numbers
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(255)
);
 
CREATE TABLE customer_phones (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    phone VARCHAR(20) NOT NULL,
    phone_type VARCHAR(20) DEFAULT 'mobile'
);
 
-- VIOLATION: Repeating group (monthly_sales as columns)
CREATE TABLE sales_report (
    report_id SERIAL PRIMARY KEY,
    region VARCHAR(50),
    jan_sales DECIMAL,
    feb_sales DECIMAL,
    mar_sales DECIMAL
    -- ... continues for all months
);
 
-- FIX: One row per month
CREATE TABLE sales_report (
    report_id SERIAL PRIMARY KEY,
    region VARCHAR(50),
    sale_month DATE,
    sales_amount DECIMAL
);

Practical 1NF Example

-- Before 1NF: Orders with multiple products in single row
-- order_id | customer | products
-- 1        | John     | "Laptop, Mouse, Keyboard"
 
-- After 1NF: Separate order items
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    order_date DATE NOT NULL
);
 
CREATE TABLE order_items (
    item_id SERIAL PRIMARY KEY,
    order_id INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL
);

Second Normal Form (2NF)

Definition and Requirements

A relation is in Second Normal Form if:

  • It is in 1NF
  • Every non-key attribute is fully functionally dependent on the entire primary key

2NF violations occur only in tables with composite primary keys. If a non-key attribute depends on only part of the composite key, it should be moved to a separate table.

Violations and Fixes

-- VIOLATION: Partial dependency in order_items
CREATE TABLE order_items (
    order_id INTEGER,
    product_id INTEGER,
    quantity INTEGER,
    product_name VARCHAR(255),  -- Depends only on product_id, not (order_id, product_id)
    product_category VARCHAR(50), -- Same issue
    unit_price DECIMAL(10,2),
    PRIMARY KEY (order_id, product_id)
);
 
-- FIX: Move product attributes to products table
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(255) NOT NULL,
    category VARCHAR(50),
    base_price DECIMAL(10,2)
);
 
CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity INTEGER NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL, -- Price at time of order
    PRIMARY KEY (order_id, product_id)
);

Practical 2NF Example

-- Enrollment table with composite key
CREATE TABLE enrollments (
    student_id INTEGER,
    course_id INTEGER,
    enrollment_date DATE,
    student_name VARCHAR(255),  -- Depends only on student_id
    course_name VARCHAR(255),   -- Depends only on course_id
    grade CHAR(1),
    PRIMARY KEY (student_id, course_id)
);
 
-- After 2NF normalization
CREATE TABLE students (
    student_id SERIAL PRIMARY KEY,
    student_name VARCHAR(255) NOT NULL
);
 
CREATE TABLE courses (
    course_id SERIAL PRIMARY KEY,
    course_name VARCHAR(255) NOT NULL
);
 
CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(student_id),
    course_id INTEGER REFERENCES courses(course_id),
    enrollment_date DATE NOT NULL,
    grade CHAR(1),
    PRIMARY KEY (student_id, course_id)
);

Third Normal Form (3NF)

Definition and Requirements

A relation is in Third Normal Form if:

  • It is in 2NF
  • No non-key attribute is transitively dependent on the primary key

A transitive dependency exists when a non-key attribute depends on another non-key attribute rather than directly on the primary key. 3NF eliminates these indirect dependencies.

Violations and Fixes

-- VIOLATION: Transitive dependency
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    department_id INTEGER,
    department_name VARCHAR(100),  -- Depends on department_id, not employee_id
    department_location VARCHAR(100) -- Also depends on department_id
);
 
-- FIX: Extract department information
CREATE TABLE departments (
    department_id SERIAL PRIMARY KEY,
    department_name VARCHAR(100) NOT NULL,
    location VARCHAR(100)
);
 
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    department_id INTEGER REFERENCES departments(department_id)
);

Practical 3NF Example

-- Customer orders with transitive dependencies
CREATE TABLE customer_orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER,
    customer_name VARCHAR(255),    -- Depends on customer_id
    customer_email VARCHAR(255),   -- Depends on customer_id
    customer_city VARCHAR(100),    -- Depends on customer_id
    shipping_city VARCHAR(100),    -- Depends on customer_id (if same as customer city)
    order_total DECIMAL(10,2)
);
 
-- After 3NF normalization
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) UNIQUE,
    city VARCHAR(100)
);
 
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    shipping_city VARCHAR(100),
    order_total DECIMAL(10,2)
);

3NF normalization example

Boyce-Codd Normal Form (BCNF)

Definition and Requirements

A relation is in Boyce-Codd Normal Form if:

  • It is in 3NF
  • For every functional dependency X → Y, X is a superkey

BCNF is slightly stronger than 3NF. The difference matters when a table has multiple overlapping candidate keys. A relation can be in 3NF but not BCNF when a non-prime attribute determines part of a candidate key.

BCNF Violation Example

-- Consider a course scheduling system
-- Functional dependencies:
-- (student, course) → instructor
-- instructor → course  (each instructor teaches only one course)
 
CREATE TABLE student_courses (
    student_id INTEGER,
    course_id INTEGER,
    instructor_id INTEGER,
    PRIMARY KEY (student_id, course_id)
);
 
-- instructor → course means instructor_id determines course_id
-- But instructor_id is not a superkey (it doesn't determine student_id)
-- This violates BCNF
 
-- FIX: Decompose into BCNF
CREATE TABLE instructor_courses (
    instructor_id INTEGER PRIMARY KEY,
    course_id INTEGER NOT NULL UNIQUE
);
 
CREATE TABLE student_instructors (
    student_id INTEGER,
    instructor_id INTEGER REFERENCES instructor_courses(instructor_id),
    PRIMARY KEY (student_id, instructor_id)
);

Practical BCNF Example

-- Library system with BCNF consideration
CREATE TABLE book_authors (
    book_id INTEGER,
    author_id INTEGER,
    royalty_percentage DECIMAL(5,2),
    PRIMARY KEY (book_id, author_id)
);
 
-- If: author → royalty_percentage (author has fixed royalty rate)
-- This violates BCNF because author is not a superkey
 
-- BCNF normalization
CREATE TABLE authors (
    author_id SERIAL PRIMARY KEY,
    author_name VARCHAR(255),
    royalty_rate DECIMAL(5,2) NOT NULL
);
 
CREATE TABLE book_authors (
    book_id INTEGER REFERENCES books(book_id),
    author_id INTEGER REFERENCES authors(author_id),
    PRIMARY KEY (book_id, author_id)
);

Real-World Use Cases

E-Commerce Product Catalog

An e-commerce platform's product catalog must store products, categories, prices, inventory, and customer reviews. Normalization separates these concerns into distinct tables—products, categories, product_categories (many-to-many), inventory, and reviews—each with clear primary keys and foreign key relationships. This prevents the anomaly of losing all product data when deleting the last review.

Healthcare Patient Records

Patient records contain sensitive information that must be accurately maintained. Normalization ensures that patient demographics are stored once (not duplicated across every visit), medical history is properly linked, and provider information is consistent. A single update to a patient's address doesn't require modifying hundreds of visit records.

Financial Transaction Systems

Financial systems require absolute data integrity. Normalization ensures that account balances are derived from transaction history (not stored redundantly), preventing inconsistencies between stored balances and actual transaction sums. Audit trails are maintained in separate normalized tables.

Best Practices for Production

1. Normalize to 3NF by Default

Most applications should normalize to 3NF as the default. This eliminates the most common anomalies while keeping the schema manageable. Only pursue BCNF when you identify specific violations involving overlapping candidate keys.

2. Denormalize Strategically for Performance

After normalizing, selectively denormalize for read-heavy queries that would otherwise require many joins. Document each denormalization decision and the performance justification.

3. Use Surrogate Keys for Primary Keys

Use auto-incrementing integers or UUIDs as primary keys rather than natural keys. Natural keys can change (email addresses, SSNs), and changing a primary key that's referenced by foreign keys is expensive.

4. Enforce Referential Integrity with Foreign Keys

Always define foreign key constraints between related tables. The database should enforce referential integrity, not application code. Use CASCADE or RESTRICT rules appropriate for your data lifecycle.

5. Index Foreign Key Columns

Foreign key columns should always be indexed. JOIN operations on unindexed foreign keys are a common source of performance problems.

6. Document Normalization Decisions

Record why each table exists and what functional dependencies it represents. This context helps future developers understand the schema and avoid introducing anomalies through improper modifications.

7. Review Schema Changes for Normalization Violations

Before adding columns to existing tables, check whether the new column introduces a functional dependency violation. A column that depends on another non-key column should be in its own table.

8. Use Views for Denormalized Access

When you need denormalized data for queries, create views that join the normalized tables. This provides the convenience of denormalized access without the data integrity risks.

Common Pitfalls and Solutions

PitfallImpactSolution
Storing derived values (age from birthdate)Stale data when birthdate changesCalculate age in queries or views
Duplicating foreign key data in child tablesUpdate anomalies across multiple rowsUse proper foreign key relationships
Multi-valued attributes in single columnCannot query individual values efficientlyCreate separate junction tables
Ignoring partial dependenciesRedundant data in tables with composite keysDecompose to 2NF
Over-normalizing small tablesUnnecessary joins for simple queriesKeep small lookup tables simple
Storing formatted strings (full name)Derived data gets out of syncStore first/last name, format in application

Performance Optimization

Balancing Normalization and Performance

-- Normalized: 3 tables, 2 joins
SELECT o.order_date, p.product_name, oi.quantity, oi.unit_price
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.customer_id = 42;
 
-- Denormalized view for frequent queries
CREATE VIEW order_details AS
SELECT 
    o.order_id, o.order_date, o.customer_id,
    p.product_name, p.category,
    oi.quantity, oi.unit_price,
    (oi.quantity * oi.unit_price) AS line_total
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id;

Materialized Views for Complex Normalized Queries

-- Materialized view for reporting on normalized schema
CREATE MATERIALIZED VIEW monthly_sales_summary AS
SELECT 
    DATE_TRUNC('month', o.order_date) AS month,
    p.category,
    COUNT(DISTINCT o.order_id) AS order_count,
    SUM(oi.quantity * oi.unit_price) AS total_sales
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
GROUP BY DATE_TRUNC('month', o.order_date), p.category;
 
-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales_summary;

Comparison with Alternatives

AspectFully Normalized (3NF)DenormalizedHybrid Approach
Data RedundancyNoneHighControlled
Write PerformanceExcellentSlower (multiple updates)Good
Read PerformanceRequires joinsFast (no joins)Good
Data IntegrityGuaranteed by DBApplication-dependentGood
Storage EfficiencyOptimalHigherModerate
Schema FlexibilityHighLowModerate
Best ForOLTP systemsOLAP/ReportingMost applications

Advanced Patterns

Temporal Normalization

-- Track historical changes to normalized data
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);
 
CREATE TABLE employee_departments (
    id SERIAL PRIMARY KEY,
    employee_id INTEGER REFERENCES employees(employee_id),
    department_id INTEGER REFERENCES departments(department_id),
    valid_from DATE NOT NULL,
    valid_to DATE,
    is_current BOOLEAN DEFAULT TRUE
);
 
-- Query current department
SELECT e.name, d.department_name
FROM employees e
JOIN employee_departments ed ON e.employee_id = ed.employee_id AND ed.is_current = TRUE
JOIN departments d ON ed.department_id = d.department_id;

Sixth Normal Form (6NF) for Temporal Data

-- 6NF: Every non-key attribute has its own table
-- Useful for temporal/scientific data with many optional attributes
CREATE TABLE sensor_readings (
    reading_id SERIAL PRIMARY KEY,
    sensor_id INTEGER NOT NULL,
    timestamp TIMESTAMPTZ NOT NULL
);
 
CREATE TABLE reading_temperature (
    reading_id INTEGER PRIMARY KEY REFERENCES sensor_readings(reading_id),
    value DECIMAL NOT NULL
);
 
CREATE TABLE reading_humidity (
    reading_id INTEGER PRIMARY KEY REFERENCES sensor_readings(reading_id),
    value DECIMAL NOT NULL
);

Testing Strategies

-- Test for 1NF violations (multi-valued attributes)
SELECT customer_id, phones 
FROM customers 
WHERE phones LIKE '%,%';  -- Comma in value suggests multi-valued
 
-- Test for 2NF violations (partial dependencies)
-- Look for non-key columns that depend on part of composite key
SELECT order_id, product_id, product_name  -- product_name depends only on product_id
FROM order_items
WHERE product_name IS NOT NULL;
 
-- Test for 3NF violations (transitive dependencies)
-- Look for non-key columns that depend on other non-key columns
SELECT employee_id, department_name  -- department_name depends on department_id, not employee_id
FROM employees
WHERE department_name IS NOT NULL;
 
-- Verify referential integrity
SELECT o.order_id 
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;  -- Orphaned foreign keys

Future Outlook

While the core principles of normalization remain unchanged, modern database systems are evolving how they handle normalized data. PostgreSQL's JSONB support enables storing semi-structured data within a normalized framework. Graph databases offer alternative approaches to modeling complex relationships. NewSQL databases provide the consistency guarantees of normalization with the performance characteristics of denormalized systems.

The trend toward event sourcing and CQRS architectures separates read and write models, allowing the write side to maintain strict normalization while the read side uses purpose-built denormalized views. This architectural pattern aligns well with normalization principles while addressing performance requirements.

Conclusion

Database normalization is a practical discipline that directly impacts data integrity, storage efficiency, and maintenance costs. 1NF ensures atomic values and eliminates repeating groups. 2NF eliminates partial dependencies in tables with composite keys. 3NF eliminates transitive dependencies. BCNF handles the remaining edge cases with overlapping candidate keys.

The key insight is that normalization is a tool, not a goal. Normalize to 3NF as your default, then selectively denormalize for specific performance needs. Always document your denormalization decisions and maintain the normalized schema as the source of truth. With this approach, you'll build databases that are both correct and performant.