Docker for Beginners: Containerize Your First Application

Step-by-step guide to Docker: images, containers, Dockerfiles, and docker-compose.

Tags: Docker, Containers, DevOps

By MinhVo

Introduction

Docker has fundamentally changed how developers build, ship, and run applications. Before containers, deploying software meant wrestling with environment differences between development machines, staging servers, and production infrastructure. The classic "it works on my machine" problem consumed countless engineering hours and caused production incidents that could have been avoided.

Containers solve this by packaging your application code together with its runtime, system tools, libraries, and configuration into a single, portable unit. This unit runs identically on any system with a container runtime installed, whether that is your laptop, a colleague's workstation, a CI server, or a production cluster. The isolation guarantees that your application sees the same filesystem, network configuration, and dependencies regardless of where it runs.

This guide takes you from zero Docker knowledge to running a multi-container application with persistent storage, networking, and development-friendly workflows. Every concept is explained with practical examples that you can follow along with on your own machine.

[Figure: container shipping analogy]

Understanding Docker: Core Concepts

Docker operates on three fundamental abstractions: images, containers, and registries. Understanding how these pieces fit together is essential before writing any Docker commands.

A Docker image is a read-only template that contains everything needed to run an application. Images are built in layers, where each layer represents a set of filesystem changes. This layering system enables efficient storage and transfer because layers shared between images are stored only once. For example, if ten different Node.js applications all use node:20-alpine as their base image, that base layer is downloaded and stored once.
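
To make the storage saving concrete, here is a toy model in JavaScript. It is only a sketch: the image names and layer identifiers are made up for illustration, whereas real Docker identifies layers by content digest.

```javascript
// Toy model: each image is a list of layer identifiers (hypothetical values).
const images = {
  'api:1.0': ['node-20-alpine', 'api-deps', 'api-src'],
  'worker:1.0': ['node-20-alpine', 'worker-deps', 'worker-src'],
  'web:1.0': ['node-20-alpine', 'web-deps', 'web-src'],
};

// Count how many layers would be stored with and without sharing.
function layerStats(imageMap) {
  const allLayers = Object.values(imageMap).flat();
  return {
    withoutSharing: allLayers.length,
    withSharing: new Set(allLayers).size,
  };
}

console.log(layerStats(images));
// { withoutSharing: 9, withSharing: 7 }
```

The base layer appears in all three images but is counted once, which is the effect described above.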

A container is a running instance of an image. You can create multiple containers from the same image, each running in its own isolated environment. Containers share the host operating system's kernel but have their own filesystem, process space, and network interfaces. This makes them far more lightweight than virtual machines while still providing strong isolation.

A registry stores and distributes images. Docker Hub is the default public registry, hosting millions of images from official publishers and the community. Organizations typically run private registries to store proprietary images. The docker pull command downloads images from registries, while docker push uploads them.

Images vs Containers

The relationship between images and containers mirrors the relationship between classes and objects in object-oriented programming. An image is the blueprint; a container is the running instance. You can create many containers from one image, each with its own state (filesystem changes, environment variables, running processes), but they all start from the same base.

# List locally available images
docker images
 
# List running containers
docker ps
 
# List all containers (including stopped)
docker ps -a
 
# Pull an image from Docker Hub
docker pull nginx:1.25-alpine
 
# Create and start a container
docker run -d --name my-nginx -p 8080:80 nginx:1.25-alpine
 
# Stop a container
docker stop my-nginx
 
# Remove a container
docker rm my-nginx
 
# Remove an image
docker rmi nginx:1.25-alpine
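
If the class/object analogy helps, it can be spelled out in a few lines of JavaScript. The Image and Container classes below are purely illustrative and are not part of any Docker SDK; they only model the idea that containers share a read-only image while keeping their own writable state.

```javascript
// Illustrative model only: these classes are not part of any Docker SDK.
class Image {
  constructor(name, files) {
    this.name = name;
    this.files = Object.freeze({ ...files }); // images are read-only
  }

  // Each call returns a new, independent container.
  run(containerName) {
    return new Container(containerName, this);
  }
}

class Container {
  constructor(name, image) {
    this.name = name;
    this.image = image;
    this.writableLayer = {}; // per-container filesystem changes
  }

  writeFile(path, content) {
    this.writableLayer[path] = content;
  }

  readFile(path) {
    // The writable layer shadows the image's read-only files.
    return this.writableLayer[path] ?? this.image.files[path];
  }
}

const nginx = new Image('nginx:1.25-alpine', { '/index.html': 'default page' });
const a = nginx.run('web-a');
const b = nginx.run('web-b');

a.writeFile('/index.html', 'custom page');
console.log(a.readFile('/index.html')); // custom page
console.log(b.readFile('/index.html')); // default page
```

Changing a file in one container leaves the other container and the image untouched.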

The Container Lifecycle

Containers move through several states during their lifetime: created, running, paused, stopped (reported by docker ps as "exited"), and removed. Knowing these states helps you manage containers and debug them when something goes wrong.

# Container lifecycle commands
# nginx is used here because it keeps running; node:20-alpine started without
# a command or TTY exits immediately, so it could not be paused
docker create --name my-app nginx:1.25-alpine   # Created (not running)
docker start my-app                             # Running
docker pause my-app                             # Paused
docker unpause my-app                           # Running again
docker stop my-app                              # Stopped (exit code preserved)
docker start my-app                             # Running again
docker stop my-app                              # A running container must be stopped before removal
docker rm my-app                                # Removed
 
# Interactive mode (useful for debugging)
docker run -it --name debug-container node:20-alpine /bin/sh
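
One way to internalize the lifecycle is to treat it as a small state machine. The sketch below is a simplified model, not Docker's implementation: it uses the state names from this guide and covers only the basic commands (no forced removal, restarts, or kills).

```javascript
// Allowed transitions, keyed by the state names used in this guide.
// Docker itself reports a stopped container as "exited".
const transitions = {
  created: { start: 'running', rm: 'removed' },
  running: { pause: 'paused', stop: 'stopped' },
  paused: { unpause: 'running' },
  stopped: { start: 'running', rm: 'removed' },
  removed: {},
};

// Return the next state, or throw if the command is not valid in this state.
function next(state, command) {
  const target = transitions[state]?.[command];
  if (!target) {
    throw new Error(`Cannot "${command}" a container that is ${state}`);
  }
  return target;
}

// Replay a typical command sequence.
let state = 'created';
for (const command of ['start', 'pause', 'unpause', 'stop', 'start', 'stop', 'rm']) {
  state = next(state, command);
}
console.log(state); // removed
```

In this model, asking to remove a running container throws, which mirrors the error Docker returns unless you stop the container first or pass -f.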

Architecture and Design Patterns

Image Layer Architecture

Docker images are composed of multiple read-only layers stacked on top of each other. When you create a container, Docker adds a thin writable layer on top of the image layers. This writable layer is where all filesystem changes made by the running application are stored.

Understanding layers is crucial for writing efficient Dockerfiles. Each instruction in a Dockerfile creates a new layer. Layers are cached, so if nothing changes in a layer or its dependencies, Docker reuses the cached version during builds. This dramatically speeds up rebuilds but requires thoughtful ordering of Dockerfile instructions.

# Bad: Changing source code invalidates the npm install cache layer
FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm install
CMD ["node", "index.js"]
 
# Good: Dependencies are cached separately from source code
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
CMD ["node", "index.js"]
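
The effect of instruction ordering can be simulated with a simplified cache model. This is a sketch under stated assumptions: real Docker compares file checksums and instruction text, while here each step simply lists the (hypothetical) files it depends on.

```javascript
// Simplified model of the build cache: a step is rebuilt if any of its inputs
// changed, or if an earlier step was rebuilt.
function rebuiltSteps(steps, changedFiles) {
  let cacheBroken = false;
  return steps
    .filter((step) => {
      if (!cacheBroken && step.inputs.some((f) => changedFiles.includes(f))) {
        cacheBroken = true;
      }
      return cacheBroken;
    })
    .map((step) => step.instruction);
}

const bad = [
  { instruction: 'COPY . .', inputs: ['package.json', 'src/index.js'] },
  { instruction: 'RUN npm install', inputs: [] },
];

const good = [
  { instruction: 'COPY package*.json ./', inputs: ['package.json'] },
  { instruction: 'RUN npm install', inputs: [] },
  { instruction: 'COPY . .', inputs: ['package.json', 'src/index.js'] },
];

// Only application source changed:
console.log(rebuiltSteps(bad, ['src/index.js']));
// [ 'COPY . .', 'RUN npm install' ]
console.log(rebuiltSteps(good, ['src/index.js']));
// [ 'COPY . .' ]
```

With the second ordering, a source-only change skips the expensive npm install step entirely.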

Networking Fundamentals

Docker creates a virtual network bridge by default, allowing containers to communicate with each other and the outside world. Each container gets its own IP address on this bridge network. Port mapping (-p flag) forwards traffic from a host port to a container port, making the container accessible from outside the Docker host.

For multi-container applications, user-defined bridge networks provide automatic DNS resolution between containers by container name (and, under Docker Compose, by service name). The default bridge network does not offer this. Name-based discovery eliminates the need to hardcode IP addresses and makes your application configuration portable.

# Default bridge networking
docker run -d --name web -p 8080:80 nginx:alpine
 
# User-defined network for multi-container communication
docker network create my-app-network
 
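# postgres refuses to start without POSTGRES_PASSWORD
docker run -d --name db --network my-app-network -e POSTGRES_PASSWORD=password postgres:16-alpine
# tail -f /dev/null keeps this demo container alive (node with no command exits immediately)
docker run -d --name api --network my-app-network node:20-alpine tail -f /dev/null
 
# The 'api' container can resolve and reach 'db' by name
docker exec api ping -c 1 db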
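
Since PostgreSQL speaks its own wire protocol rather than HTTP, a plain TCP connection attempt is a simple way to confirm both name resolution and reachability from inside a container. Below is a minimal sketch using only Node's built-in net module; the canConnect helper and the probe.js file name are my own naming, not part of Docker or Node.

```javascript
// probe.js (hypothetical file name)
const net = require('node:net');

// Resolves true if a TCP connection to host:port succeeds within timeoutMs.
function canConnect(host, port, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const finish = (ok) => {
      socket.destroy();
      resolve(ok);
    };
    socket.setTimeout(timeoutMs);
    socket.once('connect', () => finish(true));
    socket.once('timeout', () => finish(false));
    socket.once('error', () => finish(false)); // DNS failure, refused, etc.
  });
}

// Example use from inside the 'api' container:
//   canConnect('db', 5432).then((ok) => console.log(ok ? 'reachable' : 'unreachable'));
```

A result of false on a user-defined network usually means the target container is not running or is attached to a different network.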

Volume Management

Containers are ephemeral by design: when a container is removed, all filesystem changes are lost. Volumes provide persistent storage that survives container lifecycle events. Docker manages named volumes, which are the preferred mechanism for persisting data like database files, uploaded content, and application state.

Bind mounts map a host directory into a container, which is useful for development workflows where you want code changes to be reflected immediately without rebuilding the image. However, bind mounts can introduce platform-specific path issues and permission problems.

# Named volumes (managed by Docker, portable)
docker volume create pgdata
docker run -d --name db -e POSTGRES_PASSWORD=password -v pgdata:/var/lib/postgresql/data postgres:16-alpine
 
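# Bind mounts (maps host directory to container)
# The second -v creates an anonymous volume at /app/node_modules
# Replace the final command with whatever starts your app; without one,
# node:20-alpine exits immediately when run detached
docker run -d --name dev-app \
  -w /app \
  -v "$(pwd)/src:/app/src" \
  -v /app/node_modules \
  node:20-alpine node src/index.js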
 
# List volumes
docker volume ls
 
# Inspect a volume
docker volume inspect pgdata
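
The three forms of the -v argument used above are easy to mix up. As a rough guide, the sketch below classifies a -v value the way Docker interprets it. It is deliberately simplified (it ignores Windows drive letters and does not validate options such as :ro), and classifyMount is a hypothetical helper, not a Docker API.

```javascript
// Classify the argument of `docker run -v` (simplified).
function classifyMount(spec) {
  const [source, target] = spec.split(':');
  if (target === undefined) {
    // A single path: Docker creates an anonymous volume at that location.
    return { type: 'anonymous volume', target: source };
  }
  // A source that looks like a path is a bind mount; a bare name is a named volume.
  const isPath = source.startsWith('/') || source.startsWith('.');
  return { type: isPath ? 'bind mount' : 'named volume', source, target };
}

console.log(classifyMount('pgdata:/var/lib/postgresql/data').type); // named volume
console.log(classifyMount('/home/me/app/src:/app/src').type);       // bind mount
console.log(classifyMount('/app/node_modules').type);               // anonymous volume
```

The anonymous volume form is commonly used to keep a container-side directory such as node_modules from being hidden when a parent directory is bind-mounted from the host.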

Step-by-Step Implementation

Project Structure

Let us build a practical application: a Node.js API backed by PostgreSQL, with Redis for caching. This demonstrates real-world Docker usage with multiple services, networking, and persistent data.

my-docker-app/
├── src/
│   └── index.js
├── package.json
├── package-lock.json
├── Dockerfile
├── .dockerignore
└── docker-compose.yml

// src/index.js - Simple Express API with PostgreSQL and Redis
const express = require('express');
const { Pool } = require('pg');
const Redis = require('ioredis');
 
const app = express();
const port = process.env.PORT || 3000;
 
// Database connection
const pool = new Pool({
  host: process.env.DB_HOST || 'localhost',
  port: process.env.DB_PORT || 5432,
  database: process.env.DB_NAME || 'myapp',
  user: process.env.DB_USER || 'postgres',
  password: process.env.DB_PASSWORD || 'password',
});
 
// Redis connection
const redis = new Redis({
  host: process.env.REDIS_HOST || 'localhost',
  port: process.env.REDIS_PORT || 6379,
});
 
app.get('/health', async (req, res) => {
  try {
    await pool.query('SELECT 1');
    await redis.ping();
    res.json({ status: 'healthy', timestamp: new Date().toISOString() });
  } catch (error) {
    res.status(503).json({ status: 'unhealthy', error: error.message });
  }
});
 
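app.get('/api/visits', async (req, res) => {
  // Express 4 does not catch rejected promises in async handlers,
  // so Redis errors must be handled explicitly
  try {
    const count = await redis.incr('visit_count');
    res.json({ visits: count });
  } catch (error) {
    res.status(503).json({ error: error.message });
  }
});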
 
app.listen(port, () => {
  console.log(`Server running on port ${port}`);
});

The package.json declares the three runtime dependencies:

{
  "name": "my-docker-app",
  "version": "1.0.0",
  "dependencies": {
    "express": "^4.18.2",
    "pg": "^8.11.3",
    "ioredis": "^5.3.2"
  }
}
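
One thing the API above does not do is react to shutdown signals. docker stop sends SIGTERM and, after a grace period (10 seconds by default), SIGKILL; a Node.js process running as PID 1 ignores SIGTERM unless it installs a handler, so stopping the container would always take the full grace period. A minimal sketch of a graceful shutdown helper follows. The createShutdown name is my own, and the wiring assumes you keep the return value of app.listen in a variable called server.

```javascript
// Build a shutdown function that closes the HTTP server first, then the
// backing connections. Any objects with these methods will work.
function createShutdown({ server, pool, redis, log = console.log }) {
  let shuttingDown = false;
  return async function shutdown(signal) {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    log(`${signal} received, shutting down`);
    // Stop accepting new requests and wait for in-flight ones to finish.
    await new Promise((resolve) => server.close(resolve));
    await pool.end();   // pg: closes all pooled connections
    await redis.quit(); // ioredis: sends QUIT and closes the socket
  };
}

// Wiring in src/index.js (assumes `const server = app.listen(...)`):
//   const shutdown = createShutdown({ server, pool, redis });
//   process.on('SIGTERM', () => shutdown('SIGTERM'));
//   process.on('SIGINT', () => shutdown('SIGINT'));
```

Closing the HTTP server before the database and cache connections lets requests that are already in progress complete normally.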

Writing the Dockerfile

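The Dockerfile defines how your application image is built. Each instruction creates a layer, and the order of instructions affects build cache efficiency. The build below uses npm ci, which requires a package-lock.json; if you do not have one yet, run npm install once on the host to generate it.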

# Dockerfile - Multi-stage build for Node.js application
FROM node:20-alpine AS base
WORKDIR /app
 
# Install dependencies (cached unless package.json or package-lock.json changes)
FROM base AS dependencies
COPY package*.json ./
# --omit=dev replaces the deprecated --only=production flag
RUN npm ci --omit=dev && \
    cp -R node_modules /production_modules && \
    npm ci
 
# Build stage
FROM dependencies AS build
COPY . .
# --if-present skips the step when no "build" script exists, without hiding real build errors
RUN npm run build --if-present
 
# Production image
FROM base AS production
ENV NODE_ENV=production
 
# Copy production dependencies and application code
COPY --from=dependencies /production_modules ./node_modules
COPY --from=build /app/src ./src
COPY --from=build /app/package.json ./
 
# Create non-root user for security
RUN addgroup -g 1001 -S appgroup && \
    adduser -S appuser -u 1001 -G appgroup && \
    chown -R appuser:appgroup /app
 
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
 
CMD ["node", "src/index.js"]

# .dockerignore - Exclude files from build context
node_modules
npm-debug.log
.git
.gitignore
.env
.dockerignore
Dockerfile
docker-compose*.yml
README.md
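
The HEALTHCHECK above relies on wget, which Alpine provides through BusyBox but which is absent from minimal images such as distroless. In that situation a small Node.js script can do the same job. The following is a sketch, assuming the /health endpoint from src/index.js; the healthcheck.js file name and the checkHealth function are hypothetical.

```javascript
// healthcheck.js (hypothetical file name)
const http = require('node:http');

// Resolves true only if the URL answers with HTTP 200 within timeoutMs.
function checkHealth(url, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const req = http.get(url, { timeout: timeoutMs }, (res) => {
      res.resume(); // discard the body so the socket can close
      resolve(res.statusCode === 200);
    });
    // The 'timeout' event does not abort the request by itself.
    req.on('timeout', () => req.destroy(new Error('timeout')));
    req.on('error', () => resolve(false));
  });
}

// Entry point when used as the HEALTHCHECK command:
//   checkHealth('http://localhost:3000/health')
//     .then((ok) => process.exit(ok ? 0 : 1));
```

The Dockerfile would then use the exec form, for example HEALTHCHECK CMD ["node", "healthcheck.js"], after copying the script into the image.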

Docker Compose for Multi-Container Applications

Docker Compose defines and runs multi-container applications using a YAML file. It handles networking, volume management, and service orchestration, making it the standard tool for local development environments.

# docker-compose.yml
# Note: Compose v2 treats the legacy top-level "version" key as obsolete, so it is omitted
 
services:
  app:
    build:
      context: .
      target: production
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - DB_HOST=postgres
      - DB_PORT=5432
      - DB_NAME=myapp
      - DB_USER=postgres
      - DB_PASSWORD=password
      - REDIS_HOST=redis
      - REDIS_PORT=6379
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    restart: unless-stopped
 
  postgres:
    image: postgres:16-alpine
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
 
  redis:
    image: redis:7-alpine
    volumes:
      - redisdata:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped
 
volumes:
  pgdata:
  redisdata:

# Build and start all services
docker compose up -d --build
 
# View running services
docker compose ps
 
# View logs
docker compose logs -f app
 
# Stop all services
docker compose down
 
# Stop and remove volumes (deletes data)
docker compose down -v
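
To see why depends_on matters, it helps to think of the services as a dependency graph. The sketch below computes a valid startup order with a depth-first topological sort. It is a model of the idea rather than Compose's actual implementation, and with condition: service_healthy Compose additionally waits for each dependency's health check to pass before starting the dependent service.

```javascript
// Compute a startup order from a Compose-style dependency map.
// The `dependsOn` property name is an assumption of this sketch.
function startOrder(services) {
  const order = [];
  const state = new Map(); // name -> 'visiting' | 'done'

  function visit(name) {
    if (state.get(name) === 'done') return;
    if (state.get(name) === 'visiting') {
      throw new Error(`Circular dependency involving "${name}"`);
    }
    state.set(name, 'visiting');
    for (const dep of services[name]?.dependsOn ?? []) visit(dep);
    state.set(name, 'done');
    order.push(name); // all dependencies are already in the list
  }

  Object.keys(services).forEach(visit);
  return order;
}

console.log(startOrder({
  app: { dependsOn: ['postgres', 'redis'] },
  postgres: {},
  redis: {},
}));
// [ 'postgres', 'redis', 'app' ]
```

The app service comes last because both of its dependencies must be started first.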

Real-World Use Cases and Case Studies

Use Case 1: Development Environment Standardization

A team of eight developers working on a microservices application spent an average of two days setting up their local development environment. Each developer had different OS versions, database installations, and dependency configurations. By containerizing the development environment with Docker Compose, they reduced onboarding time to 30 minutes. New developers clone the repository and run docker compose up, and the entire application stack starts with consistent versions of PostgreSQL, Redis, Elasticsearch, and the application services.

Use Case 2: CI/CD Pipeline Consistency

A DevOps team maintained separate Jenkins agents with different tool versions, causing builds to succeed on one agent and fail on another. By running builds inside Docker containers, they ensured that the build environment was identical regardless of which agent picked up the job. Build failures due to environment differences dropped to zero, and the team could test against multiple Node.js versions in parallel by simply running different containers.

Use Case 3: Legacy Application Modernization

A company maintaining a PHP 5.6 application needed to modernize their deployment without rewriting the application. They created a Docker image that precisely replicated the production server configuration: specific PHP version, Apache modules, and system libraries. This containerized version ran on modern infrastructure, giving them time to plan a proper migration while eliminating the aging physical servers.

Use Case 4: Database Development and Testing

A data engineering team needed isolated PostgreSQL instances for feature development and testing. Using Docker, each developer could spin up a database with production-like data in seconds. Test suites ran against fresh database containers, eliminating test pollution and flaky tests caused by shared database state. They created a script that provisions a database with sample data:

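#!/bin/bash
# scripts/dev-db.sh - Provision a development database
set -euo pipefail
 
docker run -d \
  --name dev-postgres \
  -e POSTGRES_DB=testdb \
  -e POSTGRES_USER=dev \
  -e POSTGRES_PASSWORD=devpass \
  -p 5432:5432 \
  -v "$(pwd)/init-scripts:/docker-entrypoint-initdb.d" \
  postgres:16-alpine
 
# Wait until PostgreSQL accepts TCP connections (init scripts run before that)
until docker exec dev-postgres pg_isready -h 127.0.0.1 -U dev -d testdb >/dev/null 2>&1; do
  sleep 1
done
 
echo "Database ready at postgresql://dev:devpass@localhost:5432/testdb"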

Best Practices for Production

  1. Use specific image tags: Never rely on latest in production. Pin to exact versions like node:20.10.0-alpine3.18 to ensure reproducible builds. The latest tag can change at any time, introducing unexpected changes.

  2. Run as non-root user: Always create and switch to a non-root user in your Dockerfile. Most applications do not require root privileges, and running as root inside a container is a security risk if the container is compromised.

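  3. Use multi-stage builds: Separate build dependencies from runtime dependencies. The production image does not need dev dependencies, compilers, or other build tools. Multi-stage builds can reduce image sizes by as much as 60-80%, depending on how much build tooling is left behind.

  4. Implement health checks: Define HEALTHCHECK instructions so that Docker and orchestrators can tell whether the application is actually serving requests. Without a health check, a container whose process is alive but hung or deadlocked still appears healthy. Docker Engine on its own only reports the health status; restarting unhealthy containers is the job of an orchestrator.

  5. Minimize image layers: Combine related RUN commands, and clean up temporary files in the same RUN instruction that created them. Files deleted in a later layer still take up space in the earlier one, so consolidating steps keeps images smaller and faster to pull.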

  6. Use .dockerignore: Exclude unnecessary files from the build context. Sending node_modules, .git, and test files to the Docker daemon wastes time and can leak sensitive information into images.

  7. Set resource limits: Define memory and CPU limits in production to prevent containers from consuming all host resources. A single runaway process should not take down the entire host.

  8. Manage secrets properly: Never hardcode passwords, API keys, or certificates in Dockerfiles or images. Use Docker secrets, environment variables from a secure source, or mounted secret files.
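
For the last point, a common convention (used by official images such as postgres with POSTGRES_PASSWORD_FILE) is to accept either a plain environment variable or a companion variable ending in _FILE that points at a mounted secret file. Docker secrets are mounted under /run/secrets. A minimal sketch of that convention on the application side, with readSecret as a hypothetical helper:

```javascript
const fs = require('node:fs');

// Read a secret from the file named by `${name}_FILE` if that variable is set,
// otherwise fall back to the `${name}` environment variable.
function readSecret(name, env = process.env) {
  const filePath = env[`${name}_FILE`];
  if (filePath) {
    return fs.readFileSync(filePath, 'utf8').trim();
  }
  const value = env[name];
  if (!value) {
    throw new Error(`Missing secret: set ${name} or ${name}_FILE`);
  }
  return value;
}

// Example: with DB_PASSWORD_FILE=/run/secrets/db_password set by the platform,
//   const password = readSecret('DB_PASSWORD');
```

Failing fast when neither variable is set avoids silently falling back to a default password.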

Common Pitfalls and Solutions

| Pitfall | Impact | Solution |
|---|---|---|
| Using latest tag in production | Builds become non-reproducible; unexpected version changes | Pin to specific version tags: node:20.10.0-alpine |
| Copying node_modules into container | Conflicts between host and container OS; bloated images | Use .dockerignore and let npm install run inside the container |
| Running as root inside container | Container escape gives attacker root on host | Add USER instruction after file setup in Dockerfile |
| Not using health checks | Orchestrators cannot detect application crashes | Add HEALTHCHECK instruction with a real endpoint check |
| Storing data inside container filesystem | Data lost when container is removed | Use named volumes for persistent data |
| Hardcoding environment-specific values | Same image cannot work across dev/staging/production | Use environment variables and runtime configuration |
| Building from context root without .dockerignore | .git, node_modules, and secrets included in image | Create comprehensive .dockerignore file |
| Not cleaning up unused resources | Disk space exhaustion over time | Run docker system prune regularly; use --rm for one-off containers |

Performance Optimization

Container performance optimization starts with image size and build time, then extends to runtime resource management and monitoring.

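# Optimized Dockerfile for minimal image size
# The builder is Debian-based so that any native modules are compiled against
# glibc, matching the Debian-based distroless runtime image
FROM node:20-bookworm-slim AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
 
FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY src/ ./src/
# The distroless entrypoint is already node, so CMD only names the script
CMD ["src/index.js"]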

Approximate image sizes for a typical Node.js application (actual numbers vary with the dependencies installed):

  • Full Debian-based image: ~900MB
  • Alpine-based image: ~150MB
  • Distroless image: ~120MB
  • Alpine multi-stage with production-only deps: ~80MB

# docker-compose.yml with resource limits
services:
  app:
    build: .
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

Comparison with Alternatives

| Feature | Docker Containers | Virtual Machines | Bare Metal |
|---|---|---|---|
| Startup time | Seconds | Minutes | N/A (physical) |
| Resource overhead | Minimal (shared kernel) | Significant (full OS) | None |
| Isolation | Process-level | Hardware-level | Physical |
| Portability | High (any Docker host) | Moderate (hypervisor-dependent) | Low (hardware-specific) |
| Density | 100s per host | 10s per host | 1 application per server |
| Image size | MB range | GB range | N/A |
| Security boundary | Namespaces + cgroups | Hardware virtualization | Physical separation |
| Best for | Microservices, CI/CD, dev environments | Legacy apps, strong isolation needs | Performance-critical workloads |

Advanced Patterns and Techniques

Development with Hot Reload

Mounting source code into a container with a file watcher enables instant feedback during development without rebuilding the image.

# docker-compose.dev.yml - Development overrides
services:
  app:
    build:
      context: .
      target: dependencies
    volumes:
      - ./src:/app/src
      - /app/node_modules
    environment:
      - NODE_ENV=development
    # --inspect binds the debugger to all interfaces so the mapped port 9229 is reachable;
    # npx downloads nodemon on first run unless it is listed in devDependencies
    command: npx nodemon --inspect=0.0.0.0:9229 src/index.js
    ports:
      - "3000:3000"
      - "9229:9229"  # Node.js debugger

# Start in development mode
docker compose -f docker-compose.yml -f docker-compose.dev.yml up

Container Debugging Techniques

# Execute a shell inside a running container
docker exec -it my-app /bin/sh
 
# View container resource usage in real time
docker stats my-app
 
# Inspect container configuration
docker inspect my-app
 
# Copy files from container to host
docker cp my-app:/app/logs/app.log ./app.log
 
# View container filesystem changes
docker diff my-app

Testing Strategies

Containerized applications require testing at multiple levels: unit tests inside the container, integration tests across containers, and end-to-end tests against the running application stack.

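# docker-compose.test.yml - Testing configuration
services:
  test-db:
    image: postgres:16-alpine
    environment:
      - POSTGRES_DB=testdb
      - POSTGRES_USER=test
      - POSTGRES_PASSWORD=test
    tmpfs:
      - /var/lib/postgresql/data  # RAM-backed for speed
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U test -d testdb"]
      interval: 2s
      timeout: 3s
      retries: 15
 
  test-redis:
    image: redis:7-alpine
    tmpfs:
      - /data
 
  tests:
    build:
      context: .
      target: build  # the build stage contains the source code; dependencies does not
    environment:
      - NODE_ENV=test
      - DB_HOST=test-db
      - DB_NAME=testdb
      - DB_USER=test
      - DB_PASSWORD=test
      - REDIS_HOST=test-redis
    depends_on:
      test-db:
        condition: service_healthy  # wait until PostgreSQL accepts connections
      test-redis:
        condition: service_started
    command: npm test  # assumes package.json defines a "test" script

# Run tests in isolated containers; --exit-code-from returns the test result
# to the shell and stops the other services when the tests finish
docker compose -f docker-compose.test.yml up --build --exit-code-from tests
docker compose -f docker-compose.test.yml down -v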

Future Outlook

Docker continues to evolve with improvements to build performance (BuildKit), security (rootless mode), and platform support. WebAssembly (Wasm) is emerging as a complement to traditional Linux containers: Docker Desktop has shipped beta support for running Wasm workloads alongside Linux containers, aimed at lighter-weight workloads with very fast startup.

The container ecosystem is also seeing convergence around Kubernetes as the orchestration standard. Docker Compose remains the tool of choice for local development and simple deployments, while Kubernetes handles production orchestration. Tools like Docker Desktop bridge this gap by providing Kubernetes support alongside the familiar Compose workflow.

Conclusion

Docker transforms how teams develop, test, and deploy applications by providing consistent environments from laptop to production. The key concepts you have learned in this guide form the foundation for all container-based workflows:

  1. Images and containers are the core abstractions. Images are blueprints; containers are running instances.
  2. Dockerfiles define how images are built. Layer ordering affects build cache efficiency.
  3. Docker Compose orchestrates multi-container applications. Use it for local development and simple production deployments.
  4. Volumes persist data beyond container lifecycles. Never store important data in a container's writable layer.
  5. Networking enables container communication. User-defined networks provide DNS resolution by service name.

Start by containerizing a simple application, then progressively add databases, caches, and other services. The investment in learning Docker pays dividends across every stage of the software development lifecycle, from the first line of code to production monitoring.