Ansible for DevOps: Configuration Management

Introduction

Configuration management is the backbone of reliable infrastructure operations. When dozens, hundreds, or thousands of servers must be configured identically, with the same packages, users, services, and security settings, manual configuration is impractical and ad hoc scripts are fragile. Ansible addresses this with an agentless approach to infrastructure automation built on declarative modules. You describe the desired state of your systems, and Ansible converges them to it idempotently: it makes changes only when necessary and produces the same result no matter how many times it runs.

Ansible's key differentiator is its simplicity. Unlike Puppet or Chef, which typically run an agent on every managed node, Ansible is agentless: it connects over SSH (or WinRM for Windows) and executes tasks remotely. Managed Linux nodes need only an SSH server and a Python interpreter. There is no central server to maintain, no agents to update, and no agent certificates to manage. This makes Ansible one of the easiest configuration management tools to adopt and a popular choice for teams getting started with infrastructure automation.

The declarative model is what makes Ansible powerful. Instead of writing imperative scripts ("do this, then this, then this"), you define the desired state ("this package should be installed, this service should be running, this file should have these contents"). Ansible compares the desired state to the current state and makes only the necessary changes. This idempotency means you can run the same playbook repeatedly without side effects — a critical property for reliable automation.
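To make the compare-then-change idea concrete, here is a minimal Python sketch of how an idempotent "ensure this line is present" operation could behave. It is an illustration only, not how any Ansible module is actually implemented; the names ensure_line and demo are made up for this example.

```python
import os
import tempfile


def ensure_line(path, line):
    """Make sure `line` is present in the file; return True only if the file changed."""
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    if line in content.splitlines():
        return False  # Current state already matches the desired state: do nothing.
    # Start on a fresh line if the existing file does not end with a newline.
    separator = "" if content == "" or content.endswith("\n") else "\n"
    with open(path, "a") as handle:
        handle.write(separator + line + "\n")
    return True


def demo():
    """Apply the same desired state twice and return both `changed` results."""
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "sshd_config")
        first = ensure_line(target, "PermitRootLogin no")
        second = ensure_line(target, "PermitRootLogin no")
        with open(target) as handle:
            content = handle.read()
    return first, second, content


if __name__ == "__main__":
    print(demo())
```

The first call reports a change and the second does not, which is the same "changed" versus "ok" distinction Ansible prints for each task.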

Understanding Ansible: Core Concepts

Inventory

The inventory defines the hosts Ansible manages. It can be a simple static file listing hostnames, or a dynamic script that queries your cloud provider for current instances. Hosts can be organized into groups, and variables can be assigned at the host or group level.

A static inventory file in INI format looks like:

[webservers]
web1.example.com http_port=80
web2.example.com http_port=80
 
[databases]
db1.example.com postgres_version=15
db2.example.com postgres_version=15
 
[monitoring]
mon1.example.com
 
[all:vars]
ansible_user=deploy
ansible_ssh_private_key_file=~/.ssh/deploy_key

For cloud environments, dynamic inventory scripts query AWS EC2, GCP, or Azure for current instances automatically. This eliminates the need to maintain static inventory files as infrastructure scales up and down:

#!/usr/bin/env python3
# inventory/aws_ec2.py — Dynamic inventory for AWS EC2
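import json
import sys

import boto3


def get_inventory():
    ec2 = boto3.client('ec2', region_name='us-east-1')
    # Paginate so large fleets are not cut off at a single API page.
    paginator = ec2.get_paginator('describe_instances')
    pages = paginator.paginate(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )

    # Hosts must be listed in a group; '_meta' only carries per-host variables.
    inventory = {'_meta': {'hostvars': {}}, 'all': {'hosts': []}}

    for page in pages:
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                name = next(
                    (t['Value'] for t in instance.get('Tags', []) if t['Key'] == 'Name'),
                    instance['InstanceId']
                )
                private_ip = instance.get('PrivateIpAddress')
                inventory['all']['hosts'].append(name)
                inventory['_meta']['hostvars'][name] = {
                    # Instances without a public IP fall back to the private address.
                    'ansible_host': instance.get('PublicIpAddress', private_ip),
                    'instance_type': instance['InstanceType'],
                    'private_ip': private_ip,
                }

    return inventory


if __name__ == '__main__':
    # Ansible calls the script with --list. Because '_meta' is returned,
    # --host is not needed, so it is answered with an empty dict.
    if '--host' in sys.argv:
        print(json.dumps({}))
    else:
        print(json.dumps(get_inventory(), indent=2))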

Playbooks

Playbooks are Ansible's configuration files, written in YAML. They define a list of plays, each targeting a set of hosts and executing a sequence of tasks. Playbooks are the primary unit of automation in Ansible — they're version-controlled, testable, and reusable.

A playbook consists of one or more plays, each of which maps a set of hosts to a set of tasks. Within a play, you can define variables, include roles, specify handlers, and apply tags for selective execution.

Tasks and Modules

Tasks are the individual steps within a play. Each task calls an Ansible module, a reusable unit of code that performs a specific action (install a package, copy a file, start a service, manage users). Between the built-in modules in ansible-core and the collections bundled with the Ansible community package, thousands of modules are available for common system administration tasks across Linux, Windows, network devices, and cloud platforms.

Common module categories include: package management (apt, yum, dnf, pip), file management (copy, template, file, lineinfile), service management (service, systemd), user management (user, group, authorized_key), HTTP and source retrieval (uri, get_url, git), and cloud (ec2_instance, gcp_compute_instance, azure_rm_virtualmachine).

Roles

Roles are reusable collections of tasks, variables, files, templates, and handlers. They enable you to decompose complex playbooks into modular, shareable components. A web server role might include tasks for installing nginx, configuring virtual hosts, setting up SSL, and starting the service.

Roles follow a standardized directory structure:

roles/nginx/
├── defaults/main.yml      # Default variables (lowest priority)
├── vars/main.yml          # Role variables (higher priority)
├── tasks/main.yml         # Task definitions
├── handlers/main.yml      # Handler definitions
├── templates/             # Jinja2 templates
│   ├── nginx.conf.j2
│   └── vhost.conf.j2
├── files/                 # Static files to copy
├── meta/main.yml          # Role metadata and dependencies
└── tests/                 # Role test playbook and inventory
    └── test.yml

Handlers

Handlers are tasks that only run when notified by other tasks. They're used for actions that should only happen when something changes — like restarting a service after a configuration file is modified. This ensures services are only restarted when necessary, not on every playbook run.

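Handlers can be defined in a play or in a role and are triggered by tasks through the notify directive. A handler fires only if the notifying task reports a change. Multiple tasks can notify the same handler, and it still runs only once, after the tasks in the current section of the play have finished (or earlier, if you force it with meta: flush_handlers).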
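The notify behaviour can be modelled in a few lines of Python. This is a simplified sketch of the semantics described above, not Ansible's implementation; run_play and the tuple layout (task name, changed flag, handler to notify) are invented for the illustration.

```python
def run_play(tasks, handlers):
    """Run tasks in order, then run each notified handler once.

    tasks: list of (name, changed, notify) tuples; notify may be None.
    handlers: handler names in the order they are defined.
    """
    notified = set()
    log = []
    for name, changed, notify in tasks:
        log.append(f"task: {name} ({'changed' if changed else 'ok'})")
        # Only a task that actually changed something triggers its handler.
        if changed and notify:
            notified.add(notify)
    # Handlers run after the tasks, in definition order, once each.
    for handler in handlers:
        if handler in notified:
            log.append(f"handler: {handler}")
    return log


if __name__ == "__main__":
    tasks = [
        ("Deploy nginx.conf", True, "Restart nginx"),
        ("Deploy vhost config", True, "Restart nginx"),
        ("Ensure nginx is running", False, None),
    ]
    for entry in run_play(tasks, ["Restart nginx", "Reload nginx"]):
        print(entry)
```

Two tasks notify "Restart nginx", yet the log contains a single restart, and a run in which nothing changes contains none.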

Variables and Templating

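Ansible uses Jinja2 templating for dynamic content. Variables can be defined at multiple levels with a strict precedence order. Simplified, from lowest to highest: role defaults → inventory vars → playbook vars → extra vars (command-line -e). The full list in the Ansible documentation has more levels (role vars, task vars, and set_fact all sit between playbook vars and extra vars), but extra vars always win. Understanding this precedence is critical for managing configuration across environments.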

# group_vars/production.yml
app_environment: production
app_debug: false
app_log_level: warning
database_host: prod-db.example.com
database_pool_size: 20
 
# group_vars/staging.yml
app_environment: staging
app_debug: true
app_log_level: debug
database_host: staging-db.example.com
database_pool_size: 5
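The simplified precedence chain can be pictured as dictionaries merged from lowest to highest priority. The sketch below assumes only the four levels named in the text; resolve_variables and the sample values are hypothetical, and real Ansible has many more levels.

```python
# Lowest priority first; later levels override earlier ones.
PRECEDENCE = ("role_defaults", "inventory_vars", "playbook_vars", "extra_vars")


def resolve_variables(layers):
    """Merge variable layers so that higher-precedence levels win."""
    resolved = {}
    for level in PRECEDENCE:
        resolved.update(layers.get(level, {}))
    return resolved


if __name__ == "__main__":
    layers = {
        "role_defaults": {"app_log_level": "info", "database_pool_size": 10},
        "inventory_vars": {"app_log_level": "warning", "database_pool_size": 20},
        "extra_vars": {"app_log_level": "debug"},  # e.g. passed with -e
    }
    print(resolve_variables(layers))
```

Here the inventory value for database_pool_size overrides the role default, while app_log_level comes from the extra vars because nothing outranks them.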

Jinja2 templates enable dynamic configuration files:

{# templates/nginx.conf.j2 #}
worker_processes {{ nginx_worker_processes }};
 
events {
    worker_connections {{ nginx_worker_connections }};
}
 
http {
    upstream app {
        {% for host in groups['webservers'] %}
        server {{ hostvars[host]['ansible_host'] | default(host) }}:{{ http_port }};
        {% endfor %}
    }
 
    server {
        listen {{ https_port }} ssl;
        server_name {{ domain_name }};
 
        ssl_certificate {{ ssl_certificate_path }};
        ssl_certificate_key {{ ssl_certificate_key_path }};
 
        location / {
            proxy_pass http://app;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}

Architecture and Design Patterns

The Layered Playbook Pattern

Organize playbooks into layers: base configuration (users, SSH, security), platform configuration (packages, services), application configuration (app-specific files, deployments). Each layer is independent and can be applied separately or combined.

A site-level playbook orchestrates the layers:

# site.yml — Top-level playbook
---
- import_playbook: playbooks/base.yml
- import_playbook: playbooks/security.yml
- import_playbook: playbooks/webservers.yml
- import_playbook: playbooks/databases.yml
- import_playbook: playbooks/monitoring.yml

The Role-Based Pattern

Define roles for each type of server (web server, database server, monitoring server) and compose playbooks by including the appropriate roles. This maximizes reusability and keeps playbooks simple.

The Environment Pattern

Use Ansible's variable system to handle differences between environments (dev, staging, production). Define environment-specific variables in group variable files and use the same playbooks across all environments.

The Pull Pattern

Instead of pushing configurations from a central server, use ansible-pull on each node to pull and apply configurations from a Git repository. This scales better for large deployments and enables self-healing infrastructure — each node periodically pulls the latest configuration and corrects any drift.

# Cron job on each managed node
*/15 * * * * ansible-pull -U https://github.com/org/ansible-config.git -i localhost, site.yml

Step-by-Step Implementation

Basic Inventory and Playbook

# inventory/hosts.yml
all:
  children:
    webservers:
      hosts:
        web1.example.com:
        web2.example.com:
        web3.example.com:
      vars:
        http_port: 80
        https_port: 443
 
    databases:
      hosts:
        db1.example.com:
        db2.example.com:
      vars:
        postgres_version: 15
 
    monitoring:
      hosts:
        mon1.example.com:

# playbooks/webservers.yml
---
- name: Configure web servers
  hosts: webservers
  become: true
 
  vars:
    nginx_worker_processes: auto
    nginx_worker_connections: 1024
 
  roles:
    - common
    - nginx
    - ssl
    - monitoring
 
  tasks:
    - name: Install required packages
      apt:
        name:
          - nginx
          - certbot
          - python3-certbot-nginx
          - htop
          - vim
        state: present
        update_cache: yes
      tags: packages
 
    - name: Copy nginx configuration
      template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        owner: root
        group: root
        mode: '0644'
      notify: Restart nginx
      tags: config
 
    - name: Ensure nginx is running
      service:
        name: nginx
        state: started
        enabled: yes
      tags: service
 
    - name: Configure firewall
      ufw:
        rule: allow
        port: "{{ item }}"
        proto: tcp
      loop:
        - "{{ http_port }}"
        - "{{ https_port }}"
        - 22
      tags: security
 
  handlers:
    - name: Restart nginx
      service:
        name: nginx
        state: restarted

Creating Reusable Roles

# roles/nginx/tasks/main.yml
---
- name: Install nginx
  apt:
    name: nginx
    state: present
  notify: Restart nginx
 
- name: Create nginx configuration directory
  file:
    path: /etc/nginx/conf.d
    state: directory
    mode: '0755'
 
- name: Deploy nginx configuration
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    validate: nginx -t -c %s
  notify: Restart nginx
 
- name: Deploy site configurations
  template:
    src: "{{ item }}.conf.j2"
    dest: "/etc/nginx/conf.d/{{ item }}.conf"
  loop: "{{ nginx_sites }}"
  notify: Reload nginx
 
- name: Ensure nginx is started and enabled
  service:
    name: nginx
    state: started
    enabled: yes

# roles/nginx/handlers/main.yml
---
- name: Restart nginx
  service:
    name: nginx
    state: restarted
 
- name: Reload nginx
  service:
    name: nginx
    state: reloaded

# roles/nginx/defaults/main.yml
---
nginx_worker_processes: auto
nginx_worker_connections: 1024
nginx_sites: []
nginx_ssl_certificate: ""
nginx_ssl_certificate_key: ""

Database Configuration Role

# roles/postgresql/tasks/main.yml
---
- name: Install PostgreSQL
  apt:
    name:
      - postgresql
      - postgresql-contrib
      - python3-psycopg2
    state: present
 
- name: Ensure PostgreSQL is running
  service:
    name: postgresql
    state: started
    enabled: yes
 
- name: Create application database
  postgresql_db:
    name: "{{ app_database_name }}"
    encoding: UTF-8
    lc_collate: en_US.UTF-8
    state: present
  become: true
  become_user: postgres
 
- name: Create application user
  postgresql_user:
    name: "{{ app_database_user }}"
    password: "{{ app_database_password }}"
    db: "{{ app_database_name }}"
    priv: ALL
    state: present
  become: true
  become_user: postgres
  no_log: true  # Keep the password out of task output and logs
 
- name: Configure pg_hba.conf for application access
  template:
    src: pg_hba.conf.j2
    dest: /etc/postgresql/{{ postgres_version }}/main/pg_hba.conf
    owner: postgres
    group: postgres
    mode: '0640'
  notify: Restart postgresql
 
- name: Tune PostgreSQL for production
  template:
    src: postgresql.conf.j2
    dest: /etc/postgresql/{{ postgres_version }}/main/postgresql.conf
    owner: postgres
    group: postgres
    mode: '0644'
  notify: Restart postgresql

Ansible Vault for Secrets Management

Ansible Vault encrypts sensitive data so it can be safely stored in version control:

# Create an encrypted secrets file
ansible-vault create group_vars/production/vault.yml
 
# Edit an encrypted file
ansible-vault edit group_vars/production/vault.yml
 
# Run a playbook with encrypted secrets
ansible-playbook site.yml --ask-vault-pass
 
# Or use a vault password file (for CI/CD)
ansible-playbook site.yml --vault-password-file ~/.vault_pass

The encrypted file contains sensitive variables:

# group_vars/production/vault.yml (decrypted view; the file on disk is AES256 ciphertext)
vault_database_password: "super-secret-password"
vault_api_key: "sk-proj-abc123..."
vault_ssl_private_key: |
  -----BEGIN RSA PRIVATE KEY-----
  ...
  -----END RSA PRIVATE KEY-----

Reference vault variables in regular variable files:

# group_vars/production/vars.yml
database_password: "{{ vault_database_password }}"
api_key: "{{ vault_api_key }}"

Real-World Use Cases

Server Hardening and Compliance

Automate security hardening across your fleet: disable password authentication, configure firewalls, set up automatic security updates, enforce password policies, and audit system configurations against CIS benchmarks. Run the hardening playbook on every new server to ensure consistent security posture.

A hardening playbook might include: disabling root SSH login, setting up fail2ban, configuring UFW/iptables, enabling automatic security updates, setting file permissions, disabling unused services, and configuring audit logging.

Application Deployment

Deploy applications consistently across environments using Ansible. Pull the latest code from Git, install dependencies, run database migrations, update configuration files, and restart services — all in a single, idempotent playbook.

A typical deployment playbook includes tasks for: pulling code from Git, creating a virtual environment, installing Python dependencies, running database migrations with Alembic, updating Nginx configuration, and restarting the application service.

Infrastructure Provisioning

Combine Ansible with cloud provider modules to provision infrastructure: create VMs, configure networking, set up load balancers, and configure DNS. This infrastructure-as-code approach enables reproducible environments and disaster recovery.

- name: Create EC2 instance
  amazon.aws.ec2_instance:
    name: "{{ instance_name }}"
    instance_type: "{{ instance_type }}"
    image_id: "{{ ami_id }}"
    key_name: "{{ ssh_key_name }}"
    vpc_subnet_id: "{{ subnet_id }}"
    security_groups:
      - "{{ security_group }}"
    tags:
      Environment: "{{ app_environment }}"
      Project: "{{ project_name }}"
    state: present
  register: ec2_result

Compliance Auditing

Write playbooks that audit server configurations against compliance standards (PCI-DSS, HIPAA, SOC2). The playbook checks each requirement and generates a compliance report, identifying non-compliant servers and the specific issues.

- name: Enforce SSH settings (run with --check --diff to audit without changing anything)
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: "{{ item.regexp }}"
    line: "{{ item.line }}"
    state: present
    validate: '/usr/sbin/sshd -t -f %s'
  loop:
    - { regexp: '^#?PermitRootLogin', line: 'PermitRootLogin no' }
    - { regexp: '^#?PasswordAuthentication', line: 'PasswordAuthentication no' }
    - { regexp: '^#?X11Forwarding', line: 'X11Forwarding no' }
    - { regexp: '^#?MaxAuthTries', line: 'MaxAuthTries 3' }
  notify: Restart sshd
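For the reporting side of an audit, the comparison against required settings can be sketched in Python. This is a simplified illustration, not part of Ansible: audit_sshd_config and REQUIRED are hypothetical names, and the parser ignores Match blocks and Include directives.

```python
# Required settings, mirroring the four rules in the task above.
REQUIRED = {
    "PermitRootLogin": "no",
    "PasswordAuthentication": "no",
    "X11Forwarding": "no",
    "MaxAuthTries": "3",
}


def audit_sshd_config(text, required=REQUIRED):
    """Return {setting: (expected, found)} for settings that are missing or differ."""
    effective = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and comments.
        parts = line.split(None, 1)
        if len(parts) == 2:
            # Keywords are case-insensitive and sshd uses the first value it reads.
            effective.setdefault(parts[0].lower(), parts[1].strip())
    findings = {}
    for setting, expected in required.items():
        found = effective.get(setting.lower())
        if found != expected:
            findings[setting] = (expected, found)
    return findings


if __name__ == "__main__":
    sample = "# sample config\nPermitRootLogin yes\nPasswordAuthentication no\nX11Forwarding no\n"
    for setting, (expected, found) in audit_sshd_config(sample).items():
        print(f"{setting}: expected {expected!r}, found {found!r}")
```

An empty result means the host meets every listed requirement; anything else is a line item for the compliance report.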

Best Practices for Production

Use version control for everything — Store playbooks, roles, inventory, and variable files in Git. This enables collaboration, change tracking, and rollback.
Make playbooks idempotent — Every task should produce the same result whether run once or ten times. Use Ansible's built-in modules rather than shell commands whenever possible.
Use roles for reusability — Don't repeat task sequences across playbooks. Extract common patterns into roles and include them where needed.
Separate configuration from secrets — Use Ansible Vault to encrypt sensitive data (passwords, API keys, certificates). Store encrypted files in version control alongside playbooks.
Tag tasks for selective execution — Add tags to tasks so you can run specific subsets of a playbook: ansible-playbook site.yml --tags "config,security".
Test with Molecule — Use Molecule to test Ansible roles in isolated Docker containers or VMs. This catches errors before deploying to production.
Use handlers for service restarts — Don't restart services in tasks. Use handlers that only fire when configuration actually changes.
Document with comments — Add YAML comments explaining why, not just what. Document variable meanings, task purposes, and non-obvious dependencies.
Limit privilege escalation — Use become: true only on tasks that require root. Don't run entire plays as root when only specific tasks need it.
Use ansible-lint for linting — Run ansible-lint on your playbooks and roles to catch common errors, enforce style guidelines, and ensure best practices.
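The effect of tagging can be approximated with a small Python sketch. It assumes the simplest case, where a task runs if it carries one of the requested tags or the special always tag; select_tasks is a hypothetical helper, and the task names mirror the web server playbook shown earlier.

```python
def select_tasks(tasks, requested_tags):
    """Return the names of tasks that would run for the requested tags."""
    requested = set(requested_tags)
    selected = []
    for name, tags in tasks:
        # A task is selected if it shares a tag with the request or is tagged 'always'.
        if requested & set(tags) or "always" in tags:
            selected.append(name)
    return selected


if __name__ == "__main__":
    tasks = [
        ("Install required packages", ["packages"]),
        ("Copy nginx configuration", ["config"]),
        ("Ensure nginx is running", ["service"]),
        ("Configure firewall", ["security"]),
    ]
    print(select_tasks(tasks, ["config", "security"]))
```

With the tags "config,security", only the configuration and firewall tasks are selected; package installation and the service check are skipped.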

Common Pitfalls and Solutions

Pitfall	Impact	Solution
Using shell/command modules excessively	Non-idempotent, hard to maintain	Use purpose-built Ansible modules
Hardcoded values in playbooks	Not reusable across environments	Use variables and group/host vars
No version control	Lost changes, no rollback	Store everything in Git
Ignoring idempotency	Unexpected side effects on re-run	Test playbooks by running them multiple times
Storing secrets in plain text	Security breach	Use Ansible Vault for secrets
No testing	Bugs discovered in production	Use Molecule for role testing
Running as root unnecessarily	Security risk	Use become: true only for specific tasks
Ignoring `ansible-lint`	Style violations, potential bugs	Integrate linting into CI/CD
Running `command`/`shell` tasks unconditionally	Reported as changed on every run, may repeat side effects	Use `creates`/`removes` or `changed_when`
Not using check mode	Changes reach servers without a preview	Test with `--check --diff` before applying

Debugging Ansible Playbooks

Use --check mode to preview changes without applying them. Use --diff to see exactly what will change. Use -v, -vv, or -vvv for increasing verbosity. Use --step to confirm each task before execution.

# Dry run with diff output
ansible-playbook site.yml --check --diff
 
# Verbose execution
ansible-playbook site.yml -vvv
 
# Step through tasks interactively
ansible-playbook site.yml --step
 
# Run only on a specific host
ansible-playbook site.yml --limit web1.example.com
 
# Run only tagged tasks
ansible-playbook site.yml --tags "config,security"
 
# Start at a specific task
ansible-playbook site.yml --start-at-task "Deploy nginx configuration"

Performance Optimization

Optimize Ansible performance by enabling pipelining (fewer SSH operations per task), raising forks so more hosts are processed in parallel, optionally using Mitogen (a third-party strategy plugin that speeds up execution), and using strategy: free to let hosts proceed independently.

# ansible.cfg — Performance optimizations
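[defaults]
# Comments sit on their own lines: ansible.cfg accepts only ';' for inline comments
# Run tasks on up to 20 hosts in parallel
forks = 20
# Optional: Mitogen is a third-party plugin; the path is a placeholder for your install
# strategy_plugins = /path/to/mitogen/ansible_mitogen/plugins/strategy
# strategy = mitogen_linear
 
[ssh_connection]
# Reduce SSH operations per task (sudoers must not enforce requiretty)
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
control_path = %(directory)s/ansible-ssh-%%h-%%p-%%r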

For large deployments (1000+ hosts), use ansible-pull instead of push mode, implement rolling updates to avoid overwhelming the control node, and use dynamic inventory to avoid maintaining large static files.
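In a playbook, rolling updates are normally expressed with the play-level serial keyword. The Python sketch below only illustrates the batching idea behind it; rolling_batches is a hypothetical helper, not an Ansible API.

```python
def rolling_batches(hosts, batch_size):
    """Split hosts into consecutive batches of at most batch_size hosts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]


if __name__ == "__main__":
    hosts = [f"web{n}.example.com" for n in range(1, 6)]
    for number, batch in enumerate(rolling_batches(hosts, 2), start=1):
        # Each batch would be fully updated before the next one starts.
        print(f"batch {number}: {', '.join(batch)}")
```

Only one batch is out of service at a time, so the remaining hosts keep serving traffic while the update moves through the fleet.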

Comparison with Alternatives

Feature	Ansible	Puppet	Chef	Terraform
Agentless	✓	✗	✗	✓
Language	YAML	Puppet DSL	Ruby	HCL
Learning Curve	Low	Medium	High	Medium
Idempotency	Built-in	Built-in	Built-in	Built-in
Configuration Mgmt	★★★★★	★★★★★	★★★★★	★★
Provisioning	★★★	★★	★★	★★★★★
Best For	Config mgmt, automation	Large enterprises	Complex infra	Infrastructure provisioning

Ansible and Terraform are complementary — Terraform provisions infrastructure (creating VMs, networks, load balancers), while Ansible configures the provisioned servers (installing software, deploying applications, managing services). Many teams use both: Terraform for infrastructure provisioning and Ansible for configuration management.
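One common handoff between the two tools is to turn Terraform outputs into an Ansible inventory. The Python sketch below assumes the JSON shape printed by terraform output -json (each output name maps to an object with a "value" key); the output name web_ips, the group name webservers, and the function inventory_from_terraform are hypothetical choices for this example.

```python
import json


def inventory_from_terraform(output_json, output_name="web_ips", group="webservers"):
    """Build a JSON-style Ansible inventory dict from `terraform output -json` text."""
    outputs = json.loads(output_json)
    # Each Terraform output is an object; the actual data sits under "value".
    addresses = outputs[output_name]["value"]
    return {
        group: {"hosts": list(addresses)},
        "_meta": {"hostvars": {}},
    }


if __name__ == "__main__":
    # Sample data using documentation-only IP addresses.
    sample = json.dumps({
        "web_ips": {"sensitive": False, "type": ["list", "string"],
                    "value": ["203.0.113.10", "203.0.113.11"]},
    })
    print(json.dumps(inventory_from_terraform(sample), indent=2))
```

A dynamic inventory script could print this structure when Ansible calls it with --list, so Ansible always targets whatever Terraform most recently provisioned.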

Advanced Patterns

Dynamic Inventory

Use dynamic inventory scripts to automatically discover and manage cloud resources. Ansible queries AWS, GCP, or Azure for current instances and manages them without maintaining static inventory files.

Ansible Collections

Collections bundle roles, modules, plugins, and playbooks into distributable packages. Create organization-specific collections for shared automation patterns and distribute them via Ansible Galaxy or private registries.

# requirements.yml — Install collections from Galaxy
collections:
  - name: community.general
    version: ">=5.0.0"
  - name: amazon.aws
    version: ">=5.0.0"
  - name: community.postgresql
    version: ">=2.0.0"

Ansible Tower / AWX

AWX (open source) and Ansible Tower (commercial, now called automation controller within Red Hat Ansible Automation Platform) provide a web UI, RBAC, scheduling, logging, and API for Ansible automation. They enable team collaboration and enterprise-grade workflow management.

Molecule Testing

Test Ansible roles in isolated environments using Molecule:

# molecule/default/molecule.yml
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: ubuntu-2204
    image: geerlingguy/docker-ubuntu2204-ansible
    privileged: true
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
provisioner:
  name: ansible
verifier:
  name: ansible

Future Outlook

Ansible is evolving toward event-driven automation — responding to infrastructure events (alerts, deployments, security incidents) automatically. The Ansible Lightspeed project brings AI-assisted playbook generation, helping users write playbooks from natural language descriptions.

The convergence of Ansible with GitOps practices — where Git repositories are the single source of truth for infrastructure state — is creating more reliable, auditable automation workflows. Changes to infrastructure are made through pull requests, reviewed by humans, and applied automatically by Ansible.

Conclusion

Ansible is one of the most accessible and widely adopted configuration management tools for good reason. Its agentless architecture, YAML-based playbooks, and idempotent execution model make infrastructure automation reliable, maintainable, and approachable for teams of all sizes.

Key takeaways:

Ansible is agentless: it connects over SSH and needs no agent on managed nodes, only a Python interpreter for most modules
Playbooks define desired state declaratively, and Ansible makes only necessary changes
Roles enable reusable, modular automation components
Use Ansible Vault for secrets management — never store credentials in plain text
Test playbooks with Molecule before deploying to production
Use handlers for service restarts — only restart when configuration actually changes
Store all automation code in version control for collaboration and rollback

Start by writing a simple playbook that configures a single server — install packages, create users, configure a service. Run it multiple times to observe idempotency. Then expand to multiple servers and add roles for reusability. The investment in learning Ansible pays dividends in reduced manual effort, consistent configurations, and reliable deployments.

Minh Vo
