Feature Flags: Implementing Progressive Delivery

Introduction

Feature flags (also known as feature toggles) are one of the most powerful techniques in modern software development. They decouple deployment from release, allowing you to deploy code to production without making it visible to users. This enables progressive delivery—gradually rolling out features to subsets of users, testing in production with real traffic, and instantly disabling problematic features without deploying new code. In this comprehensive guide, we will explore feature flag implementation patterns, targeting strategies, A/B testing integration, and best practices for managing flags at scale.

The move toward continuous delivery and continuous deployment has made feature flags essential infrastructure. Without flags, every deployment is a binary event: the feature is either live for everyone or not deployed at all. Feature flags turn this into a spectrum. You can enable a feature for internal testers, then 5% of users, then 50%, then everyone, and validate each step with real production data. If anything goes wrong, flipping a single flag disables the feature as soon as the change reaches the services that evaluate it, with no rollback deployment.

Understanding Feature Flags: Core Concepts

A feature flag is a conditional branch in your code that checks whether a feature should be enabled or disabled. At its simplest, a flag is a boolean: if (isEnabled('new-checkout-flow')) { showNewCheckout() }. But production feature flags are much more sophisticated, supporting targeting rules, percentage rollouts, user segmentation, and multi-variant experiments.

There are several categories of feature flags. Release flags control the visibility of features that are in development. They are short-lived and removed after the feature is fully released. Experiment flags power A/B tests by randomly assigning users to control or treatment groups. They live for the duration of the experiment. Ops flags control operational aspects like enabling maintenance mode or switching between database backends. They are long-lived and require careful management. Permission flags control access to premium features based on user subscription or role. They are permanent.

Flag evaluation can happen on the server side or the client side. Server-side evaluation is more secure because flag rules and targeting data never leave the server. Client-side evaluation reduces latency because flags are evaluated locally, but requires sending flag configurations to the client. Most production systems use a hybrid approach: server-side evaluation for sensitive flags and client-side evaluation for UI flags.

The flag lifecycle is a critical concept. Flags should have a clear lifecycle: created, in development, in testing, rolled out, fully released, and archived. Stale flags—those that are no longer needed—increase code complexity and should be removed. Implementing a flag review process ensures flags are cleaned up after their purpose is served.
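One way to make the lifecycle enforceable is to model it as a small state machine, so tooling can reject skipped steps such as jumping straight from created to fully released. The stage names below mirror the list above. The allowed transitions, including the step back when testing or rollout surfaces a problem, are our own assumption.

```typescript
// Lifecycle stages mirroring the list above (names are illustrative).
type FlagStage =
  | 'created'
  | 'in_development'
  | 'in_testing'
  | 'rolled_out'
  | 'fully_released'
  | 'archived';

// Assumed allowed moves: forward one stage, back one stage from testing or
// rollout, or straight to 'archived' (e.g. an abandoned feature). 'archived' is terminal.
const nextStages: Record<FlagStage, FlagStage[]> = {
  created: ['in_development', 'archived'],
  in_development: ['in_testing', 'archived'],
  in_testing: ['rolled_out', 'in_development', 'archived'],
  rolled_out: ['fully_released', 'in_testing', 'archived'],
  fully_released: ['archived'],
  archived: [],
};

function canTransition(from: FlagStage, to: FlagStage): boolean {
  return nextStages[from].includes(to);
}

// Returns the new stage, or throws so an admin API can reject the change.
function transition(from: FlagStage, to: FlagStage): FlagStage {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal flag lifecycle transition: ${from} -> ${to}`);
  }
  return to;
}
```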

Targeting rules determine which users see a feature. Rules can be based on user attributes (country, subscription plan, account age), percentage rollout (random 10% of users), individual user IDs (beta testers), or custom attributes (users with more than 100 orders). Combining rules with AND/OR logic enables sophisticated targeting strategies.
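The evaluator built later in this article treats each targeting rule as a single condition and enables the flag when any rule matches (OR only). A minimal sketch of nested AND/OR groups could look like the following. It assumes a hypothetical `RuleNode` shape in which `all` means AND and `any` means OR, over a flat map of user attributes.

```typescript
type Primitive = string | number | boolean;

// A single attribute check (hypothetical shape).
interface Condition {
  attribute: string;
  operator: 'equals' | 'gt' | 'lt' | 'in';
  value: Primitive | Primitive[];
}

// A rule tree: `all` = every child must match (AND), `any` = at least one (OR).
type RuleNode = { all: RuleNode[] } | { any: RuleNode[] } | Condition;

type Attributes = Record<string, Primitive | undefined>;

function matchesCondition(c: Condition, attrs: Attributes): boolean {
  const actual = attrs[c.attribute];
  if (actual === undefined) return false; // a missing attribute never matches
  switch (c.operator) {
    case 'equals':
      return actual === c.value;
    case 'gt':
      return Number(actual) > Number(c.value);
    case 'lt':
      return Number(actual) < Number(c.value);
    case 'in':
      return Array.isArray(c.value) && c.value.includes(actual);
    default:
      return false;
  }
}

// Recursively evaluate a rule tree against one user's attributes.
function matches(node: RuleNode, attrs: Attributes): boolean {
  if ('all' in node) return node.all.every((child) => matches(child, attrs));
  if ('any' in node) return node.any.some((child) => matches(child, attrs));
  return matchesCondition(node, attrs);
}
```

With this shape, "US users who are on a premium plan or have more than 100 orders" becomes an `all` node whose second child is an `any` node.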

Architecture and Design Patterns

Client-Side Evaluation Pattern

Flags are evaluated in the client application using cached flag configurations. The client downloads flag rules at startup and periodically refreshes. This pattern provides low-latency evaluation but exposes flag logic to the client.
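A minimal sketch of this pattern follows. It assumes a hypothetical `loadConfig` function that downloads the rules, and a simplified `FlagConfig` format with only an on/off switch and a rollout percentage. The design choice worth copying is that a failed refresh keeps the last good configuration rather than dropping every flag to its fallback.

```typescript
// Hypothetical configuration format; real flag SDKs define their own.
interface FlagConfig {
  version: number; // assumed to increase with every change in the flag service
  flags: Record<string, { enabled: boolean; rolloutPercent: number }>;
}

// Evaluates flags locally against the last configuration downloaded successfully.
class LocalFlagClient {
  private config: FlagConfig | null = null;

  // `loadConfig` is an assumed function, e.g. an HTTP call to your flag service.
  constructor(private readonly loadConfig: () => Promise<FlagConfig>) {}

  // Call once at startup and then periodically (for example from setInterval).
  // Resolves to true on success; on failure the previous configuration stays in use.
  async refresh(): Promise<boolean> {
    try {
      const next = await this.loadConfig();
      // Ignore responses older than the configuration already cached
      if (this.config === null || next.version >= this.config.version) {
        this.config = next;
      }
      return true;
    } catch {
      return false;
    }
  }

  // `bucket` is the caller's deterministic per-user hash in [0, 1).
  isEnabled(flagKey: string, bucket: number, fallback = false): boolean {
    const flag = this.config?.flags[flagKey];
    if (flag === undefined) return fallback; // nothing loaded yet, or unknown flag
    return flag.enabled && bucket < flag.rolloutPercent / 100;
  }
}
```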

Server-Side Evaluation Pattern

The server evaluates flags for each request and includes the results in the response. This keeps flag rules and targeting data on the server. The cost is evaluation work on every request, and the client does not know a flag's value until the server responds. Server-side evaluation is preferred for sensitive flags that should not be exposed to clients.
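A hybrid setup often evaluates on the server and sends the client only the results of flags explicitly marked as safe to expose. The sketch below assumes a hypothetical `clientVisible` property on each flag. It uses a plan allow-list as a stand-in for real targeting rules.

```typescript
// Hypothetical server-side flag definition. Only the *result* of a
// client-visible flag leaves the server; its rules never do.
interface ServerFlag {
  key: string;
  clientVisible: boolean;
  allowedPlans: string[]; // stand-in for arbitrary targeting rules
}

interface RequestUser {
  id: string;
  plan: string;
}

// Evaluate every client-visible flag for this request and return plain booleans,
// ready to embed in an API response or the initial HTML payload.
function buildClientFlags(flags: ServerFlag[], user: RequestUser): Record<string, boolean> {
  const out: Record<string, boolean> = {};
  for (const flag of flags) {
    if (!flag.clientVisible) continue; // sensitive flags are not exposed, not even by name
    out[flag.key] = flag.allowedPlans.includes(user.plan);
  }
  return out;
}
```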

Edge Evaluation Pattern

Flags are evaluated at the CDN edge using edge workers. This combines the security of server-side evaluation with the low latency of edge computing. Cloudflare Workers, Vercel Edge Functions, and similar platforms enable this pattern.

Flag-Driven Configuration Pattern

Feature flags can control not just feature visibility but also configuration values. A flag can return a number (percentage of traffic to route to a new service), a string (which variant of an A/B test to show), or a JSON object (complete configuration for a feature).
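Non-boolean flag values need validation, because a typo in the flag service can otherwise push an out-of-range number or an unknown variant name into production. This is a sketch of typed accessors that check the runtime type and fall back to a default defined in code. The `FlagValue` type, the `values` map, and the getter names are hypothetical.

```typescript
// Hypothetical raw values as a flag service might return them.
type FlagValue = boolean | number | string | Record<string, unknown>;

// Numbers: must be finite and inside [min, max], otherwise use the code default.
function getNumberFlag(
  values: Map<string, FlagValue>, key: string, fallback: number, min = -Infinity, max = Infinity,
): number {
  const v = values.get(key);
  return typeof v === 'number' && Number.isFinite(v) && v >= min && v <= max ? v : fallback;
}

// Strings: must be one of the variants the code knows how to handle.
function getStringFlag<T extends string>(
  values: Map<string, FlagValue>, key: string, allowed: readonly T[], fallback: T,
): T {
  const v = values.get(key);
  return typeof v === 'string' && (allowed as readonly string[]).includes(v) ? (v as T) : fallback;
}

// JSON objects: validated by a caller-supplied type guard.
function getJsonFlag<T extends object>(
  values: Map<string, FlagValue>, key: string, isValid: (x: unknown) => x is T, fallback: T,
): T {
  const v = values.get(key);
  return isValid(v) ? v : fallback;
}
```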

Gradual Rollout Pattern

Instead of enabling a feature for all users at once, gradually increase the percentage of users who see it. Start with internal testers, a tiny cohort (say 0.1% of traffic) usually selected by a targeting rule rather than at random. Then move to 1%, 5%, 25%, and finally 100% of users. Monitor error rates and performance metrics at each stage before proceeding.
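The stage-by-stage decision can be written as a small pure function. This sketch assumes the stage list, the metric names, and the thresholds: roll back when the error rate more than doubles compared with users who do not have the flag, and hold when p95 latency regresses by more than 20%. Tune these to your own service-level objectives.

```typescript
// Assumed percentage stages after the internal-tester cohort.
const ROLLOUT_STAGES = [1, 5, 25, 100] as const;

interface StageMetrics {
  errorRate: number; // errors / requests for users with the flag on, e.g. 0.002 = 0.2%
  baselineErrorRate: number; // same metric for users with the flag off
  p95LatencyMs: number;
  baselineP95LatencyMs: number;
}

type RolloutDecision =
  | { action: 'advance'; toPercent: number }
  | { action: 'hold' }
  | { action: 'rollback'; toPercent: 0 };

// Note: with a zero baseline error rate, any error at all triggers a rollback.
function decideNextStep(currentPercent: number, m: StageMetrics): RolloutDecision {
  if (m.errorRate > 2 * m.baselineErrorRate) {
    return { action: 'rollback', toPercent: 0 };
  }
  if (m.p95LatencyMs > 1.2 * m.baselineP95LatencyMs) {
    return { action: 'hold' }; // investigate before exposing more users
  }
  const next = ROLLOUT_STAGES.find((p) => p > currentPercent);
  return next === undefined ? { action: 'hold' } : { action: 'advance', toPercent: next };
}
```

Keeping the decision pure makes it easy to unit-test and to call from a scheduled job, which would then update the flag's rollout percentage through the management API.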

Step-by-Step Implementation

Let us build a complete feature flag system with server-side evaluation, targeting rules, percentage rollouts, and A/B testing support.

First, define the flag data model:

interface FeatureFlag {
  key: string;
  name: string;
  description: string;
  enabled: boolean;
  defaultValue: boolean;
  variants?: Record<string, number>; // variant name -> weight in percent (weights should sum to 100)
  targetingRules: TargetingRule[];
  percentageRollout?: number; // 0-100
  createdAt: Date;
  updatedAt: Date;
  archivedAt?: Date;
}
 
interface TargetingRule {
  attribute: string;
  operator: 'equals' | 'not_equals' | 'contains' | 'gt' | 'lt' | 'in' | 'not_in';
  value: string | number | string[] | number[];
  variant?: string; // Which variant to assign if rule matches
}
 
interface UserContext {
  userId: string;
  email?: string;
  country?: string;
  subscription?: string;
  accountAge?: number;
  customAttributes?: Record<string, unknown>;
}
 
interface EvaluationResult {
  enabled: boolean;
  variant?: string;
  reason: 'default' | 'targeting' | 'percentage' | 'override';
}

Implement the flag evaluator with targeting and rollout:

class FeatureFlagEvaluator {
  private flags: Map<string, FeatureFlag> = new Map();
  private overrides: Map<string, Map<string, EvaluationResult>> = new Map();
 
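  constructor(flags: FeatureFlag[]) {
    for (const flag of flags) {
      this.flags.set(flag.key, flag);
    }
  }
 
  // Force a result for one user (QA, debugging, support); takes precedence over all rules
  setOverride(flagKey: string, userId: string, result: EvaluationResult): void {
    let userOverrides = this.overrides.get(flagKey);
    if (!userOverrides) {
      userOverrides = new Map();
      this.overrides.set(flagKey, userOverrides);
    }
    userOverrides.set(userId, { ...result, reason: 'override' });
  }
 
  clearOverride(flagKey: string, userId: string): void {
    this.overrides.get(flagKey)?.delete(userId);
  }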
 
  evaluate(flagKey: string, context: UserContext): EvaluationResult {
    const flag = this.flags.get(flagKey);
    if (!flag) {
      return { enabled: false, reason: 'default' };
    }
 
    // Check overrides first (for testing/debugging)
    const userOverrides = this.overrides.get(flagKey);
    if (userOverrides?.has(context.userId)) {
      return userOverrides.get(context.userId)!;
    }
 
    // A globally disabled flag acts as a kill switch: it is off for everyone,
    // regardless of targeting rules or defaultValue
    if (!flag.enabled) {
      return { enabled: false, reason: 'default' };
    }
 
    // Check targeting rules
    for (const rule of flag.targetingRules) {
      if (this.matchesRule(rule, context)) {
        return {
          enabled: true,
          variant: rule.variant || this.selectVariant(flag, context.userId),
          reason: 'targeting',
        };
      }
    }
 
    // Check percentage rollout
    if (flag.percentageRollout !== undefined) {
      const hash = this.hashUserId(context.userId, flagKey);
      if (hash < flag.percentageRollout / 100) {
        return {
          enabled: true,
          variant: this.selectVariant(flag, context.userId),
          reason: 'percentage',
        };
      }
    }
 
    return { enabled: flag.defaultValue, reason: 'default' };
  }
 
  private matchesRule(rule: TargetingRule, context: UserContext): boolean {
    const attributeValue = this.getAttributeValue(rule.attribute, context);
    if (attributeValue === undefined) return false;
 
    switch (rule.operator) {
      case 'equals':
        return attributeValue === rule.value;
      case 'not_equals':
        return attributeValue !== rule.value;
      case 'contains':
        return String(attributeValue).includes(String(rule.value));
      case 'gt':
        return Number(attributeValue) > Number(rule.value);
      case 'lt':
        return Number(attributeValue) < Number(rule.value);
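      case 'in':
        // Guard against a misconfigured rule whose value is not an array
        return Array.isArray(rule.value) && (rule.value as unknown[]).includes(attributeValue);
      case 'not_in':
        return Array.isArray(rule.value) && !(rule.value as unknown[]).includes(attributeValue);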
      default:
        return false;
    }
  }
 
  private getAttributeValue(attribute: string, context: UserContext): unknown {
    const parts = attribute.split('.');
    let value: any = context;
    for (const part of parts) {
      value = value?.[part];
    }
    return value;
  }
 
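  private hashUserId(userId: string, salt: string): number {
    // Deterministic hash for consistent bucketing: FNV-1a followed by the
    // MurmurHash3 32-bit finalizer, mapped to [0, 1). `salt` separates independent decisions.
    const str = `${userId}:${salt}`;
    let hash = 0x811c9dc5;
    for (let i = 0; i < str.length; i++) {
      hash ^= str.charCodeAt(i);
      hash = Math.imul(hash, 0x01000193);
    }
    hash ^= hash >>> 16;
    hash = Math.imul(hash, 0x85ebca6b);
    hash ^= hash >>> 13;
    hash = Math.imul(hash, 0xc2b2ae35);
    hash ^= hash >>> 16;
    return (hash >>> 0) / 4294967296;
  }
 
  private selectVariant(flag: FeatureFlag, userId: string): string | undefined {
    if (!flag.variants) return undefined;
 
    // Use a different salt than the rollout check. Reusing the rollout hash would
    // correlate the two decisions (a 50% rollout would only ever reach the first
    // variant of a 50/50 split).
    const hash = this.hashUserId(userId, `${flag.key}:variant`);
    let cumulative = 0;
 
    for (const [variant, weight] of Object.entries(flag.variants)) {
      cumulative += weight / 100;
      if (hash < cumulative) {
        return variant;
      }
    }
 
    // Reached only if the weights sum to less than 100
    return Object.keys(flag.variants)[0];
  }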
}
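The rollout check and `selectVariant` both derive their decisions from a per-user hash. If the two decisions hash the same key, they are correlated: with a 50% rollout and a 50/50 variant split, every user enabled through the percentage rollout falls into the first variant. The sketch below keeps the decisions independent by hashing differently salted keys. It swaps in FNV-1a followed by the MurmurHash3 32-bit finalizer. With a simple multiply-by-31 hash, two salts of equal length only shift the hash by a constant (modulo 2^32), so they would stay correlated. The function names are hypothetical.

```typescript
// FNV-1a over UTF-16 code units, then the MurmurHash3 fmix32 finalizer so that
// keys differing only in a suffix still land in unrelated buckets.
function bucketOf(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return (h >>> 0) / 4294967296; // unsigned 32-bit value mapped to [0, 1)
}

// Returns the assigned variant, or null if the user is outside the rollout.
function assignVariant(
  userId: string,
  flagKey: string,
  rolloutPercent: number,
  variants: Record<string, number>, // weights in percent, summing to 100
): string | null {
  // Salted with ":rollout" and ":variant" so the two decisions are independent
  if (bucketOf(`${userId}:${flagKey}:rollout`) >= rolloutPercent / 100) return null;
  const b = bucketOf(`${userId}:${flagKey}:variant`);
  let cumulative = 0;
  for (const [variant, weight] of Object.entries(variants)) {
    cumulative += weight / 100;
    if (b < cumulative) return variant;
  }
  return Object.keys(variants)[0] ?? null; // weights summing to slightly under 100
}
```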

Build a REST API for flag management:

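import express from 'express';
 
// `authenticate`, `authorize`, `flagRepository`, and `auditLog` stand for your own
// auth middleware (which sets req.user), persistence layer, and audit store.
const router = express.Router();
router.use(express.json()); // parse JSON request bodies into req.body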
 
// Get all flags
router.get('/api/flags', authenticate(), authorize(['admin']), async (req, res) => {
  const flags = await flagRepository.findAll({ includeArchived: req.query.archived === 'true' });
  res.json({ flags });
});
 
// Create a flag
router.post('/api/flags', authenticate(), authorize(['admin']), async (req, res) => {
  const { key, name, description, defaultValue, targetingRules, percentageRollout } = req.body;
 
  const existing = await flagRepository.findByKey(key);
  if (existing) {
    return res.status(409).json({ error: 'Flag with this key already exists' });
  }
 
  const flag = await flagRepository.create({
    key, name, description, defaultValue,
    enabled: false, targetingRules: targetingRules || [],
    percentageRollout, createdAt: new Date(), updatedAt: new Date(),
  });
 
  await auditLog.record({ action: 'flag.created', flagKey: key, userId: req.user.id });
  res.status(201).json({ flag });
});
 
// Update flag targeting
router.put('/api/flags/:key/targeting', authenticate(), authorize(['admin']), async (req, res) => {
  const { key } = req.params;
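  const { targetingRules, percentageRollout } = req.body;
  if (
    percentageRollout !== undefined &&
    (typeof percentageRollout !== 'number' || percentageRollout < 0 || percentageRollout > 100)
  ) {
    return res.status(400).json({ error: 'percentageRollout must be a number between 0 and 100' });
  }
 
  const flag = await flagRepository.findByKey(key);
  if (!flag) return res.status(404).json({ error: 'Flag not found' });
 
  const updated = await flagRepository.update(key, {
    // Only overwrite the fields the caller actually sent
    ...(targetingRules !== undefined ? { targetingRules } : {}),
    ...(percentageRollout !== undefined ? { percentageRollout } : {}),
    updatedAt: new Date(),
  });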
 
  await auditLog.record({ action: 'flag.targeting_updated', flagKey: key, userId: req.user.id });
  res.json({ flag: updated });
});
 
// Toggle flag globally
router.post('/api/flags/:key/toggle', authenticate(), authorize(['admin']), async (req, res) => {
  const { key } = req.params;
  const flag = await flagRepository.findByKey(key);
  if (!flag) return res.status(404).json({ error: 'Flag not found' });
 
  const updated = await flagRepository.update(key, {
    enabled: !flag.enabled,
    updatedAt: new Date(),
  });
 
  await auditLog.record({
    action: 'flag.toggled',
    flagKey: key,
    userId: req.user.id,
    details: { enabled: updated.enabled },
  });
 
  res.json({ flag: updated });
});
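The create handler passes `req.body` straight to the repository. A small validator run before any repository call catches malformed input early. This is a sketch: the payload type, the kebab-case key convention, and the error messages are assumptions.

```typescript
// Hypothetical request-body shape; fields are `unknown` because they come straight from JSON.
interface FlagPayload {
  key?: unknown;
  percentageRollout?: unknown;
  variants?: unknown;
}

// Assumed naming convention: lowercase kebab-case keys such as "new-checkout-flow".
const FLAG_KEY_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns human-readable problems; an empty array means the payload is acceptable.
function validateFlagPayload(body: FlagPayload): string[] {
  const errors: string[] = [];
  if (typeof body.key !== 'string' || !FLAG_KEY_PATTERN.test(body.key)) {
    errors.push('key must be a kebab-case string');
  }
  const p = body.percentageRollout;
  if (p !== undefined && (typeof p !== 'number' || !Number.isFinite(p) || p < 0 || p > 100)) {
    errors.push('percentageRollout must be a number between 0 and 100');
  }
  const v = body.variants;
  if (v !== undefined) {
    if (typeof v !== 'object' || v === null || Array.isArray(v)) {
      errors.push('variants must be an object mapping variant names to weights');
    } else {
      const weights = Object.values(v);
      if (weights.some((w) => typeof w !== 'number' || w < 0)) {
        errors.push('variant weights must be non-negative numbers');
      } else if (Math.abs(weights.reduce((sum: number, w) => sum + (w as number), 0) - 100) > 1e-9) {
        errors.push('variant weights must sum to 100');
      }
    }
  }
  return errors;
}
```

In the create handler, for example, a non-empty result could be returned as `res.status(400).json({ errors })` before the duplicate-key check.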

Integrate with Express middleware for automatic flag evaluation:

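import type { RequestHandler } from 'express';
 
// Tell TypeScript about the helpers this middleware adds to every request.
// `user` is assumed to be set by your authentication middleware; drop that
// line if your auth library already declares it.
declare global {
  namespace Express {
    interface Request {
      user?: { id: string; email?: string; subscription?: string; customAttributes?: Record<string, unknown> };
      evaluateFlag(flagKey: string): EvaluationResult;
      isEnabled(flagKey: string): boolean;
    }
  }
}
 
// Express middleware for feature flags
function featureFlags(evaluator: FeatureFlagEvaluator): RequestHandler {
  return (req, _res, next) => {
    const userContext: UserContext = {
      // Every anonymous visitor shares this ID, and therefore one rollout bucket
      userId: req.user?.id || 'anonymous',
      email: req.user?.email,
      // Header added by Cloudflare when traffic is proxied through it; undefined otherwise
      country: req.headers['cf-ipcountry'] as string | undefined,
      subscription: req.user?.subscription,
      customAttributes: req.user?.customAttributes,
    };
 
    // Attach flag evaluation function to request
    req.evaluateFlag = (flagKey) => evaluator.evaluate(flagKey, userContext);
 
    // Attach helper for common pattern
    req.isEnabled = (flagKey) => evaluator.evaluate(flagKey, userContext).enabled;
 
    next();
  };
}
 
// Usage in routes. `evaluator` is assumed to be a FeatureFlagEvaluator built from your stored flags.
router.use(featureFlags(evaluator));
 
router.get('/api/checkout', (req, res) => {
  const result = req.evaluateFlag('new-checkout-flow');
 
  if (result.enabled) {
    switch (result.variant) {
      case 'variant-a':
        return res.json({ checkout: newCheckoutFlowA() });
      case 'variant-b':
        return res.json({ checkout: newCheckoutFlowB() });
      default:
        return res.json({ checkout: newCheckoutFlow() });
    }
  }
 
  return res.json({ checkout: legacyCheckoutFlow() });
});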
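One caveat with the middleware concerns logged-out visitors. Every one of them gets the `userId` `'anonymous'`, so they all share a single rollout bucket, and a 10% rollout is all-or-nothing for them. A common remedy is a random per-visitor ID persisted in a cookie. The helper below is a hypothetical sketch. The ID generator is injected, so you can pass something like `crypto.randomUUID`.

```typescript
interface BucketingKey {
  key: string; // value to use as UserContext.userId
  newAnonymousId?: string; // set when the caller must persist a freshly generated ID
}

// Logged-in users bucket by account ID. Anonymous visitors bucket by a random ID
// that the caller stores (e.g. in a cookie) so they keep the same bucket across requests.
function resolveBucketingKey(
  userId: string | undefined,
  anonymousIdFromCookie: string | undefined,
  generateId: () => string,
): BucketingKey {
  if (userId) return { key: userId };
  if (anonymousIdFromCookie) return { key: `anon:${anonymousIdFromCookie}` };
  const id = generateId();
  return { key: `anon:${id}`, newAnonymousId: id };
}
```

When `newAnonymousId` is set, the middleware would write it to a cookie (for example with Express's `res.cookie`) and use `key` as the context's `userId`.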

Real-World Use Cases and Case Studies

Use Case 1: Facebook's Gatekeeper System

Facebook uses a feature flag system called Gatekeeper to control feature rollouts across its user base. A progressive rollout might go to employees first, then to 1% of users, then 10%, 50%, and 100%, with each stage monitored for error rates, performance metrics, and user engagement. If a metric degrades, the feature can be switched off through Gatekeeper without a new deployment.

Use Case 2: Netflix's A/B Testing Platform

Netflix uses feature flags to power many simultaneous A/B tests. UI elements, recommendation algorithms, and playback features can all be tested with different user segments. The flag system integrates with their experimentation platform to measure the impact of each variant on engagement metrics like viewing time and retention.

Use Case 3: GitHub's Dark Launch

GitHub uses feature flags for dark launching—deploying code to production and executing it without showing results to users. This validates that new code works correctly with real production traffic and data before making it visible. If the dark launch produces errors, the feature is disabled without users ever knowing.

Use Case 4: Trunk-Based Development at Spotify

Spotify uses feature flags to enable trunk-based development, where all developers commit to a single main branch. Features in development are hidden behind flags, allowing the team to deploy continuously without breaking the application for users. This eliminates the need for long-lived feature branches and complex merge conflicts.

Best Practices for Production

Treat flags as code: Store flag definitions in version control alongside application code. This provides a history of flag changes and enables code review for flag modifications. Use a flag management service for runtime overrides but keep defaults in code.
Implement flag audit logging: Every flag change—creation, targeting update, toggle, and archival—should be logged with the user who made the change, the timestamp, and the reason. This audit trail is essential for debugging and compliance.
Set expiration dates for flags: Every flag should have an expected expiration date. Flags that are not cleaned up by their expiration date should trigger alerts. This prevents stale flags from accumulating and increasing code complexity.
Use flag hierarchies for complex features: For features with multiple components, use a parent flag to control the overall feature and child flags for individual components. This enables rolling out the overall feature while controlling individual aspects.
Implement kill switches for critical paths: For features that could cause data corruption or financial loss, implement kill switches that disable the feature instantly without a code deployment. Monitor the paths they guard, and consider flipping the switch automatically when error rates spike.
Test flag combinations: When multiple flags interact, test the combinations that matter. Exhaustive testing grows exponentially with the number of flags, so use flag dependency graphs to identify interactions and exercise critical paths under each relevant flag state.
Separate flag evaluation from flag storage: Evaluate flags locally against a cached copy of the flag configuration, refreshed on a configurable interval (e.g., 30 seconds), instead of calling the flag service for every check. Use stale-while-revalidate patterns to keep serving the cached configuration while a refresh runs in the background.
Implement gradual rollout with automatic rollback: When rolling out a feature, automatically pause or roll back if error rates exceed thresholds. Use canary deployments with flag-driven traffic routing to validate new code with a small percentage of traffic before full rollout.
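The expiration-date practice can be automated with a scheduled check. This sketch assumes hypothetical `FlagMetadata` fields (`owner`, `kind`, `expiresAt`, `archivedAt`) stored alongside each flag. It reports active flags that are past their date, and short-lived flags that never got a date.

```typescript
interface FlagMetadata {
  key: string;
  owner: string; // team to alert
  kind: 'release' | 'experiment' | 'ops' | 'permission';
  expiresAt?: Date; // long-lived ops/permission flags may have none
  archivedAt?: Date;
}

// Active flags whose expiration date has passed, oldest first.
function findExpiredFlags(flags: FlagMetadata[], now: Date = new Date()): FlagMetadata[] {
  return flags
    .filter((f) => !f.archivedAt && f.expiresAt !== undefined && f.expiresAt.getTime() < now.getTime())
    .sort((a, b) => a.expiresAt!.getTime() - b.expiresAt!.getTime());
}

// Short-lived kinds should always carry an expiration date.
function findMissingExpiry(flags: FlagMetadata[]): FlagMetadata[] {
  return flags.filter(
    (f) => !f.archivedAt && (f.kind === 'release' || f.kind === 'experiment') && !f.expiresAt,
  );
}
```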

Common Pitfalls and Solutions

Pitfall	Impact	Solution
Stale flags accumulate	Code complexity increases, technical debt grows	Set expiration dates; review and archive flags regularly
Testing all flag combinations	Exponential test cases, slow CI	Test critical paths only; use flag dependency analysis
Flag evaluation performance	Latency increase for every request	Cache flag results; use local evaluation with periodic sync
Inconsistent flag state across services	Different services see different flag states	Use centralized flag service; synchronize across services
Missing audit trail	Cannot debug flag-related incidents	Implement comprehensive audit logging
Flag-driven configuration drift	Production configuration becomes unclear	Document flag purpose; use structured flag metadata

Performance Optimization

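Evaluating a flag against an in-memory configuration is cheap. Performance suffers when configuration or results are fetched over the network on the request path, or when cached state grows without bound. The goal is to keep evaluation local while still picking up flag changes promptly.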

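// Cached flag evaluator with periodic eviction of expired entries.
// Trade-offs: a flag change (including a kill switch) can take up to cacheTtlMs to
// take effect, results are keyed by user ID only (other context attributes are
// assumed stable within the TTL), and the cache holds one entry per active
// (flag, user) pair between sweeps.
class CachedFlagEvaluator {
  private cache: Map<string, { result: EvaluationResult; expiresAt: number }> = new Map();
  private sweepTimer: NodeJS.Timeout | null = null;
 
  constructor(
    private evaluator: FeatureFlagEvaluator,
    private cacheTtlMs: number = 30000, // 30 seconds
    private sweepIntervalMs: number = 60000 // 1 minute
  ) {
    this.startSweeping();
  }
 
  evaluate(flagKey: string, context: UserContext): EvaluationResult {
    const cacheKey = `${flagKey}:${context.userId}`;
    const cached = this.cache.get(cacheKey);
 
    if (cached && cached.expiresAt > Date.now()) {
      return cached.result;
    }
 
    const result = this.evaluator.evaluate(flagKey, context);
    this.cache.set(cacheKey, {
      result,
      expiresAt: Date.now() + this.cacheTtlMs,
    });
 
    return result;
  }
 
  // Drop every cached result, e.g. right after a flag is toggled
  invalidateAll(): void {
    this.cache.clear();
  }
 
  // Stop the background timer (on shutdown or in tests)
  dispose(): void {
    if (this.sweepTimer) clearInterval(this.sweepTimer);
    this.sweepTimer = null;
  }
 
  private startSweeping(): void {
    this.sweepTimer = setInterval(() => {
      // Remove expired entries so memory does not grow with every user ever seen
      const now = Date.now();
      for (const [key, value] of this.cache) {
        if (value.expiresAt <= now) {
          this.cache.delete(key);
        }
      }
    }, this.sweepIntervalMs);
    // Do not keep the Node.js process alive just for cache cleanup
    this.sweepTimer.unref();
  }
}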
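The cache above holds one entry per (flag, user) pair and only sweeps expired entries periodically, so its size tracks the number of active users. A size-bounded LRU cache is a common alternative when that matters. The sketch below is a minimal, hypothetical implementation. It relies on `Map` iterating in insertion order, and its clock is injectable for testing.

```typescript
// LRU cache with a per-entry TTL. Re-inserting a key on access moves it to the
// "most recently used" end of the Map's insertion order.
class LruTtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private readonly maxEntries: number,
    private readonly ttlMs: number,
    private readonly clock: () => number = Date.now,
  ) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.clock()) {
      this.entries.delete(key); // expired: drop lazily on read
      return undefined;
    }
    this.entries.delete(key);
    this.entries.set(key, entry); // mark as most recently used
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, { value, expiresAt: this.clock() + this.ttlMs });
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used one
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }

  clear(): void {
    this.entries.clear();
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Calling `clear()` whenever the flag configuration changes keeps a kill switch from waiting out the TTL.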

Comparison with Alternatives

Feature	Custom Implementation	LaunchDarkly	Unleash	Flagsmith
Self-Hosted	Yes	No	Yes	Yes
Targeting Rules	Custom	Advanced	Basic	Advanced
A/B Testing	Custom	Built-in	Plugin	Built-in
Audit Logging	Custom	Built-in	Built-in	Built-in
SDK Support	Manual	20+ languages	15+ languages	15+ languages
Cost	Development time	$$$	Free/Paid	Free/Paid
Real-Time Updates	Custom	Yes	Yes	Yes

Advanced Patterns

Flag-Driven Circuit Breaker

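Combine a feature flag with a circuit breaker. While the flag is on, the new code path runs. If it fails too often within a time window, the breaker routes requests to the fallback for a cooldown period. The breaker below keeps its state in memory, so it protects a single process and does not change the flag itself; turning the feature off for everyone still goes through the flag service.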

class FlagCircuitBreaker {
  // Per-flag failure count in the current window, and the time until which the circuit stays open
  private state: Map<string, { failures: number; windowStart: number; openUntil: number }> = new Map();
 
  constructor(
    private evaluator: FeatureFlagEvaluator,
    private errorThreshold: number = 10, // failures within one window that open the circuit
    private windowMs: number = 60000, // length of the failure-counting window
    private cooldownMs: number = 30000 // how long an open circuit skips the operation
  ) {}
 
  async executeWithBreaker<T>(
    flagKey: string,
    context: UserContext,
    operation: () => Promise<T>,
    fallback: () => T
  ): Promise<T> {
    const flagResult = this.evaluator.evaluate(flagKey, context);
 
    if (!flagResult.enabled) {
      return fallback();
    }
 
    // While the circuit is open, go straight to the fallback
    if (this.isCircuitOpen(flagKey)) {
      return fallback();
    }
 
    try {
      // `await` inside try so a rejected promise is caught here
      return await operation();
    } catch {
      this.recordError(flagKey);
      return fallback();
    }
  }
 
  private isCircuitOpen(flagKey: string): boolean {
    const stats = this.state.get(flagKey);
    return stats !== undefined && Date.now() < stats.openUntil;
  }
 
  private recordError(flagKey: string): void {
    const now = Date.now();
    let stats = this.state.get(flagKey);
 
    // Start a fresh counting window if none exists or the previous one has elapsed
    if (!stats || now - stats.windowStart > this.windowMs) {
      stats = { failures: 0, windowStart: now, openUntil: stats?.openUntil ?? 0 };
      this.state.set(flagKey, stats);
    }
 
    stats.failures++;
    if (stats.failures >= this.errorThreshold) {
      // Open the circuit for the full cooldown period, then count failures afresh
      stats.openUntil = now + this.cooldownMs;
      stats.failures = 0;
      stats.windowStart = now;
    }
  }
}

Testing Strategies

Test feature flags by verifying flag evaluation logic, targeting rules, and integration with application code.

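describe('FeatureFlagEvaluator', () => {
  // Builds a flag targeting US or premium/enterprise users; only the rollout percentage varies
  const makeFlag = (percentageRollout: number): FeatureFlag => ({
    key: 'new-checkout',
    name: 'New Checkout Flow',
    description: 'Test new checkout',
    enabled: true,
    defaultValue: false,
    targetingRules: [
      { attribute: 'country', operator: 'equals', value: 'US' },
      { attribute: 'subscription', operator: 'in', value: ['premium', 'enterprise'] },
    ],
    percentageRollout,
    createdAt: new Date(),
    updatedAt: new Date(),
  });
  const nonTargetedUser: UserContext = { userId: 'user-2', country: 'UK', subscription: 'free' };
 
  it('should enable feature for targeted users', () => {
    const evaluator = new FeatureFlagEvaluator([makeFlag(50)]);
    const result = evaluator.evaluate('new-checkout', {
      userId: 'user-1',
      country: 'US',
      subscription: 'premium',
    });
    expect(result.enabled).toBe(true);
    expect(result.reason).toBe('targeting');
  });
 
  it('should bucket the same non-targeted user consistently', () => {
    const evaluator = new FeatureFlagEvaluator([makeFlag(50)]);
    const first = evaluator.evaluate('new-checkout', nonTargetedUser);
    const second = evaluator.evaluate('new-checkout', nonTargetedUser);
    expect(second).toEqual(first);
  });
 
  it('should respect the 0% and 100% rollout boundaries for non-targeted users', () => {
    // Bucket values lie in [0, 1), so 0% excludes and 100% includes every user
    const none = new FeatureFlagEvaluator([makeFlag(0)]).evaluate('new-checkout', nonTargetedUser);
    expect(none).toEqual({ enabled: false, reason: 'default' });
 
    const all = new FeatureFlagEvaluator([makeFlag(100)]).evaluate('new-checkout', nonTargetedUser);
    expect(all.enabled).toBe(true);
    expect(all.reason).toBe('percentage');
  });
});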

Future Outlook

Feature flags are evolving from simple boolean toggles to sophisticated decision engines that integrate with experimentation platforms, observability tools, and deployment pipelines. The convergence of feature flags with progressive delivery, canary deployments, and automated rollback is creating a new paradigm for safe, data-driven software releases.

Edge computing is pushing flag evaluation to CDN edge locations, so flag decisions are made close to each user instead of after a round trip to a central service. AI-powered flag management, which would adjust rollout percentages automatically based on real-time metrics, is an exciting frontier that could make progressive delivery fully autonomous.

Conclusion

Feature flags are essential infrastructure for modern software delivery. They enable progressive delivery, safe experimentation, and instant incident response by decoupling deployment from release. The patterns we explored—targeting rules, percentage rollouts, A/B testing, and circuit breakers—demonstrate the versatility of feature flags beyond simple feature toggling.

Key takeaways: (1) Decouple deployment from release using feature flags; (2) Implement targeting rules for user segmentation; (3) Use percentage rollouts for safe, gradual feature releases; (4) Set expiration dates for flags to prevent technical debt; (5) Implement audit logging for all flag changes; (6) Cache flag evaluation results for performance.

Start with simple boolean flags for your most critical features, then progressively add targeting, rollouts, and experimentation as your team matures. The investment in feature flag infrastructure pays dividends in deployment confidence, experimentation capability, and incident response speed. Feature flags are not just a development tool—they are a competitive advantage.

Minh Vo

Slaying code & making it lit fr fr 🔥