API Gateway Patterns: Kong, AWS API Gateway, and Express Gateway

Implement API gateways: rate limiting, authentication, routing, and request transformation.

Tags: API Gateway, Kong, AWS, Microservices

By MinhVo

Introduction

In a microservices architecture, the API gateway is the single entry point for all client requests. It sits between clients and backend services, handling cross-cutting concerns like authentication, rate limiting, request routing, load balancing, and response transformation. Without an API gateway, each client would need to know about every microservice, handle authentication independently, and manage the complexity of service discovery. The gateway centralizes these concerns, simplifying both client code and backend service design.

[Figure: API gateway architecture]

The choice of API gateway technology depends on your infrastructure and requirements. Kong (open-source, NGINX-based) offers maximum flexibility with a rich plugin ecosystem. AWS API Gateway provides seamless integration with AWS services and serverless backends. Express Gateway (open-source, Node.js-based) is lightweight and JavaScript-native. Each has distinct strengths, and understanding them helps you choose the right tool for your architecture.

API gateways are not just reverse proxies — they're intelligent request processors that can transform requests and responses, aggregate data from multiple services, implement circuit breakers, collect analytics, and enforce security policies. They're the foundation of a well-architected microservices system, providing the abstraction layer that enables services to evolve independently without breaking clients.

Understanding API Gateways: Core Concepts

Request Routing

The gateway's primary function is routing incoming requests to the appropriate backend service. Routing can be based on URL path, HTTP method, headers, query parameters, or request body content. Advanced routing enables A/B testing, canary deployments, and geographic routing.
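As a rough illustration of path- and method-based routing, the sketch below resolves a request to the most specific matching route. The route table shape, the `resolveRoute` helper, and the `fallback-service` upstream are assumptions made for this example; they are not the API of any particular gateway.

```typescript
// Hypothetical route table entry; field names are illustrative only.
interface GatewayRoute {
  pathPrefix: string; // e.g. "/api/v1/users"
  methods?: string[]; // omit to allow every HTTP method
  upstream: string;   // backend base URL
}

const routeTable: GatewayRoute[] = [
  { pathPrefix: "/api/v1/users", upstream: "http://user-service:3000" },
  { pathPrefix: "/api/v1/products", methods: ["GET"], upstream: "http://product-service:3001" },
  { pathPrefix: "/api", upstream: "http://fallback-service:3002" }, // assumed catch-all
];

// True when `path` equals the prefix or continues it at a "/" boundary,
// so "/api/v1/users" does not accidentally match "/api/v1/users-admin".
function matchesPrefix(path: string, prefix: string): boolean {
  const boundary = prefix.endsWith("/") ? prefix : prefix + "/";
  return path === prefix || path.startsWith(boundary);
}

// Picks the longest-prefix route that also allows the HTTP method.
function resolveRoute(
  method: string,
  path: string,
  routes: GatewayRoute[],
): GatewayRoute | undefined {
  const upper = method.toUpperCase();
  return routes
    .filter((r) => matchesPrefix(path, r.pathPrefix))
    .filter((r) => !r.methods || r.methods.includes(upper))
    .sort((a, b) => b.pathPrefix.length - a.pathPrefix.length)[0];
}
```

Longest-prefix matching is one common tie-breaking rule; real gateways may also weigh headers, hosts, or explicit priorities.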

Authentication and Authorization

The gateway authenticates incoming requests using API keys, JWT tokens, OAuth 2.0, or mutual TLS, and authorizes them based on scopes, roles, or custom policies. By centralizing auth at the gateway, backend services can trust that requests are already authenticated and focus on business logic.
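To make the split between authentication (401) and authorization (403) concrete, here is a minimal sketch of a scope check that a gateway might run after a token has already been verified. The `VerifiedClaims` shape and the `authorize` function are hypothetical names chosen for this example.

```typescript
// Claims assumed to have been extracted from an already-verified token.
interface VerifiedClaims {
  sub: string;
  scopes: string[];
  exp: number; // expiry as Unix seconds
}

type AuthDecision =
  | { allowed: true }
  | { allowed: false; statusCode: 401 | 403; reason: string };

// 401 means "we do not know who you are"; 403 means "we know, but you may not do this".
function authorize(
  claims: VerifiedClaims | null,
  requiredScopes: string[],
  nowSeconds: number,
): AuthDecision {
  if (!claims) {
    return { allowed: false, statusCode: 401, reason: "missing credentials" };
  }
  if (claims.exp <= nowSeconds) {
    return { allowed: false, statusCode: 401, reason: "token expired" };
  }
  const missing = requiredScopes.filter((s) => !claims.scopes.includes(s));
  if (missing.length > 0) {
    return { allowed: false, statusCode: 403, reason: `missing scopes: ${missing.join(", ")}` };
  }
  return { allowed: true };
}
```

Passing the current time in as a parameter keeps the function pure, which makes expiry handling easy to test.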

Rate Limiting and Throttling

Rate limiting protects backend services from overload by restricting the number of requests per client, per API key, or per IP address. Throttling slows down requests that exceed limits rather than rejecting them, providing a smoother degradation under load.
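The difference between rejecting and delaying can be sketched with a token bucket, one common algorithm for both. The `TokenBucket` class below is an illustrative, in-memory sketch with an injectable clock (an assumption made for testability); a real deployment would usually keep counters in a shared store such as Redis.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private capacity: number;
  private refillPerSecond: number;
  private now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full so short bursts are allowed
    this.lastRefillMs = now();
  }

  // Adds tokens for the time elapsed since the last call, capped at capacity.
  private refill(): void {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = current;
  }

  // Rate limiting: consume a token, or reject immediately (HTTP 429 in a gateway).
  tryAcquire(): boolean {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  // Throttling: how long the caller should wait before a token is available.
  delayUntilAvailableMs(): number {
    this.refill();
    if (this.tokens >= 1) return 0;
    return Math.ceil(((1 - this.tokens) / this.refillPerSecond) * 1000);
  }
}
```

A limiter would call `tryAcquire()` and return 429 on `false`, while a throttler would hold the request for `delayUntilAvailableMs()` and then retry.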

Request/Response Transformation

The gateway can modify requests before forwarding them to backends and responses before returning them to clients. This includes header manipulation, body transformation, protocol conversion (REST to gRPC), and response aggregation from multiple services.
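Header manipulation is the simplest of these transformations. The sketch below applies remove, add-if-absent, and replace rules to a header map; the `HeaderTransform` shape is an assumption loosely modelled on how transformer plugins are typically configured, not a specific product's schema.

```typescript
type HeaderMap = Record<string, string>;

interface HeaderTransform {
  remove?: string[];   // headers to drop before forwarding
  add?: HeaderMap;     // set only when the header is absent
  replace?: HeaderMap; // set unconditionally
}

function transformHeaders(incoming: HeaderMap, rule: HeaderTransform): HeaderMap {
  // HTTP header names are case-insensitive, so normalize to lowercase first.
  const result: HeaderMap = {};
  for (const [key, value] of Object.entries(incoming)) {
    result[key.toLowerCase()] = value;
  }
  for (const key of rule.remove ?? []) {
    delete result[key.toLowerCase()];
  }
  for (const [key, value] of Object.entries(rule.add ?? {})) {
    const lower = key.toLowerCase();
    if (!(lower in result)) result[lower] = value;
  }
  for (const [key, value] of Object.entries(rule.replace ?? {})) {
    result[key.toLowerCase()] = value;
  }
  return result;
}
```

Returning a new map instead of mutating the input keeps the original request available for logging.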

Circuit Breaking

When a backend service is unhealthy, the circuit breaker trips and the gateway returns a fallback response instead of forwarding requests to the failing service. This prevents cascading failures and gives the service time to recover.

[Figure: API gateway request flow]

Architecture and Design Patterns

The Single Gateway Pattern

All traffic goes through one gateway instance. Simple to manage but becomes a single point of failure and a bottleneck at scale. Suitable for small to medium deployments.

The Backend-for-Frontend (BFF) Pattern

Create separate gateway instances for each client type (web, mobile, IoT). Each BFF is optimized for its client's needs — the mobile BFF might aggregate more aggressively to reduce round trips, while the web BFF provides more granular endpoints.

The Micro-Gateway Pattern

Each team manages its own gateway for its services. This enables independent deployment and configuration but requires a higher-level gateway or service mesh for cross-cutting concerns.

The Edge Gateway Pattern

An outer gateway handles external concerns (SSL termination, authentication, rate limiting) while inner gateways handle service-to-service routing and load balancing. This separates concerns and enables different scaling strategies.

Step-by-Step Implementation

Kong API Gateway Configuration

# Kong declarative configuration (kong.yml)
_format_version: "3.0"
 
services:
  - name: user-service
    url: http://user-service:3000
    routes:
      - name: user-routes
        paths:
          - /api/v1/users
        strip_path: false
    plugins:
      - name: rate-limiting
        config:
          minute: 100
          hour: 1000
          policy: redis
          redis:
            host: redis
            port: 6379
      - name: jwt
        config:
          claims_to_verify:
            - exp
      - name: cors
        config:
          origins:
            - "https://app.example.com"
          methods:
            - GET
            - POST
            - PUT
            - DELETE
          max_age: 3600
 
  - name: product-service
    url: http://product-service:3001
    routes:
      - name: product-routes
        paths:
          - /api/v1/products
    plugins:
      - name: rate-limiting
        config:
          minute: 200
          policy: local
      - name: key-auth
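      # correlation-id generates a unique ID per request and forwards it upstream;
      # request-transformer templates expose headers, query_params, uri_captures
      # and shared values, but no built-in request ID.
      - name: correlation-id
        config:
          header_name: X-Request-ID
          generator: uuid
          echo_downstream: true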

AWS API Gateway with Lambda

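import { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
 
// Error type carrying an HTTP status code. This is an assumed helper: the handler
// below references AppError and routeRequest, which are expected to live in the Lambda bundle.
class AppError extends Error {
  statusCode: number;
  constructor(message: string, statusCode = 400) {
    super(message);
    this.statusCode = statusCode;
  }
}
 
// Lambda handler for API Gateway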
export async function handler(event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> {
  try {
    const { path, httpMethod, queryStringParameters, body, requestContext } = event;
    
    // Authentication is handled by API Gateway authorizers
    const userId = requestContext.authorizer?.claims?.sub;
 
    // Route based on path and method
    const response = await routeRequest(path, httpMethod, {
      userId,
      query: queryStringParameters || {},
      body: body ? JSON.parse(body) : null,
    });
 
    return {
      statusCode: 200,
      headers: {
        'Content-Type': 'application/json',
        'Access-Control-Allow-Origin': '*',
        'X-Request-Id': requestContext.requestId,
      },
      body: JSON.stringify(response),
    };
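  } catch (err) {
    // Expose messages only for expected application errors; hide internal details otherwise
    const isAppError = err instanceof AppError;
    return {
      statusCode: isAppError ? err.statusCode : 500,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ error: isAppError ? err.message : 'Internal error' }),
    };
  }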
}
 
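// CDK stack for API Gateway (separate file). The code below runs inside a Stack
// constructor, and userPool is assumed to be an existing Cognito UserPool construct.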
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as lambda from 'aws-cdk-lib/aws-lambda';
 
const api = new apigateway.RestApi(this, 'MyApi', {
  deployOptions: {
    stageName: 'prod',
    throttlingBurstLimit: 100,
    throttlingRateLimit: 50,
  },
});
 
const userLambda = new lambda.Function(this, 'UserHandler', {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('lambda'),
});
 
api.root.resourceForPath('users').addMethod('GET',
  new apigateway.LambdaIntegration(userLambda),
  { authorizer: new apigateway.CognitoUserPoolsAuthorizer(this, 'Auth', { cognitoUserPools: [userPool] }) }
);

Custom Gateway Built on Express

import express from 'express';
import { createProxyMiddleware } from 'http-proxy-middleware';
import rateLimit from 'express-rate-limit';
import jwt from 'jsonwebtoken';
 
const app = express();
 
// Rate limiting
const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // limit each IP to 100 requests per windowMs
  standardHeaders: true,
  legacyHeaders: false,
  message: { error: 'Too many requests, please try again later.' },
});
app.use(limiter);
 
// Authentication middleware
function authenticate(req, res, next) {
  // Expect "Authorization: Bearer <token>"
  const [scheme, token] = (req.headers.authorization || '').split(' ');
  if (scheme !== 'Bearer' || !token) {
    return res.status(401).json({ error: 'Authentication required' });
  }
 
  try {
    // Pin the accepted algorithm so a token cannot select its own (HS256 is assumed here)
    const decoded = jwt.verify(token, process.env.JWT_SECRET, { algorithms: ['HS256'] });
    req.user = decoded;
    next();
  } catch (err) {
    res.status(401).json({ error: 'Invalid token' });
  }
}
 
// Request logging
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    console.log(JSON.stringify({
      method: req.method,
      path: req.path,
      status: res.statusCode,
      duration: Date.now() - start,
      userId: req.user?.sub,
    }));
  });
  next();
});
 
// Service routes
app.use('/api/v1/users', authenticate, createProxyMiddleware({
  target: 'http://user-service:3000',
  changeOrigin: true,
  pathRewrite: { '^/api/v1/users': '/users' },
}));
 
app.use('/api/v1/products', createProxyMiddleware({
  target: 'http://product-service:3001',
  changeOrigin: true,
  pathRewrite: { '^/api/v1/products': '/products' },
}));
 
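// Circuit breaker: create one instance per backend service and wrap outbound
// calls in breaker.call(...). It is defined here but not yet wired into the proxies above.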
class CircuitBreaker {
  private failures = 0;
  private lastFailure = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';
  private threshold = 5;
  private resetTimeout = 30000;
 
  async call(fn: () => Promise<unknown>): Promise<unknown> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure > this.resetTimeout) {
        this.state = 'half-open';
      } else {
        throw new Error('Circuit breaker is open');
      }
    }
 
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }
 
  private onSuccess() {
    this.failures = 0;
    this.state = 'closed';
  }
 
  private onFailure() {
    this.failures++;
    this.lastFailure = Date.now();
    if (this.failures >= this.threshold) {
      this.state = 'open';
    }
  }
}
 
app.listen(8080, () => console.log('API Gateway running on port 8080'));

[Figure: API gateway monitoring]

Real-World Use Cases

Public API Management

Expose a public API through the gateway with API key authentication, rate limiting per tier (free: 100 req/day, pro: 10K req/day, enterprise: unlimited), usage analytics, and developer portal documentation.

Microservices Aggregation

Aggregate responses from multiple backend services into a single response for the client. Instead of the client making 5 API calls, the gateway makes 5 calls internally and returns a single aggregated response, reducing latency and client complexity.

Legacy API Modernization

Wrap legacy SOAP or XML APIs with a modern REST/JSON gateway. The gateway transforms requests and responses between formats, enabling new clients to use modern protocols while legacy systems continue operating unchanged.

Multi-Region Routing

Route requests to the nearest data center based on client geography, with automatic failover to other regions if the primary is unhealthy. This provides low latency globally with high availability.
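The selection logic behind nearest-region routing with failover can be sketched in a few lines. The `RegionEndpoint` shape, and the idea that a latency estimate and a health flag are already available for each region, are assumptions made for this example.

```typescript
interface RegionEndpoint {
  region: string;
  url: string;
  healthy: boolean;  // result of the most recent health check
  latencyMs: number; // measured or estimated latency from the client's location
}

// Returns the lowest-latency healthy region, or undefined when every region is down.
function pickRegion(endpoints: RegionEndpoint[]): RegionEndpoint | undefined {
  return endpoints
    .filter((e) => e.healthy)
    .sort((a, b) => a.latencyMs - b.latencyMs)[0];
}
```

Because unhealthy regions are filtered out before sorting, failover to the next-nearest region happens without any extra branching.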

Best Practices for Production

  1. Keep the gateway thin — The gateway should handle cross-cutting concerns only. Business logic belongs in backend services, not the gateway.

  2. Use circuit breakers — Protect against cascading failures by implementing circuit breakers for each backend service. Fail fast rather than waiting for timeouts.

  3. Implement comprehensive logging — Log every request with timing, status, and error information. This data is essential for debugging, monitoring, and capacity planning.

  4. Version your APIs — Support multiple API versions simultaneously through the gateway. Route v1 requests to legacy services and v2 to new services during migrations.

  5. Cache aggressively — Cache responses at the gateway level for read-heavy APIs. Implement cache invalidation strategies for data that changes frequently.

  6. Use health checks — The gateway should continuously check backend health and stop routing to unhealthy instances automatically.

  7. Implement request validation — Validate request schemas (headers, body, query parameters) at the gateway before forwarding to backends. Reject invalid requests early.

  8. Monitor and alert — Track latency percentiles, error rates, and throughput per service. Set up alerts for anomalies.
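As one illustration of the caching practice above, here is a minimal in-memory cache with a time-to-live and explicit invalidation. The `TtlCache` class and `responseCacheKey` helper are hypothetical names for this sketch; a gateway running several instances would more likely use a shared cache.

```typescript
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazily evict expired entries on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Call this when the underlying data changes (for example after a write).
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}

// Only GET responses are treated as cacheable in this sketch.
function responseCacheKey(method: string, url: string): string | undefined {
  return method.toUpperCase() === "GET" ? `GET ${url}` : undefined;
}
```

A short TTL bounds staleness for data that changes often, while `invalidate` covers the cases where the gateway sees the write itself.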

Common Pitfalls and Solutions

| Pitfall | Impact | Solution |
| --- | --- | --- |
| Gateway as a monolith | Single point of failure, scaling bottleneck | Use multiple instances with load balancing |
| Business logic in gateway | Hard to maintain, test, and deploy | Keep business logic in backend services |
| No circuit breakers | Cascading failures across services | Implement circuit breakers per service |
| Missing rate limiting | Backend services overwhelmed | Configure rate limits per client and endpoint |
| No caching | Unnecessary load on backends | Implement response caching with TTL |
| Ignoring latency | Poor user experience | Monitor p50/p95/p99 latency, optimize hot paths |
| Tight coupling to backends | Can't change backends independently | Use service discovery and abstract backend URLs |

Debugging Gateway Issues

Use request tracing (correlation IDs) to follow requests through the gateway to backend services. Enable debug logging for specific routes or clients. Use the gateway's analytics to identify slow endpoints, high error rates, and traffic patterns.
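A correlation ID only helps if the gateway reuses an incoming ID and otherwise creates one before forwarding. The sketch below shows that rule; the `x-request-id` header name and the `randomHexId` generator are assumptions for illustration (a production gateway would more likely use a UUID).

```typescript
const CORRELATION_HEADER = "x-request-id";

// Illustrative ID generator, not cryptographically strong.
function randomHexId(): string {
  let id = "";
  for (let i = 0; i < 16; i++) {
    id += Math.floor(Math.random() * 16).toString(16);
  }
  return id;
}

// Returns a copy of the headers that is guaranteed to carry a correlation ID.
// An ID supplied by the client or an upstream proxy is preserved so the trace stays intact.
function withCorrelationId(
  headers: Record<string, string>,
  generate: () => string = randomHexId,
): Record<string, string> {
  const existingKey = Object.keys(headers).find(
    (key) => key.toLowerCase() === CORRELATION_HEADER,
  );
  if (existingKey) return { ...headers };
  return { ...headers, [CORRELATION_HEADER]: generate() };
}
```

Logging the same ID in the gateway and in every backend lets a single search reconstruct the full path of a request.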

Performance Optimization

Optimize gateway performance by enabling connection pooling to backend services, implementing response compression (gzip/brotli), using HTTP/2 for client connections, and caching responses at the edge.

For high-throughput gateways, consider NGINX-based solutions (Kong) for raw performance, implement async I/O for long-polling connections, and use CDN integration for static content.

Comparison of API Gateway Solutions

FeatureKongAWS API GatewayExpress GatewayAmbassador
Open Source✓ (OSS)✗✓✓ (OSS)
Self-hosted✓✗ (managed)✓✓
Performance★★★★★★★★★★★★★★★★
Plugin Ecosystem★★★★★★★★★★★★★★★★
AWS Integration★★★★★★★★★★★★★
Kubernetes Native★★★★★★★★★★★★★
Learning CurveMediumLowLowMedium
Best ForGeneral purposeAWS workloadsNode.js teamsKubernetes

Advanced Patterns

Request Aggregation

The gateway receives a single request and calls multiple backend services in parallel, aggregating the results into a single response. This is essential for mobile clients that need to minimize round trips.
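A minimal sketch of parallel aggregation follows. It assumes each backend call is wrapped in a function returning a promise, and it tolerates partial failure by reporting errors next to the data; the `aggregate` function and its response shape are choices made for this example.

```typescript
type ServiceCall = () => Promise<unknown>;

interface AggregatedResponse {
  data: Record<string, unknown>;  // results keyed by service name
  errors: Record<string, string>; // error messages for calls that failed
}

async function aggregate(calls: Record<string, ServiceCall>): Promise<AggregatedResponse> {
  // Start every call at once; each one captures its own failure so that
  // a single slow or broken service cannot reject the whole batch.
  const outcomes = await Promise.all(
    Object.entries(calls).map(async ([key, call]) => {
      try {
        return { key, ok: true as const, value: await call() };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { key, ok: false as const, message };
      }
    }),
  );

  const response: AggregatedResponse = { data: {}, errors: {} };
  for (const outcome of outcomes) {
    if (outcome.ok) response.data[outcome.key] = outcome.value;
    else response.errors[outcome.key] = outcome.message;
  }
  return response;
}
```

Whether a partial response is acceptable depends on the endpoint; some callers may prefer to fail the whole request when a required section is missing.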

Canary Deployments

Route a percentage of traffic (5-10%) to a new version of a backend service while the rest continues to the stable version. Monitor error rates and latency for the canary, and gradually increase traffic if healthy.
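One way to split traffic is to hash a stable client identifier into a bucket from 0 to 99, so the same client always lands on the same version. The sketch below uses a 32-bit FNV-1a hash; the function names and the choice of hash are assumptions for illustration.

```typescript
// 32-bit FNV-1a hash over UTF-16 code units; returns an unsigned integer.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0;
}

// Sends roughly `canaryPercent` percent of clients to the canary.
// The decision is deterministic per client, which keeps sessions sticky.
function chooseUpstream(clientId: string, canaryPercent: number): "canary" | "stable" {
  const bucket = fnv1a(clientId) % 100;
  return bucket < canaryPercent ? "canary" : "stable";
}
```

Raising `canaryPercent` only moves additional clients onto the canary; clients already routed there stay there, which keeps comparisons between rollout stages consistent.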

API Composition

Define composite APIs that combine data from multiple services into a single endpoint. The gateway handles the orchestration, parallel calls, and response merging transparently.

Future Outlook

API gateways are evolving toward service mesh integration, where the gateway handles north-south traffic (external to internal) and the service mesh handles east-west traffic (service to service). This convergence simplifies the networking layer and provides consistent policies across all traffic.

Another notable trend is AI-powered API management: using machine learning to detect anomalous traffic patterns, automatically adjust rate limits, predict capacity needs, and identify security threats. This could make API gateways smarter and more self-managing.

Architecture Decision Records

When evaluating architectural choices for your project, documenting your decision-making process through Architecture Decision Records (ADRs) provides invaluable context for future team members and stakeholders. Each ADR captures the context, decision, and consequences of a specific architectural choice.

Creating Effective ADRs

An ADR should include the date of the decision, the status (proposed, accepted, deprecated, or superseded), the context that motivated the decision, the decision itself, and the expected consequences both positive and negative. This structured approach ensures that decisions are traceable and reversible when circumstances change.

# ADR-001: Choose React for Frontend Framework
 
## Status: Accepted
 
## Context
We need a frontend framework that supports component-based architecture,
has a large ecosystem, and provides good TypeScript support.
 
## Decision
We will use React 18+ with TypeScript for all new frontend projects.
 
## Consequences
- Large talent pool available for hiring
- Mature ecosystem with extensive third-party libraries
- Strong TypeScript integration
- Requires additional libraries for routing and state management

Decision Matrix for Technology Selection

Create a weighted decision matrix when comparing multiple options. List your evaluation criteria (performance, learning curve, ecosystem maturity, community support, long-term viability) and assign weights based on your project priorities. Score each option on a scale of 1-5 for each criterion, then calculate weighted totals.
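The arithmetic is simple enough to sketch: multiply each score by its criterion weight and sum the results per option. The function names and the 1-5 range check below are assumptions that follow the description above.

```typescript
type CriterionValues = Record<string, number>;

// Sum of weight * score over every weighted criterion.
function weightedTotal(weights: CriterionValues, scores: CriterionValues): number {
  let total = 0;
  for (const [criterion, weight] of Object.entries(weights)) {
    const score = scores[criterion];
    if (score === undefined) throw new Error(`missing score for ${criterion}`);
    if (score < 1 || score > 5) throw new Error(`score for ${criterion} must be between 1 and 5`);
    total += weight * score;
  }
  return total;
}

// Ranks options from highest to lowest weighted total.
function rankOptions(
  weights: CriterionValues,
  options: Record<string, CriterionValues>,
): Array<{ option: string; total: number }> {
  return Object.entries(options)
    .map(([option, scores]) => ({ option, total: weightedTotal(weights, scores) }))
    .sort((a, b) => b.total - a.total);
}
```

For instance, with weights of 3, 2, and 1 for performance, learning curve, and ecosystem, an option scored 5, 3, and 4 totals 3×5 + 2×3 + 1×4 = 25.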

This systematic approach removes emotion from technology decisions and provides a defensible rationale when stakeholders question your choices. Document the matrix alongside your ADR so future teams understand not just what was chosen, but why alternatives were rejected.

Reversibility and Migration Paths

Every architectural decision should include a migration path in case the decision needs to be reversed. Consider the cost of changing course at six months, twelve months, and two years. Decisions with low reversal costs can be made more aggressively, while irreversible decisions warrant extended evaluation periods and proof-of-concept implementations.

For example, choosing a CSS-in-JS library has a relatively low reversal cost since styles can be migrated incrementally component by component. However, choosing a database technology has a high reversal cost due to data migration complexity and potential schema changes throughout the codebase.

Production Deployment and Operations

Running backend services in production requires attention to reliability, observability, and operational concerns that don't exist in development environments. Proper deployment practices ensure your service remains available and performant under real-world conditions.

Graceful Shutdown Handling

Implement graceful shutdown to prevent request failures during deployments and restarts:

const server = app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});
 
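let shuttingDown = false;
 
async function gracefulShutdown(signal) {
  if (shuttingDown) return; // ignore repeated signals while shutdown is in progress
  shuttingDown = true;
  console.log(`Received ${signal}, starting graceful shutdown...`);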
 
  // Stop accepting new connections
  server.close(async () => {
    console.log('HTTP server closed');
 
    try {
      // Wait for existing requests to complete (with timeout).
      // waitForActiveRequests, db, and redis are application-specific and assumed to exist.
      await Promise.race([
        waitForActiveRequests(),
        new Promise((_, reject) =>
          setTimeout(() => reject(new Error('Shutdown timeout')), 30000)
        ),
      ]);
 
      // Close database connections
      await db.destroy();
      await redis.quit();
 
      console.log('Graceful shutdown completed');
      process.exit(0);
    } catch (error) {
      console.error('Error during shutdown:', error);
      process.exit(1);
    }
  });
 
  // Force shutdown after timeout; unref() keeps this timer from holding the process open
  setTimeout(() => {
    console.error('Forced shutdown after timeout');
    process.exit(1);
  }, 35000).unref();
}
 
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));

Structured Logging

Replace console.log with structured logging that supports log aggregation and querying:

const pino = require('pino');
 
const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  formatters: {
    level(label) {
      return { level: label };
    },
  },
  serializers: {
    err: pino.stdSerializers.err,
    req: pino.stdSerializers.req,
    res: pino.stdSerializers.res,
  },
  redact: {
    paths: ['req.headers.authorization', 'req.headers.cookie'],
    remove: true,
  },
});
 
// Request logging middleware
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    logger.info({
      req,
      res,
      responseTime: Date.now() - start,
    }, `${req.method} ${req.url} ${res.statusCode}`);
  });
  next();
});

Rate Limiting and Abuse Prevention

Protect your API endpoints with rate limiting that adapts to different client types:

const rateLimit = require('express-rate-limit');
const RedisStore = require('rate-limit-redis');
 
const apiLimiter = rateLimit({
  store: new RedisStore({ client: redisClient }),
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100, // 100 requests per window
  standardHeaders: true,
  legacyHeaders: false,
  keyGenerator: (req) => req.user?.id || req.ip,
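  handler: (req, res) => {
    logger.warn({ ip: req.ip, user: req.user?.id }, 'Rate limit exceeded');
    // resetTime is a Date (when the store provides it); report the seconds remaining in the window
    const resetTime = req.rateLimit.resetTime;
    const retryAfter = resetTime
      ? Math.max(0, Math.ceil((resetTime.getTime() - Date.now()) / 1000))
      : undefined;
    res.status(429).json({
      error: 'Too many requests',
      retryAfter,
    });
  },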
});
 
app.use('/api/', apiLimiter);

These operational practices form the foundation of a reliable production service that can handle real-world traffic patterns and failure scenarios.

Community Resources and Further Learning

The technology landscape evolves rapidly, making continuous learning essential for maintaining expertise. Building a systematic approach to staying current with developments in your technology stack ensures you can leverage new features and avoid deprecated patterns.

Curated Learning Pathways

Rather than consuming content randomly, create structured learning pathways aligned with your current projects and career goals. Start with official documentation and specification documents, which provide the most accurate and comprehensive information. Follow this with hands-on tutorials and workshops that reinforce concepts through practical application.

Technical blogs from framework maintainers and core team members often provide deeper insights into design decisions and upcoming features. Subscribe to the official blogs of your primary frameworks and libraries to stay ahead of breaking changes and deprecation timelines.

Contributing to Open Source

Contributing to open-source projects in your technology stack provides unparalleled learning opportunities. Start with documentation improvements and bug reports, then progress to fixing small issues tagged as "good first issue" in your favorite projects. This direct engagement with maintainers and the codebase accelerates your understanding far beyond what passive learning can achieve.

# Setting up for contribution
git clone https://github.com/project/repository.git
cd repository
git checkout -b fix/issue-description
 
# Run the project's contribution setup
npm run setup:dev
npm run test  # Ensure tests pass before making changes
 
# Make your changes, then run the full test suite
npm run test:full
npm run lint
npm run build
 
# Submit your contribution
git add -A
git commit -m "fix: description of the fix
 
Closes #1234"
git push origin fix/issue-description

Building a Technical Knowledge Base

Maintain a personal knowledge base that captures insights, solutions, and patterns you discover during your work. Tools like Obsidian, Notion, or even a simple Markdown repository can serve as an external memory that grows more valuable over time.

Organize your notes by topic rather than chronologically, and include code examples, links to relevant documentation, and explanations of why certain approaches work better than others. When you encounter a particularly insightful article or conference talk, write a summary that captures the key takeaways and how they apply to your current projects.

Follow key conferences and their published talks to stay informed about emerging patterns and best practices. Many conferences publish recorded talks on YouTube within weeks of the event, making world-class technical content freely accessible.

Join relevant Discord servers, Slack communities, and forums where practitioners discuss real-world challenges and solutions. These communities provide early warning about emerging issues and access to collective wisdom that isn't available through formal documentation.

Mentorship and Knowledge Sharing

Teaching others is one of the most effective ways to deepen your own understanding. Consider writing technical blog posts, giving talks at local meetups, or mentoring junior developers. The process of explaining concepts to others forces you to organize your knowledge and identify gaps in your understanding.

Pair programming sessions with colleagues of different experience levels create mutual learning opportunities. Senior developers gain fresh perspectives on problems they've solved the same way for years, while junior developers benefit from exposure to production-grade thinking and decision-making processes.

Conclusion

The API gateway is the cornerstone of a well-architected microservices system. It centralizes cross-cutting concerns, simplifies client code, and provides the abstraction layer that enables backend services to evolve independently.

Key takeaways:

  1. The API gateway is the single entry point for all client requests, handling routing, auth, rate limiting, and transformation
  2. Choose Kong for maximum flexibility, AWS API Gateway for AWS-native workloads, and Express Gateway for Node.js simplicity
  3. Keep the gateway thin — business logic belongs in backend services
  4. Implement circuit breakers to prevent cascading failures
  5. Cache responses at the gateway level to reduce backend load
  6. Monitor latency, error rates, and throughput per service
  7. Use API versioning to enable independent service evolution

Start by implementing a simple gateway that routes requests to 2-3 backend services with authentication and rate limiting. Add circuit breakers, caching, and logging as your system grows. The gateway's value compounds as your microservices architecture matures.