AWS Lambda Best Practices: Performance and Cost

Optimize Lambda: memory tuning, cold start reduction, provisioned concurrency.

Tags: AWS, Lambda, Serverless, Performance

By MinhVo

Introduction

AWS Lambda revolutionized how we build and deploy applications by abstracting away server management entirely. Yet the convenience of serverless comes with its own set of optimization challenges. Teams that blindly deploy Lambda functions without tuning memory, runtime, and invocation patterns often face surprise bills, cold start latency, and timeout errors that degrade user experience.

Understanding Lambda's internal execution model is the difference between a cost-effective, snappy serverless architecture and one that bleeds money while frustrating users. This guide distills battle-tested best practices drawn from production Lambda workloads handling millions of invocations per month. Whether you're running event-driven microservices, data pipelines, or API backends, these patterns will help you squeeze maximum performance from every dollar spent.

Understanding Lambda's Execution Model: Core Concepts

At its core, AWS Lambda executes your code in a managed container called a Lambda execution environment. Each invocation either reuses an existing warm environment or spins up a new one. The time spent creating a new environment is the dreaded cold start, which can add hundreds of milliseconds to several seconds of latency depending on your runtime, package size, and VPC configuration.

Lambda allocates CPU power in proportion to the memory you configure. At 1,769 MB, a function gets the equivalent of one full vCPU; below that threshold, it gets a proportional fraction of one. This means memory is not just about RAM: it is your primary CPU tuning knob. Many teams under-provision memory thinking they are saving cost, when in reality their function runs slower and consumes more billable duration.
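To make the memory-to-CPU relationship concrete, here is a minimal sketch that estimates the vCPU share for a given memory setting. It assumes the linear scaling described above and the 1,769 MB point for one vCPU; `estimateVcpuShare` is a hypothetical helper for illustration, not part of any AWS SDK.

```typescript
// Memory setting at which Lambda provides the equivalent of one full vCPU.
const MB_PER_VCPU = 1769;

// Hypothetical helper: approximate vCPU share, assuming CPU scales linearly with memory.
function estimateVcpuShare(memoryMb: number): number {
  if (memoryMb < 128 || memoryMb > 10240) {
    throw new RangeError("Lambda memory must be between 128 and 10240 MB");
  }
  return memoryMb / MB_PER_VCPU;
}

const smallest = estimateVcpuShare(128);  // roughly 0.07 of a vCPU
const oneCore = estimateVcpuShare(1769);  // 1
```

A 128 MB function therefore gets only a small slice of a core, which is why CPU-bound code often speeds up almost linearly as memory is raised toward 1,769 MB.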

The Lambda pricing model charges along three dimensions: the number of requests, the duration of each invocation (rounded up to the nearest millisecond), and the amount of memory allocated. The free tier covers 1 million requests and 400,000 GB-seconds per month. Beyond that, pricing for x86 functions is $0.20 per million requests and $0.0000166667 per GB-second. This granularity means small optimizations compound dramatically at scale.
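The arithmetic behind a monthly bill can be sketched as follows. This is a rough estimate that uses only the rates and free-tier figures quoted above; the `Workload` shape and the `estimateMonthlyCostUsd` helper are hypothetical, and actual prices vary by region and architecture.

```typescript
// Rates quoted above; check current pricing for your region and architecture.
const PRICE_PER_MILLION_REQUESTS = 0.2;
const PRICE_PER_GB_SECOND = 0.0000166667;
const FREE_REQUESTS = 1_000_000;
const FREE_GB_SECONDS = 400_000;

interface Workload {
  requestsPerMonth: number;
  avgDurationMs: number;
  memoryMb: number;
}

// Hypothetical helper: request charges plus duration charges, net of the free tier.
function estimateMonthlyCostUsd(w: Workload): number {
  const gbSeconds = w.requestsPerMonth * (w.avgDurationMs / 1000) * (w.memoryMb / 1024);
  const billableRequests = Math.max(0, w.requestsPerMonth - FREE_REQUESTS);
  const billableGbSeconds = Math.max(0, gbSeconds - FREE_GB_SECONDS);
  return (
    (billableRequests / 1_000_000) * PRICE_PER_MILLION_REQUESTS +
    billableGbSeconds * PRICE_PER_GB_SECOND
  );
}

// 10 million requests per month, 200 ms average duration, 512 MB of memory
const monthlyCost = estimateMonthlyCostUsd({
  requestsPerMonth: 10_000_000,
  avgDurationMs: 200,
  memoryMb: 512,
}); // about 11.80 USD
```

In this example the workload consumes 1,000,000 GB-seconds, of which 600,000 are billable (about $10.00), plus 9 million billable requests ($1.80).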

Lambda also enforces several hard limits: a 15-minute maximum execution time, 10 GB maximum memory, 512 MB of ephemeral storage (upgradable to 10 GB), and a 6 MB payload for synchronous invocations (or 256 KB for asynchronous). Understanding these boundaries shapes every architectural decision you make.

Architecture and Design Patterns

The Stateless Function Pattern

Lambda functions must be stateless. Every invocation should be independent, with no reliance on data persisted in the execution environment between calls. Store state in DynamoDB, S3, ElastiCache, or RDS. This design enables horizontal scaling and eliminates coordination overhead.

Event-Driven Decomposition

Decompose monolithic logic into small, focused Lambda functions triggered by specific events. An S3 upload triggers a thumbnail generator. A DynamoDB stream triggers a notification sender. Each function does one thing well, making debugging, scaling, and cost attribution straightforward.

The Fan-Out/Fan-In Pattern

For parallelizable workloads, use SQS or SNS to fan out tasks to multiple Lambda invocations, then aggregate results using Step Functions or a DynamoDB accumulator. This pattern converts a slow serial process into a fast parallel one, though you must account for the downstream concurrency limits.
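The fan-out half of this pattern can be sketched as a batching step. The snippet below only prepares the batches; sending them (for example with the SQS `SendMessageBatch` API, which accepts at most 10 entries per call) is left out so the example stays self-contained. The `toSqsBatches` helper and the task shape are hypothetical.

```typescript
// SQS SendMessageBatch accepts at most 10 entries per call.
const SQS_MAX_BATCH_ENTRIES = 10;

interface BatchEntry {
  Id: string; // must be unique within a single batch
  MessageBody: string;
}

// Hypothetical helper: split work items into batches shaped like SendMessageBatch entries.
function toSqsBatches<T>(items: T[], batchSize: number = SQS_MAX_BATCH_ENTRIES): BatchEntry[][] {
  if (batchSize < 1 || batchSize > SQS_MAX_BATCH_ENTRIES) {
    throw new RangeError("batchSize must be between 1 and 10");
  }
  const batches: BatchEntry[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const slice = items.slice(i, i + batchSize);
    batches.push(
      slice.map((item, j) => ({ Id: String(i + j), MessageBody: JSON.stringify(item) })),
    );
  }
  return batches;
}

// 25 tasks become 3 batches: 10 + 10 + 5 entries
const tasks = Array.from({ length: 25 }, (_, n) => ({ taskId: n }));
const batches = toSqsBatches(tasks);
```

Each message then triggers its own Lambda invocation (or a small batch of them), so the queue absorbs bursts while the function's concurrency setting caps the pressure on downstream systems.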

Asynchronous Invocation Strategy

Prefer asynchronous invocation for non-time-sensitive work. Route that work through SQS queues, SNS topics, or EventBridge rules instead of calling functions synchronously. This decouples producers from consumers, smooths traffic spikes, and gives you retries and dead-letter queues without custom code.

Step-by-Step Implementation

Memory Tuning with AWS Lambda Power Tuning

The most impactful optimization is right-sizing memory. AWS Lambda Power Tuning is a Step Functions-based tool that runs your function at different memory configurations and plots cost vs. performance.

// Invoke the tuner programmatically
import { SFNClient, StartExecutionCommand } from "@aws-sdk/client-sfn";
 
const client = new SFNClient({ region: "us-east-1" });
 
async function runPowerTuning(functionArn: string) {
  const command = new StartExecutionCommand({
    stateMachineArn: "arn:aws:states:us-east-1:ACCOUNT:stateMachine:powerTuning",
    input: JSON.stringify({
      lambdaARN: functionArn,
      powerValues: [128, 256, 512, 1024, 1536, 2048, 3008],
      num: 50,
      payload: { httpMethod: "GET", path: "/health" },
      parallelInvocation: true,
      strategy: "balanced",
    }),
  });
  await client.send(command);
}
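To show why a larger memory setting can end up cheaper, here is a small sketch that compares duration cost per invocation across memory settings. The measurements are made-up numbers for a CPU-bound function, used for illustration only; in practice they would come from the Power Tuning output. The `costPerInvocationUsd` and `cheapest` helpers are hypothetical.

```typescript
interface Measurement {
  memoryMb: number;
  avgDurationMs: number;
}

// Duration cost of one invocation. The per-request fee is excluded because it is
// the same for every memory setting. The default rate is the x86 price per GB-second.
function costPerInvocationUsd(m: Measurement, pricePerGbSecond: number = 0.0000166667): number {
  return (m.memoryMb / 1024) * (m.avgDurationMs / 1000) * pricePerGbSecond;
}

// Pick the measurement with the lowest duration cost.
function cheapest(measurements: Measurement[]): Measurement {
  if (measurements.length === 0) throw new Error("no measurements provided");
  return measurements.reduce((best, m) =>
    costPerInvocationUsd(m) < costPerInvocationUsd(best) ? m : best,
  );
}

// Hypothetical measurements, for illustration only.
const samples: Measurement[] = [
  { memoryMb: 128, avgDurationMs: 2400 },  // 0.30 GB-seconds
  { memoryMb: 512, avgDurationMs: 500 },   // 0.25 GB-seconds
  { memoryMb: 1024, avgDurationMs: 260 },  // 0.26 GB-seconds
];

const best = cheapest(samples); // the 512 MB configuration
```

With these numbers, 512 MB is both cheaper and almost five times faster than 128 MB, while 1024 MB costs slightly more for a further speedup. That trade-off is what a balanced strategy weighs.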

Reducing Cold Starts

Cold starts come from three sources: downloading the function code, initializing the runtime, and running your initialization code. Address each one.

// bad.ts - Initializing clients inside the handler
export async function handler(event: APIGatewayEvent) {
  const dynamo = new DynamoDBClient({ region: "us-east-1" });
  const s3 = new S3Client({ region: "us-east-1" });
 
  const result = await dynamo.send(new GetItemCommand({
    TableName: "Users",
    Key: { id: { S: event.pathParameters!.userId! } },
  }));
  return { statusCode: 200, body: JSON.stringify(result.Item) };
}
 
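// good.ts - Initialize outside the handler (module scope)
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";
import { S3Client } from "@aws-sdk/client-s3";
import type { APIGatewayEvent } from "aws-lambda";

// Created once per execution environment and reused by every warm invocation
const dynamo = new DynamoDBClient({ region: "us-east-1" });
const s3 = new S3Client({ region: "us-east-1" });

export async function handler(event: APIGatewayEvent) {
  const result = await dynamo.send(new GetItemCommand({
    TableName: "Users",
    Key: { id: { S: event.pathParameters!.userId! } },
  }));
  // GetItem returns no Item when the key does not exist
  if (!result.Item) {
    return { statusCode: 404, body: JSON.stringify({ message: "User not found" }) };
  }
  return { statusCode: 200, body: JSON.stringify(result.Item) };
}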

Minimizing Deployment Package Size

Large packages increase cold start time. Use bundlers to tree-shake unused code.

// esbuild.config.mjs
import { build } from "esbuild";
 
await build({
  entryPoints: ["src/handlers/users.ts", "src/handlers/orders.ts"],
  bundle: true,
  minify: true,
  sourcemap: true,
  platform: "node",
  target: "node20",
  outdir: "dist",
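  // AWS SDK v3 ships with the Node.js 18+ runtimes, so it can stay out of the bundle
  external: ["@aws-sdk/*"],
  splitting: true,
  format: "esm",
  // Emit .mjs files so the Lambda Node.js runtime loads the output as ES modules
  outExtension: { ".js": ".mjs" },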
  metafile: true,
});

Connection Pooling for RDS

Never open a new database connection per invocation. Use RDS Proxy or a connection pool initialized at module scope.

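import { Pool } from "pg";
import type { APIGatewayEvent } from "aws-lambda";

// Created at module scope so warm invocations reuse the same connection
const pool = new Pool({
  host: process.env.DB_HOST,
  database: process.env.DB_NAME,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  // One connection is enough: each execution environment handles one request at a time
  max: 1,
  idleTimeoutMillis: 60000,
  connectionTimeoutMillis: 5000,
});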
 
export async function handler(event: APIGatewayEvent) {
  const client = await pool.connect();
  try {
    const result = await client.query(
      "SELECT * FROM orders WHERE user_id = $1",
      [event.pathParameters!.userId]
    );
    return { statusCode: 200, body: JSON.stringify(result.rows) };
  } finally {
    client.release();
  }
}

Real-World Use Cases

E-Commerce Order Processing Pipeline

A mid-size e-commerce platform processes 50,000 orders daily through Lambda. Each order triggers a chain: validate payment (Lambda), update inventory (Lambda), send confirmation email (SNS → Lambda), and update analytics (Kinesis → Lambda). By tuning each function independently to its optimal memory setting using Power Tuning, the team reduced total processing cost by 40% while cutting average latency from 800ms to 350ms.

Real-Time Image Processing Service

A social media platform uses Lambda to generate image thumbnails, apply watermarks, and extract EXIF data on upload. By switching from synchronous invocation to S3 event triggers with asynchronous processing, they eliminated timeout errors during traffic spikes. Provisioned concurrency on the thumbnail generator ensures zero cold starts for the user-facing upload flow.

Scheduled Data Pipeline

A financial services company runs a nightly ETL pipeline using Lambda functions orchestrated by Step Functions. Each function processes a chunk of data from S3, transforms it, and writes results to Redshift. By using Lambda's 10 GB ephemeral storage and S3 Select for partial downloads, they reduced data transfer costs by 60% and cut pipeline duration from 4 hours to 45 minutes.

API Gateway Backend for Mobile App

A mobile app backend uses API Gateway + Lambda with provisioned concurrency configured for peak hours. During off-peak, on-demand Lambda handles background tasks like push notification scheduling and data aggregation. The hybrid approach costs 70% less than running always-on EC2 instances while maintaining sub-100ms p99 latency.

Best Practices for Production

  1. Always use Power Tuning before deploying: Running AWS Lambda Power Tuning takes 10 minutes and routinely reveals that a higher memory setting produces both faster execution and lower cost due to reduced duration charges. The "balanced" strategy finds the sweet spot automatically.

  2. Set timeouts to about twice the expected duration: If your function typically completes in 3 seconds, set the timeout to 6 seconds. This gives breathing room for slow invocations while catching infinite loops or stuck connections before they run all the way to the 15-minute maximum and rack up charges.

  3. Use ARM64 (Graviton2) architecture: Lambda functions on ARM64 cost 20% less than x86_64 with comparable or better performance. Switch by changing the architecture setting in your Lambda configuration and rebuilding your deployment package for the arm64 target.

  4. Implement exponential backoff with jitter: When your Lambda calls other AWS services that return throttling errors (HTTP 429), retry with exponential backoff and random jitter. This prevents thundering herd problems where hundreds of concurrent Lambda instances retry simultaneously.

  5. Use Lambda Layers for shared dependencies: Package common libraries (database clients, auth utilities, validation logic) into Lambda Layers. This reduces deployment package size, enables code sharing across functions, and speeds up deployments.

  6. Monitor with CloudWatch embedded metrics: Use the CloudWatch embedded metrics format to emit custom metrics without installing the CloudWatch agent. This gives you per-function cost attribution, latency percentiles, and error rates in CloudWatch dashboards.

  7. Prefer SQS over direct Lambda invocation for async work: SQS provides built-in buffering, dead-letter queues, and at-least-once delivery guarantees. Direct async Lambda invocation drops events after two failed retries by default.

  8. Enable X-Ray tracing selectively: X-Ray adds latency and cost. Enable it for debugging and performance analysis, then disable or sample at 1% for high-volume production functions.
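The backoff advice in item 4 can be sketched as follows. This is a minimal "full jitter" variant under assumed defaults (100 ms base, 5 s cap, 5 attempts); `backoffDelayMs` and `withRetry` are hypothetical helpers. Note that the AWS SDK v3 clients already retry throttled calls on their own, so a wrapper like this is mainly useful for calls made outside the SDK, such as third-party HTTP APIs.

```typescript
// Full jitter: pick a random delay in [0, min(cap, base * 2^attempt)).
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 100,
  capMs: number = 5000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Retry an async operation while `isRetryable` says the error is worth retrying.
async function withRetry<T>(
  op: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts: number = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Give up on the last attempt or on errors that retrying cannot fix
      if (attempt + 1 >= maxAttempts || !isRetryable(err)) throw err;
      const delay = backoffDelayMs(attempt);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Randomizing the whole delay, rather than adding a small random offset to a fixed schedule, spreads retries from many concurrent execution environments across the full window.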

Common Pitfalls and Solutions

| Pitfall | Impact | Solution |
| --- | --- | --- |
| Using 128 MB memory to "save cost" | Functions run 10x slower, often costing more in duration | Use Power Tuning to find the optimal memory setting |
| Opening database connections inside handler | New TCP handshake on every invocation, connection exhaustion | Initialize connection pool at module scope, use RDS Proxy |
| Not handling Lambda's retry behavior | Duplicate processing of events | Make handlers idempotent using idempotency keys |
| Synchronous invocation chains | Tight coupling, cascading failures, compounded latency | Use event-driven architecture with SQS/SNS/EventBridge |
| Ignoring concurrency limits | Throttling under load, dropped events | Request concurrency increases, use reserved concurrency |
| Shipping full node_modules | 50+ MB packages causing 5-10s cold starts | Bundle with esbuild, externalize AWS SDK, use Lambda Layers |
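The idempotency-key fix from the table can be sketched like this. The `IdempotencyStore` interface, the in-memory implementation, and `makeIdempotent` are all hypothetical and exist only for illustration; in production the store would need to be durable and shared across execution environments, for example a DynamoDB table written with a conditional put.

```typescript
// Hypothetical storage interface. The in-memory version below is for
// illustration only: it does not survive a cold start.
interface IdempotencyStore {
  // Resolves to true if the key was newly claimed, false if it was already present.
  claim(key: string): Promise<boolean>;
}

class InMemoryStore implements IdempotencyStore {
  private seen = new Set<string>();

  async claim(key: string): Promise<boolean> {
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}

// Wrap a side-effecting function so repeated deliveries of the same key are skipped.
function makeIdempotent<E>(
  store: IdempotencyStore,
  keyOf: (event: E) => string,
  work: (event: E) => Promise<void>,
): (event: E) => Promise<"processed" | "duplicate"> {
  return async (event: E) => {
    const isNew = await store.claim(keyOf(event));
    if (!isNew) return "duplicate";
    await work(event);
    return "processed";
  };
}
```

One caveat with this simple version: if `work` throws after the key has been claimed, a retry of the same event would be skipped. A fuller implementation would release the key on failure or store a status alongside it.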

Performance Optimization

Provisioned Concurrency

For latency-sensitive functions, provisioned concurrency keeps execution environments warm and ready. It eliminates cold starts but costs money for idle capacity.

// CDK configuration for provisioned concurrency
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as appscaling from "aws-cdk-lib/aws-applicationautoscaling";
 
const fn = new lambda.Function(this, "ApiHandler", {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: "index.handler",
  code: lambda.Code.fromAsset("dist"),
  memorySize: 1024,
  architecture: lambda.Architecture.ARM_64,
});
 
const alias = fn.addAlias("live", {
  provisionedConcurrentExecutions: 10,
});
 
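const target = new appscaling.ScalableTarget(this, "ScalableTarget", {
  serviceNamespace: appscaling.ServiceNamespace.LAMBDA,
  scalableDimension: "lambda:function:ProvisionedConcurrency",
  minCapacity: 5,
  maxCapacity: 100,
  resourceId: `function:${fn.functionName}:live`,
});
// The alias must exist before Application Auto Scaling can register it
target.node.addDependency(alias);

// Without a scaling policy the target never changes capacity.
// The 70% utilization target is an assumed starting point; tune it for your traffic.
target.scaleToTrackMetric("ProvisionedConcurrencyTracking", {
  targetValue: 0.7,
  predefinedMetric: appscaling.PredefinedMetric.LAMBDA_PROVISIONED_CONCURRENCY_UTILIZATION,
});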

Ephemeral Storage Optimization

For data-intensive functions, use /tmp storage (up to 10 GB) for intermediate files. This avoids repeated S3 downloads within the same invocation.

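import { readFileSync, writeFileSync } from "fs";
import { execFileSync } from "child_process";
import { basename } from "path";
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
import type { S3Event } from "aws-lambda";

// Created once per execution environment and reused across invocations
const s3 = new S3Client({});

export async function handler(event: S3Event) {
  const bucket = event.Records[0].s3.bucket.name;
  // Object keys in S3 event notifications are URL-encoded, with spaces sent as "+"
  const key = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));

  // Stage the object in /tmp so ffmpeg can read it from local disk
  const tmpPath = `/tmp/${basename(key)}`;
  const data = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  writeFileSync(tmpPath, await data.Body!.transformToByteArray());

  // Pass arguments as an array so the object key is never interpreted by a shell.
  // -y overwrites output left behind by an earlier invocation in a warm environment.
  // Assumes an ffmpeg binary is on PATH (for example, provided by a Lambda layer).
  execFileSync("ffmpeg", ["-y", "-i", tmpPath, "-vf", "scale=1280:720", "/tmp/output.mp4"]);

  await s3.send(new PutObjectCommand({
    Bucket: process.env.OUTPUT_BUCKET,
    Key: `processed/${key}`,
    Body: readFileSync("/tmp/output.mp4"),
  }));
}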

Comparison with Alternatives

| Feature | AWS Lambda | AWS Fargate | EC2 Instances | Cloudflare Workers |
| --- | --- | --- | --- | --- |
| Cold start | 100ms-6s | 30-60s | None | <1ms |
| Max execution time | 15 min | Unlimited | Unlimited | 30s (free) / 15min (paid) |
| Scaling | Automatic (per-request) | Auto-scaling groups | Manual/auto | Automatic |
| Cost model | Per-request + duration | Per-vCPU-hour | Per-hour | Per-request |
| Best for | Event-driven, bursty | Long-running containers | Predictable workloads | Edge compute, simple APIs |
| Memory range | 128 MB-10 GB | 0.5-30 GB | Custom | 128 MB |

Advanced Patterns

Lambda Powertools for TypeScript

AWS Lambda Powertools provides observability, tracing, and structured logging utilities purpose-built for serverless.

import { Logger } from "@aws-lambda-powertools/logger";
import { Tracer } from "@aws-lambda-powertools/tracer";
// Assumes Powertools v2, which exports MetricUnit (v1 named it MetricUnits)
import { Metrics, MetricUnit } from "@aws-lambda-powertools/metrics";
import type { APIGatewayEvent, Context } from "aws-lambda";

const logger = new Logger({ serviceName: "order-service" });
const tracer = new Tracer({ serviceName: "order-service" });
const metrics = new Metrics({ namespace: "ECommerce", serviceName: "order-service" });

export async function handler(event: APIGatewayEvent, context: Context) {
  // Attach the request ID, function name, and cold-start flag to every log line
  logger.addContext(context);

  // Open a subsegment so this work appears as its own node in the X-Ray trace
  const segment = tracer.getSegment();
  const subsegment = segment?.addNewSubsegment("processOrder");
  if (subsegment) tracer.setSegment(subsegment);

  try {
    const order = JSON.parse(event.body!);
    logger.info("Processing order", {
      orderId: order.id,
      itemCount: order.items.length,
    });

    metrics.addMetric("OrderProcessed", MetricUnit.Count, 1);
    metrics.addMetric("OrderValue", MetricUnit.NoUnit, order.total);

    return { statusCode: 200, body: JSON.stringify({ orderId: order.id }) };
  } catch (error) {
    logger.error("Order processing failed", error as Error);
    metrics.addMetric("OrderFailed", MetricUnit.Count, 1);
    throw error;
  } finally {
    // Close the subsegment and restore the parent on both success and failure
    subsegment?.close();
    if (segment) tracer.setSegment(segment);
    // Flush buffered metrics to CloudWatch
    metrics.publishStoredMetrics();
  }
}

Lambda Response Streaming

For large responses, use Lambda response streaming to send partial results to the client immediately, reducing time-to-first-byte. This is especially useful for LLM applications or large data exports where waiting for the full response would exceed timeout limits.

Testing Strategies

Local Testing with SAM CLI and Mocks

// __tests__/handler.test.ts
import { handler } from "../src/handlers/users";
import { mockClient } from "aws-sdk-client-mock";
import { DynamoDBClient, GetItemCommand } from "@aws-sdk/client-dynamodb";
 
const dynamoMock = mockClient(DynamoDBClient);
 
beforeEach(() => {
  dynamoMock.reset();
});
 
it("returns user from DynamoDB", async () => {
  dynamoMock.on(GetItemCommand).resolves({
    Item: { id: { S: "123" }, name: { S: "Alice" } },
  });
 
  const result = await handler({
    pathParameters: { userId: "123" },
  } as any);
 
  expect(result.statusCode).toBe(200);
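  const body = JSON.parse(result.body);
  // Assumes the handler returns the raw DynamoDB item, so attributes keep their type wrappers
  expect(body.name).toEqual({ S: "Alice" });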
});
 
it("returns 404 for missing user", async () => {
  dynamoMock.on(GetItemCommand).resolves({ Item: undefined });
 
  const result = await handler({
    pathParameters: { userId: "999" },
  } as any);
 
  expect(result.statusCode).toBe(404);
});

Future Outlook

AWS continues to invest heavily in Lambda improvements. SnapStart expansion beyond Java, larger ephemeral storage options, and improved VPC networking with Hyperplane all point toward a future where Lambda's limitations shrink further. The Lambda URL feature and response streaming are closing the gap with traditional HTTP servers.

The serverless ecosystem is also maturing. Tools like SST, the AWS CDK, and Serverless Framework v4 make multi-function deployments manageable. Observability platforms like Lumigo, Datadog, and Thundra provide Lambda-specific insights that AWS CloudWatch alone cannot deliver at scale.

Looking ahead, the convergence of Lambda with container images (supporting up to 10 GB packages), GPU access for ML inference workloads, and deeper integration with Amazon Bedrock suggests Lambda will handle an increasingly diverse set of workloads beyond simple request-response patterns.

Cost Optimization Strategies

Lambda costs are primarily driven by invocation count and duration multiplied by memory allocation. Use the AWS Cost Explorer to identify your most expensive functions and focus optimization efforts there. Implement request batching to reduce invocation counts: process multiple records in a single Lambda execution rather than one per invocation. Choose ARM64 (Graviton2) architecture for 20% better price-performance compared to x86. Set appropriate timeout values to prevent runaway executions from accumulating costs. Use Lambda Destinations instead of synchronous invocations for event-driven workflows to avoid paying for waiting time.
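Request batching works best when one bad record does not force the whole batch to be retried. The sketch below shows a batch handler for an SQS event source that reports only the failed messages, using the `batchItemFailures` response shape that Lambda understands when `ReportBatchItemFailures` is enabled on the event source mapping. The record types are trimmed-down local definitions and `processBatch` is a hypothetical helper.

```typescript
// Minimal local versions of the SQS event and response shapes.
interface SqsRecord {
  messageId: string;
  body: string;
}
interface SqsBatchEvent {
  Records: SqsRecord[];
}
interface SqsBatchResponse {
  batchItemFailures: { itemIdentifier: string }[];
}

// Process every record in one invocation and report only the ones that failed,
// so SQS redelivers those messages instead of the entire batch.
async function processBatch(
  event: SqsBatchEvent,
  handleRecord: (body: string) => Promise<void>,
): Promise<SqsBatchResponse> {
  const batchItemFailures: { itemIdentifier: string }[] = [];
  for (const record of event.Records) {
    try {
      await handleRecord(record.body);
    } catch (err) {
      batchItemFailures.push({ itemIdentifier: record.messageId });
    }
  }
  return { batchItemFailures };
}
```

A Lambda handler would return the result of `processBatch` directly. An empty `batchItemFailures` array tells Lambda that the whole batch succeeded.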

Conclusion

Optimizing AWS Lambda is a discipline of measurement, not guesswork. The key takeaways are:

  1. Tune memory with Power Tuning before every production deployment; the ROI is immediate
  2. Minimize cold starts by bundling code, initializing at module scope, and using ARM64
  3. Use event-driven patterns with SQS, SNS, and EventBridge to decouple and scale
  4. Monitor everything with CloudWatch embedded metrics and Lambda Powertools
  5. Right-size concurrency using provisioned concurrency for latency-critical paths

Start by deploying AWS Lambda Power Tuning on your most expensive functions today. The results will guide every subsequent optimization. From there, adopt the architectural patterns that match your workload, and let Lambda's pay-per-invocation model work in your favor rather than against it.