OpenTelemetry: Distributed Tracing for Microservices

Introduction

When a user reports that their checkout is slow, where do you start looking? In a monolithic application, you would profile the checkout handler, examine database queries, and find the bottleneck. But in a microservices architecture, the checkout request might traverse an API gateway, an authentication service, a cart service, a pricing engine, an inventory service, a payment processor, a fraud detection service, and a notification service. Each of these services has its own logs, its own metrics, and its own error handling. Without distributed tracing, debugging a slow request across this many services is like searching for a needle in a haystack—blindfolded.

Distributed tracing solves this problem by recording the complete journey of a request across service boundaries. Each service adds its own timing information to a shared trace, creating a detailed waterfall chart that shows exactly where time was spent. OpenTelemetry (OTel), a Cloud Native Computing Foundation project, has become the de facto standard for implementing distributed tracing. Born from the merger of OpenTracing and OpenCensus, OTel provides vendor-neutral SDKs, a standardized wire protocol (OTLP), and a flexible Collector architecture that can export traces to most tracing backends.

This guide covers the fundamentals of distributed tracing with OpenTelemetry, from understanding traces and spans to implementing context propagation across HTTP, gRPC, and message queue boundaries in a real microservices environment.

Understanding Distributed Tracing: Core Concepts

Traces, Spans, and Context

A trace represents the end-to-end journey of a single request through your distributed system. Every trace has a unique 128-bit trace ID that remains constant as the request flows from service to service. This trace ID is the glue that links together the work done by different services into a coherent picture.

A span is the primary building block of a trace. Each span represents a single unit of work—starting a database query, making an HTTP call, processing a message from a queue, or executing a function. A span contains several key pieces of information: a name describing the operation, start and end timestamps, a set of key-value attributes (called span attributes), a status code (OK, ERROR, or UNSET), and optional events (timestamped annotations within the span's lifetime).

Spans are organized in a parent-child hierarchy that mirrors the call chain across services. When Service A calls Service B, Service A creates a child span for the outgoing HTTP request, and Service B creates a child span for the incoming request processing. Both spans share the same trace ID, but each has its own span ID. The parent-child relationship is established through context propagation—the mechanism that carries trace context across process boundaries.
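
To make the parent-child relationship concrete, the sketch below shows one way a backend could reassemble a flat list of spans into a tree using only the span ID and parent span ID. The SpanRecord and SpanNode shapes and the buildTree and render helpers are hypothetical and heavily simplified; they are not the OpenTelemetry SDK's types.

```typescript
// Hypothetical, simplified span record; real SDK spans carry many more fields.
interface SpanRecord {
  traceId: string;
  spanId: string;
  parentSpanId?: string; // undefined for the root span
  name: string;
}

interface SpanNode {
  span: SpanRecord;
  children: SpanNode[];
}

// Group spans under their parents; spans without a known parent become roots.
function buildTree(spans: SpanRecord[]): SpanNode[] {
  const nodes: Record<string, SpanNode> = {};
  for (const span of spans) {
    nodes[span.spanId] = { span, children: [] };
  }
  const roots: SpanNode[] = [];
  for (const span of spans) {
    const node = nodes[span.spanId];
    const parent = span.parentSpanId ? nodes[span.parentSpanId] : undefined;
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
  }
  return roots;
}

// Render the tree as an indented outline, one line per span.
function render(nodes: SpanNode[], depth = 0): string[] {
  const lines: string[] = [];
  for (const node of nodes) {
    lines.push('  '.repeat(depth) + node.span.name);
    lines.push(...render(node.children, depth + 1));
  }
  return lines;
}

// Service A's outgoing call (span b) is the parent of Service B's handler (span c).
const exampleSpans: SpanRecord[] = [
  { traceId: 't1', spanId: 'a', name: 'POST /checkout' },
  { traceId: 't1', spanId: 'b', parentSpanId: 'a', name: 'HTTP POST cart-service' },
  { traceId: 't1', spanId: 'c', parentSpanId: 'b', name: 'POST /cart/validate' },
];
const outline = render(buildTree(exampleSpans));
```

A span whose parent never arrives (for example because context was not propagated) ends up as an extra root, which is how broken traces typically show up in a backend.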

Span links provide an alternative to parent-child relationships for representing causal connections between spans in different traces. For example, a consumer span processing a message from a queue might link to the producer span that created the message, even though they are in separate traces.

Context Propagation

Context propagation is the most critical concept in distributed tracing. Without it, you would have isolated spans from each service with no way to connect them. Propagation works by serializing the trace context (trace ID, span ID, trace flags, and trace state) into a carrier format—typically HTTP headers—and including it in outgoing requests.

The W3C Trace Context standard defines two headers: traceparent (containing trace ID, span ID, and flags) and tracestate (vendor-specific data). OpenTelemetry defaults to W3C Trace Context but supports other propagation formats like B3 (used by Zipkin) and Jaeger for backward compatibility.

# W3C Trace Context headers
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
tracestate: congo=t61rcWkgMzE

The traceparent header format is: version-traceId-spanId-traceFlags. The 01 in traceFlags indicates the trace is sampled (should be recorded). The 00 would indicate it is not sampled.
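
The layout described above can be sketched as a small parser and formatter. This is an illustration of the version 00 header layout only, not the OpenTelemetry propagator; the TraceParent type and the parseTraceparent and formatTraceparent helpers are names made up for this example.

```typescript
// Simplified sketch of the version-00 traceparent layout; not the OTel propagator.
interface TraceParent {
  version: string;
  traceId: string; // 32 lowercase hex characters (128 bits)
  spanId: string;  // 16 lowercase hex characters (64 bits)
  sampled: boolean;
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceParent | null {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // Version ff and all-zero IDs are invalid in the W3C specification.
  if (version === 'ff') return null;
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  // Bit 0 of the flags byte is the "sampled" flag.
  const sampled = (parseInt(flags, 16) & 0x01) === 0x01;
  return { version, traceId, spanId, sampled };
}

// Only the sampled bit is written back; other flag bits are dropped in this sketch.
function formatTraceparent(tp: TraceParent): string {
  return `${tp.version}-${tp.traceId}-${tp.spanId}-${tp.sampled ? '01' : '00'}`;
}
```

In application code you would normally rely on the SDK's propagator rather than parsing headers by hand; a helper like this is mainly useful for log correlation or for debugging what a proxy forwards.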

Semantic Conventions

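OpenTelemetry defines semantic conventions (standardized attribute names and values) that ensure consistency across different instrumentation libraries and languages. For HTTP spans, the conventions specify attributes like http.method, http.url, http.status_code, and http.request_content_length. For database spans, they specify db.system, db.statement, db.operation, and db.name. These are the names emitted by older instrumentation releases. The stabilized HTTP conventions rename them (for example http.request.method, url.full, and http.response.status_code), and the database conventions have moved to names such as db.system.name, db.query.text, db.operation.name, and db.namespace, so check which version your instrumentation emits.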

Using semantic conventions is critical for interoperability. If your database spans use db.system: postgresql and your tracing backend is configured to display database system icons, you will automatically get the PostgreSQL icon. If you use a non-standard attribute name like database_engine, this automatic recognition breaks.

Architecture and Design Patterns

The OTel SDK Architecture

The OpenTelemetry SDK consists of several layers. The API layer defines the interfaces (Tracer, Span, Context, Propagator). The SDK layer provides the implementation—how spans are sampled, processed, and exported. The Contrib layer provides auto-instrumentation for popular libraries and frameworks. This separation allows you to swap implementations without changing your instrumentation code.

The TracerProvider is the entry point. It creates Tracers, which create Spans. Each Tracer is associated with a name (typically the library or module name) and a version. The TracerProvider also configures the pipeline: SpanProcessors receive completed spans, process them (batching, filtering, enrichment), and pass them to SpanExporters that send them to backends.

Auto-Instrumentation vs. Manual Instrumentation

Auto-instrumentation uses monkey-patching or bytecode manipulation to automatically create spans for common operations. For Node.js, the @opentelemetry/auto-instrumentations-node package instruments Express, Fastify, HTTP, PostgreSQL, Redis, MongoDB, and dozens of other libraries with no changes to application code beyond loading the SDK setup at startup, before the instrumented libraries are imported. This provides a solid baseline of visibility with very little effort.

Manual instrumentation is needed for application-specific logic. When you have a complex business process that spans multiple function calls, you create custom spans to track the individual steps. This gives you fine-grained visibility into exactly where time is spent within your business logic.

Sampling Strategies

Collecting 100% of traces is impractical in high-throughput systems. A service handling 10,000 requests per second would generate millions of spans per minute. Sampling determines which traces to collect and which to drop.

Head-based sampling makes the decision at the root span (the first span in the trace). The decision propagates to all child spans through the trace flags. This is simple but has a blind spot: if a trace is dropped at the head, you cannot see if downstream services encountered errors.
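
As a rough illustration of head-based sampling, the sketch below derives the decision from the trace ID and lets child spans follow the parent's sampled flag. It is a simplified stand-in for the SDK's ratio-based and parent-based samplers, whose algorithms differ in detail; shouldSample and headSamplingDecision are hypothetical names.

```typescript
// Simplified ratio sampler: the decision is a pure function of the trace ID,
// so every service that evaluates it for the same trace reaches the same answer.
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Interpret the last 8 hex characters as a number in [0, 2^32).
  const bucket = parseInt(traceId.slice(-8), 16);
  return bucket < ratio * 0x100000000;
}

// Root spans decide by ratio; child spans inherit the flag carried in traceparent.
function headSamplingDecision(
  traceId: string,
  ratio: number,
  parentSampled?: boolean,
): boolean {
  return parentSampled !== undefined ? parentSampled : shouldSample(traceId, ratio);
}

const exampleTraceId = '4bf92f3577b34da6a3ce929d0e0e4736';
const keptAtTenPercent = shouldSample(exampleTraceId, 0.1);   // true: bucket is about 5.5% of the range
const keptAtFivePercent = shouldSample(exampleTraceId, 0.05); // false: just above the 5% cutoff
```

Because the child branch never looks at what happens downstream, an error in a later service cannot rescue a trace that was dropped at the root, which is the blind spot described above.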

Tail-based sampling collects all spans and makes the decision after the trace completes. This allows you to always keep traces with errors or high latency, while dropping a percentage of normal traces. The OTel Collector implements tail-based sampling, but it requires buffering all spans until the trace is complete, which adds memory overhead and latency.

Step-by-Step Implementation

Setting Up the TracerProvider

Configure the OTel SDK with a TracerProvider that exports traces to a Collector:

import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { BatchSpanProcessor, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { Resource } from '@opentelemetry/resources';
import { ATTR_SERVICE_NAME, ATTR_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';
import { W3CTraceContextPropagator } from '@opentelemetry/core';
import { registerInstrumentations } from '@opentelemetry/instrumentation';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';
import { PgInstrumentation } from '@opentelemetry/instrumentation-pg';
import { RedisInstrumentation } from '@opentelemetry/instrumentation-redis-4';
import { trace } from '@opentelemetry/api';
 
const provider = new NodeTracerProvider({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: 'order-service',
    [ATTR_SERVICE_VERSION]: process.env.APP_VERSION || '1.0.0',
    'deployment.environment': process.env.NODE_ENV || 'development',
  }),
});
 
const exporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4317',
});
 
// Use SimpleSpanProcessor in development for immediate export, and
// BatchSpanProcessor everywhere else (buffers spans for efficient export).
// Registering both at once would export every span twice.
// Note: addSpanProcessor is the SDK 1.x API; SDK 2.x takes a spanProcessors
// array in the provider constructor instead.
if (process.env.NODE_ENV === 'development') {
  provider.addSpanProcessor(new SimpleSpanProcessor(exporter));
} else {
  provider.addSpanProcessor(new BatchSpanProcessor(exporter, {
    maxQueueSize: 2048,
    maxExportBatchSize: 512,
    scheduledDelayMillis: 5000,
    exportTimeoutMillis: 30000,
  }));
}
 
provider.register({ propagator: new W3CTraceContextPropagator() });
 
registerInstrumentations({
  instrumentations: [
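    new HttpInstrumentation({
      // Skip health and metrics endpoints. ignoreIncomingRequestHook replaces the
      // older ignoreIncomingPaths option, which recent releases no longer accept.
      ignoreIncomingRequestHook: (req) =>
        ['/healthz', '/readyz', '/metrics'].includes(req.url ?? ''),
    }),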
    new ExpressInstrumentation(),
    new PgInstrumentation({ enhancedDatabaseReporting: true }),
    new RedisInstrumentation(),
  ],
});
 
const tracer = trace.getTracer('order-service');

Manual Span Creation for Business Logic

Create custom spans to trace specific business operations:

import { SpanStatusCode, trace } from '@opentelemetry/api';
 
async function processOrder(order: Order): Promise<OrderResult> {
  const tracer = trace.getTracer('order-service');
 
  return tracer.startActiveSpan('processOrder', {
    attributes: {
      'order.id': order.id,
      'order.total': order.total,
      'order.item_count': order.items.length,
      'customer.id': order.customerId,
    },
  }, async (span) => {
    try {
      // Validate inventory
      const inventory = await tracer.startActiveSpan('validateInventory', async (invSpan) => {
        try {
          const result = await inventoryService.check(order.items);
          invSpan.setAttribute('inventory.all_available', result.allAvailable);
          invSpan.setAttribute('inventory.missing_items', result.missingItems.length);
          return result;
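        } catch (error) {
          // Caught values are typed unknown in strict TypeScript, so normalize first
          const err = error instanceof Error ? error : new Error(String(error));
          invSpan.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
          invSpan.recordException(err);
          throw error;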
        } finally {
          invSpan.end();
        }
      });
 
      if (!inventory.allAvailable) {
        span.setStatus({ code: SpanStatusCode.ERROR, message: 'Items out of stock' });
        return { success: false, reason: 'out_of_stock', missingItems: inventory.missingItems };
      }
 
      // Process payment
      const payment = await tracer.startActiveSpan('processPayment', async (paySpan) => {
        paySpan.setAttribute('payment.method', order.paymentMethod);
        paySpan.setAttribute('payment.amount', order.total);
        try {
          const result = await paymentService.charge(order);
          paySpan.setAttribute('payment.transaction_id', result.transactionId);
          return result;
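        } catch (error) {
          const err = error instanceof Error ? error : new Error(String(error));
          paySpan.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
          // Record the exception so it appears as an 'exception' event on the span
          paySpan.recordException(err);
          throw error;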
        } finally {
          paySpan.end();
        }
      });
 
      span.setAttribute('order.payment_transaction_id', payment.transactionId);
      span.setStatus({ code: SpanStatusCode.OK });
      return { success: true, orderId: order.id, transactionId: payment.transactionId };
    } catch (error) {
      const err = error instanceof Error ? error : new Error(String(error));
      span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
      span.recordException(err);
      throw error;
    } finally {
      span.end();
    }
  });
}

Cross-Service Context Propagation

When making HTTP requests to other services, the OTel HTTP instrumentation automatically propagates the trace context. For message queues, you need to manually propagate the context:

import { SpanStatusCode, propagation, context, trace } from '@opentelemetry/api';

// Publishing a message with trace context
async function publishOrderEvent(order: Order, event: string) {
  const tracer = trace.getTracer('order-service');
  return tracer.startActiveSpan('publishOrderEvent', async (span) => {
    span.setAttribute('messaging.system', 'rabbitmq');
    span.setAttribute('messaging.destination', 'order-events');
    span.setAttribute('messaging.operation', 'publish');

    try {
      const headers: Record<string, string> = {};
      // Inject the active trace context (this span) into the message headers
      propagation.inject(context.active(), headers);

      await messageQueue.publish('order-events', {
        type: event,
        data: order,
        _traceHeaders: headers, // Carried alongside the message payload
      });
    } catch (error) {
      const err = error instanceof Error ? error : new Error(String(error));
      span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
      span.recordException(err);
      throw error;
    } finally {
      span.end(); // Always end the span, even if publishing fails
    }
  });
}

// Consuming a message with trace context extraction
async function handleOrderEvent(message: Message) {
  // Extract trace context from message headers (empty carrier if none were sent)
  const parentContext = propagation.extract(context.active(), message._traceHeaders ?? {});
  const tracer = trace.getTracer('notification-service');

  // Return the span callback's promise so callers wait for processing to finish
  return context.with(parentContext, () =>
    tracer.startActiveSpan('handleOrderEvent', {
      attributes: {
        'messaging.system': 'rabbitmq',
        'messaging.destination': 'order-events',
        'messaging.operation': 'process',
      },
    }, async (span) => {
      try {
        span.setAttribute('event.type', message.type);
        await sendNotification(message.data);
      } catch (error) {
        const err = error instanceof Error ? error : new Error(String(error));
        span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
        span.recordException(err);
        throw error;
      } finally {
        span.end();
      }
    }),
  );
}

Real-World Use Cases and Case Studies

Use Case 1: Identifying N+1 Query Problems

A team notices their order listing page takes 3 seconds to load. They enable distributed tracing and discover that the order service makes a single query to fetch 50 orders, then makes an individual query for each order's customer details—51 database queries instead of 2. The trace waterfall clearly shows 50 sequential database spans, each taking 20-40ms, all triggered from a single HTTP request span. The fix involves adding a JOIN or batch-loading pattern, reducing the page load from 3 seconds to 120ms.
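
The pattern in this case study can also be detected mechanically. The sketch below groups database spans by parent span and normalized statement and flags groups that repeat many times. DbSpan, NPlusOneFinding, findNPlusOne, and the threshold of 10 repeats are all hypothetical choices for illustration, not part of any OpenTelemetry API.

```typescript
// Hypothetical minimal view of a database span pulled from a trace.
interface DbSpan {
  parentSpanId: string;
  statement: string; // normalized SQL text, e.g. literals replaced by "?"
  durationMs: number;
}

interface NPlusOneFinding {
  parentSpanId: string;
  statement: string;
  count: number;
  totalMs: number;
}

// Flag statements that run many times under the same parent span.
function findNPlusOne(spans: DbSpan[], minRepeats = 10): NPlusOneFinding[] {
  const groups: Record<string, NPlusOneFinding> = {};
  for (const s of spans) {
    const key = s.parentSpanId + '\u0000' + s.statement;
    const group = groups[key] || (groups[key] = {
      parentSpanId: s.parentSpanId,
      statement: s.statement,
      count: 0,
      totalMs: 0,
    });
    group.count += 1;
    group.totalMs += s.durationMs;
  }
  return Object.keys(groups)
    .map(key => groups[key])
    .filter(group => group.count >= minRepeats);
}

// One query for the orders, then one customer lookup per order.
const listingSpans: DbSpan[] = [
  { parentSpanId: 'http-1', statement: 'SELECT * FROM orders LIMIT ?', durationMs: 40 },
];
for (let i = 0; i < 50; i++) {
  listingSpans.push({
    parentSpanId: 'http-1',
    statement: 'SELECT * FROM customers WHERE id = ?',
    durationMs: 30,
  });
}
const findings = findNPlusOne(listingSpans);
```

With these assumed numbers, the single finding accounts for 50 queries and 1,500 ms of sequential database time, which is the kind of signal the waterfall view makes visible at a glance.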

Use Case 2: Debugging Timeout Cascades

An intermittent timeout in the payment service causes a cascade of retries through the order pipeline. By examining traces, the team discovers that the payment service has a 5-second timeout, but the upstream order service has a 3-second timeout. When payment is slow, the order service times out first, retries, and creates duplicate payment requests. The trace makes this ordering dependency visible, and the team fixes it by making the order timeout longer than the payment timeout and implementing idempotency keys.
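
The ordering rule behind this fix (a caller should wait longer than the callee may take) is easy to check once the timeouts are written down. The following sketch uses a hypothetical CallHop shape and findTimeoutInversions helper; the timeout values mirror the case study, and the 6-second value for the fix is an assumed example.

```typescript
// Hypothetical description of one caller-to-callee edge in the request path.
interface CallHop {
  caller: string;
  callee: string;
  callerTimeoutMs: number; // how long the caller waits for the callee
  calleeTimeoutMs: number; // how long the callee may take before giving up itself
}

// A hop is inverted when the caller gives up before the callee can finish or fail.
function findTimeoutInversions(hops: CallHop[]): CallHop[] {
  return hops.filter(hop => hop.callerTimeoutMs <= hop.calleeTimeoutMs);
}

const beforeFix: CallHop[] = [
  { caller: 'order-service', callee: 'payment-service', callerTimeoutMs: 3000, calleeTimeoutMs: 5000 },
];
const afterFix: CallHop[] = [
  { caller: 'order-service', callee: 'payment-service', callerTimeoutMs: 6000, calleeTimeoutMs: 5000 },
];

const inversionsBefore = findTimeoutInversions(beforeFix);
const inversionsAfter = findTimeoutInversions(afterFix);
```

A check like this only covers the timeout ordering; the idempotency keys mentioned above are still needed so that a retried payment request cannot charge twice.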

Use Case 3: Measuring Service Mesh Overhead

A team migrating to a service mesh (Istio) uses distributed traces to measure the overhead added by the mesh's sidecar proxies. By comparing span durations before and after the migration, they discover the mesh adds 5-15ms per hop. For requests that traverse 6 services, this adds 30-90ms—acceptable for most endpoints but problematic for a latency-sensitive search API. They configure Istio to bypass the mesh for the search path, maintaining their latency SLA.

Best Practices for Production

Use semantic conventions for all attribute names: Follow the OTel semantic conventions for HTTP, database, messaging, and RPC attributes. This ensures your traces work with any tracing backend and are searchable in standardized ways.
Always end spans in finally blocks: Unfinished spans leak memory and create confusing trace visualizations. Wrap span logic in try-finally to guarantee span.end() is called even if an exception occurs.
Set span status for errors: When an operation fails, set the span status to ERROR and record the exception. This makes it easy to filter for failed traces in your tracing backend and ensures error-based sampling works correctly.
Use BatchSpanProcessor in production: SimpleSpanProcessor exports each span immediately, which creates high network overhead. BatchSpanProcessor buffers spans and exports them in batches, reducing network calls by orders of magnitude.
Implement graceful shutdown: When your application receives SIGTERM, call tracerProvider.shutdown() to flush any buffered spans. Without this, spans from in-flight requests are lost during deployments.
Keep span names low cardinality: Span names should be operation names, not unique identifiers. Use GET /orders/:id instead of GET /orders/12345. High-cardinality span names make it difficult to aggregate and search traces.
Use span links for async relationships: When processing messages from a queue, use span links to connect to the producing span rather than creating a parent-child relationship. This keeps each trace focused on a single request flow.
Sample based on your needs: For development, sample 100%. For production, use tail-based sampling to keep all errors and high-latency traces while sampling 1-10% of normal traffic.
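
To illustrate the low-cardinality span name guideline above, here is a minimal sketch that replaces identifier-like path segments with a placeholder. The toSpanName helper and its two patterns (all digits, or UUID-shaped) are assumptions for this example. When your framework exposes the matched route template, prefer that over pattern matching.

```typescript
const NUMERIC_SEGMENT = /^\d+$/;
const UUID_SEGMENT = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Build a span name from method and path, replacing ID-like segments with ":id".
function toSpanName(method: string, path: string): string {
  const normalized = path
    .split('?')[0] // drop the query string
    .split('/')
    .map(segment =>
      NUMERIC_SEGMENT.test(segment) || UUID_SEGMENT.test(segment) ? ':id' : segment,
    )
    .join('/');
  return `${method.toUpperCase()} ${normalized}`;
}

const orderSpanName = toSpanName('get', '/orders/12345?expand=items');
```

The raw identifier is not lost: it belongs in a span attribute (such as the order.id attribute used earlier in this guide), where it stays searchable without fragmenting the span name.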

Common Pitfalls and Solutions

Pitfall	Impact	Solution
Forgetting to propagate context across async boundaries	Broken traces with missing spans	Always inject/extract context for message queues, event emitters, and async boundaries
Using high-cardinality values in span names	Poor aggregation and expensive indexing in the backend	Keep span names templated (GET /orders/:id) and put identifiers in span attributes instead
Not shutting down the SDK on application exit	Lost spans during deployments	Register SIGTERM handler that calls `provider.shutdown()`
Creating too many nested spans	Performance overhead, noisy traces	Instrument at the service and significant operation level, not every function
Missing error recording on failed spans	Incomplete traces during debugging	Always set ERROR status and call `span.recordException()` on failures
Mixing up propagation formats across services	Trace context not recognized	Standardize on W3C TraceContext across all services
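
For the shutdown pitfall in the table above, a handler might look like the following sketch. It is written against two minimal interfaces (Flushable and SignalSource, both invented here) so it does not depend on a particular SDK version. The 5-second default timeout is an assumption, and registerGracefulShutdown is a hypothetical helper.

```typescript
// Minimal shapes: any tracer provider with shutdown() and any emitter with once() fit.
interface Flushable {
  shutdown(): Promise<void>;
}
interface SignalSource {
  once(signal: string, handler: () => void): unknown;
}

function registerGracefulShutdown(
  provider: Flushable,
  signals: SignalSource,
  exit: (code: number) => void,
  timeoutMs = 5000,
): void {
  let shuttingDown = false;
  const handler = (): void => {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    // Force an exit if flushing hangs, e.g. because the Collector is unreachable.
    const timer = setTimeout(() => exit(1), timeoutMs);
    provider
      .shutdown() // flushes buffered spans, then releases exporter resources
      .then(() => exit(0), () => exit(1))
      .then(() => clearTimeout(timer));
  };
  signals.once('SIGTERM', handler);
  signals.once('SIGINT', handler);
}
```

In a Node.js service this would typically be wired up as registerGracefulShutdown(provider, process, (code) => process.exit(code)), with the timeout kept below the orchestrator's termination grace period.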

Performance Optimization

Reducing Span Export Overhead

Configure the BatchSpanProcessor to balance latency against overhead:

provider.addSpanProcessor(new BatchSpanProcessor(exporter, {
  maxQueueSize: 4096,           // Buffer up to 4096 spans before dropping
  maxExportBatchSize: 512,      // Send 512 spans per export batch
  scheduledDelayMillis: 5000,   // Export every 5 seconds
  exportTimeoutMillis: 30000,   // Give up after 30 seconds
}));
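
To show how maxQueueSize and maxExportBatchSize interact, here is a toy model of a bounded queue that drops on overflow and drains in fixed-size batches. It is not the SDK's BatchSpanProcessor (which also handles timers, timeouts, and concurrent exports); BoundedBatchQueue is a hypothetical class for illustration.

```typescript
// Toy model of the batching behaviour; not the SDK's BatchSpanProcessor.
class BoundedBatchQueue<T> {
  private queue: T[] = [];
  private readonly maxQueueSize: number;
  private readonly maxExportBatchSize: number;
  dropped = 0;

  constructor(maxQueueSize: number, maxExportBatchSize: number) {
    this.maxQueueSize = maxQueueSize;
    this.maxExportBatchSize = maxExportBatchSize;
  }

  // Called when a span ends: enqueue it, or drop it if the buffer is full.
  enqueue(item: T): void {
    if (this.queue.length >= this.maxQueueSize) {
      this.dropped += 1;
      return;
    }
    this.queue.push(item);
  }

  // Called on each export tick: remove and return up to one batch of items.
  nextBatch(): T[] {
    return this.queue.splice(0, this.maxExportBatchSize);
  }

  get size(): number {
    return this.queue.length;
  }
}

// A burst of 5000 spans arrives before any export runs.
const batchQueue = new BoundedBatchQueue<number>(4096, 512);
for (let i = 0; i < 5000; i++) {
  batchQueue.enqueue(i);
}
const firstBatch = batchQueue.nextBatch();
```

In this model the burst loses 904 spans (5000 minus 4096), and the first export carries 512 of the remainder. If dropped spans show up in practice, raising maxQueueSize or shortening scheduledDelayMillis are the usual levers.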

For extremely high-throughput services, consider using the OTel Collector as an intermediary. The Collector can batch spans from multiple service instances and export them efficiently to the tracing backend.

Sampling Configuration in the Collector

processors:
  tail_sampling:
    decision_wait: 30s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-requests
        type: latency
        latency: { threshold_ms: 1000 }
      - name: probabilistic
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
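
The three policies above can be read as "keep the trace if any policy matches". The sketch below expresses that decision as a plain function over a completed trace. It is a simplification under stated assumptions: the real processor hashes the trace ID for its probabilistic policy rather than drawing a random number, the exact boundary handling for the latency threshold is assumed here, and FinishedSpan and keepTrace are hypothetical names.

```typescript
// Hypothetical minimal view of a finished span inside one buffered trace.
interface FinishedSpan {
  isError: boolean;
  startMs: number;
  endMs: number;
}

function keepTrace(
  spans: FinishedSpan[],
  thresholdMs: number,
  samplingPercentage: number,
  random: () => number = Math.random,
): boolean {
  if (spans.length === 0) return false;

  // "errors" policy: any span with ERROR status keeps the whole trace.
  if (spans.some(span => span.isError)) return true;

  // "slow-requests" policy: duration from earliest start to latest end.
  const start = Math.min(...spans.map(span => span.startMs));
  const end = Math.max(...spans.map(span => span.endMs));
  if (end - start >= thresholdMs) return true;

  // "probabilistic" policy: keep a percentage of everything else.
  return random() * 100 < samplingPercentage;
}

const fastOkTrace: FinishedSpan[] = [
  { isError: false, startMs: 0, endMs: 120 },
  { isError: false, startMs: 10, endMs: 90 },
];
const slowTrace: FinishedSpan[] = [{ isError: false, startMs: 0, endMs: 1500 }];
const failedTrace: FinishedSpan[] = [{ isError: true, startMs: 0, endMs: 50 }];
```

The function needs the full set of spans before it can answer, which is why decision_wait and num_traces matter: they bound how long and how many traces the Collector holds in memory while waiting.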

Comparison with Alternatives

Feature	OpenTelemetry	Jaeger	Zipkin	Datadog APM	New Relic
Vendor Neutral	Yes	Partial	Partial	No	No
Auto-instrumentation	Extensive (contrib)	Limited	Limited	SDK-based	SDK-based
Sampling	Head + tail (Collector)	Head-based	Head-based	Adaptive	Adaptive
Storage	Any OTLP-compatible	Elasticsearch, Cassandra	Elasticsearch	Proprietary	Proprietary
Context Propagation	W3C + B3 + Jaeger	B3 + Jaeger	B3	W3C + Datadog	W3C + New Relic
Cost	Open-source	Open-source	Open-source	Per-host	Per-GB
Metrics + Logs + Traces	All three	Traces only	Traces only	All three	All three

Advanced Patterns and Techniques

Span Events for Rich Annotations

Use span events to record significant moments within a span's lifetime:

tracer.startActiveSpan('processPayment', async (span) => {
  try {
    span.addEvent('payment.initiated', {
      'payment.provider': 'stripe',
      'payment.amount': 99.99,
    });

    // Stripe amounts are in the smallest currency unit, so 9999 means $99.99
    const result = await stripe.charges.create({ amount: 9999, currency: 'usd' });

    span.addEvent('payment.completed', {
      'payment.charge_id': result.id,
      'payment.receipt_url': result.receipt_url,
    });
  } finally {
    span.end(); // End the span even if the charge fails
  }
});

Baggage for Cross-Cutting Data

OpenTelemetry Baggage propagates arbitrary key-value pairs across service boundaries. Use it for cross-cutting concerns like tenant IDs, feature flags, or A/B test assignments:

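import { propagation, context } from '@opentelemetry/api';

// Set baggage at the edge. Each entry is an object with a string value.
const bag = propagation.createBaggage({
  'tenant.id': { value: 'acme-corp' },
  'feature.new_checkout': { value: 'true' },
});
const ctx = propagation.setBaggage(context.active(), bag);
// Run downstream work with ctx active (for example via context.with(ctx, fn)) so
// outgoing requests carry it. A baggage propagator such as W3CBaggagePropagator
// must be registered alongside the trace context propagator for this to cross services.

// Retrieve baggage in any downstream service
const activeBaggage = propagation.getBaggage(context.active());
const tenantId = activeBaggage?.getEntry('tenant.id')?.value;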
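
On the wire, W3C Baggage travels in a baggage HTTP header as a comma-separated list of key=value members. The sketch below shows that format with two hand-written helpers, serializeBaggage and parseBaggage. They are hypothetical and simplified: they ignore optional member properties and size limits, and they are not the SDK's propagator.

```typescript
// Serialize entries into a baggage header value; values are percent-encoded.
function serializeBaggage(entries: Record<string, string>): string {
  return Object.keys(entries)
    .map(key => `${key}=${encodeURIComponent(entries[key])}`)
    .join(',');
}

// Parse a baggage header value back into key-value pairs.
function parseBaggage(header: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const member of header.split(',')) {
    const [pair] = member.split(';'); // ignore optional properties after ';'
    const eq = pair.indexOf('=');
    if (eq <= 0) continue; // skip empty or malformed members
    const key = pair.slice(0, eq).trim();
    const value = pair.slice(eq + 1).trim();
    result[key] = decodeURIComponent(value);
  }
  return result;
}

const baggageHeader = serializeBaggage({
  'tenant.id': 'acme-corp',
  'feature.new_checkout': 'true',
});
```

Because every entry is sent with every outgoing request to every downstream service, baggage is best kept small and free of sensitive values.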

Testing Strategies

Test your tracing implementation by verifying spans are created and exported correctly:

import { SpanStatusCode } from '@opentelemetry/api';
import { InMemorySpanExporter, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';
 
const memoryExporter = new InMemorySpanExporter();
provider.addSpanProcessor(new SimpleSpanProcessor(memoryExporter));
 
describe('Order Processing Tracing', () => {
  beforeEach(() => memoryExporter.reset());
 
  it('should create spans for the full order flow', async () => {
    await processOrder(testOrder);
    const spans = memoryExporter.getFinishedSpans();
 
    // Spans are exported in the order they end, so children come before the parent
    expect(spans.map(s => s.name)).toEqual([
      'validateInventory', 'processPayment', 'processOrder'
    ]);
 
    const rootSpan = spans.find(s => s.name === 'processOrder');
    expect(rootSpan.attributes['order.id']).toBe(testOrder.id);
    expect(rootSpan.status.code).toBe(SpanStatusCode.OK);
  });
 
  it('should record errors in spans', async () => {
    mockPaymentService.charge.mockRejectedValue(new Error('Card declined'));
    await expect(processOrder(testOrder)).rejects.toThrow();
 
    const spans = memoryExporter.getFinishedSpans();
    const paymentSpan = spans.find(s => s.name === 'processPayment');
    expect(paymentSpan.status.code).toBe(SpanStatusCode.ERROR);
    expect(paymentSpan.events[0].name).toBe('exception');
  });
});

Future Outlook

OpenTelemetry continues to evolve rapidly. The recent addition of profiling as a fourth signal type (alongside traces, metrics, and logs) will enable correlating CPU and memory profiles with distributed traces. The OTel Collector's growing ecosystem of processors—including adaptive sampling, attribute transformation, and tail-based filtering—makes it increasingly capable as a central telemetry pipeline.

The convergence of OpenTelemetry with eBPF-based tools promises automatic, zero-code instrumentation for any application, regardless of language or framework. Projects like Grafana Beyla and Pixie use eBPF to intercept network traffic and system calls, generating traces without any SDK integration.

Conclusion

Distributed tracing with OpenTelemetry transforms how you understand and debug microservices architectures. By recording the complete journey of requests across service boundaries, you gain visibility into latency bottlenecks, error cascades, and service dependencies that are invisible with traditional monitoring.

Key takeaways:

Traces, spans, and context propagation are the three fundamental concepts to master
Use W3C TraceContext for propagation to ensure vendor interoperability
Auto-instrumentation provides a zero-effort baseline; add manual spans for business logic
Always propagate context across async boundaries like message queues
Sample intelligently—keep all errors and slow traces, sample normal traffic
Use semantic conventions for attribute names to maximize interoperability
Test your tracing implementation as rigorously as your business logic

Start with auto-instrumentation for your HTTP framework and database driver, then incrementally add manual spans for critical business flows. The investment in distributed tracing pays dividends during every debugging session.

Minh Vo
