Introduction
When a user reports that their checkout is slow, where do you start looking? In a monolithic application, you would profile the checkout handler, examine database queries, and find the bottleneck. But in a microservices architecture, the checkout request might traverse an API gateway, an authentication service, a cart service, a pricing engine, an inventory service, a payment processor, a fraud detection service, and a notification service. Each of these services has its own logs, its own metrics, and its own error handling. Without distributed tracing, debugging a slow request across this many services is like searching for a needle in a haystack—blindfolded.
Distributed tracing solves this problem by recording the complete journey of a request across service boundaries. Each service adds its own timing information to a shared trace, creating a detailed waterfall chart that shows exactly where time was spent. OpenTelemetry (OTel) is the industry standard for implementing distributed tracing. Born from the merger of OpenTracing and OpenCensus, OTel provides vendor-neutral SDKs, a standardized data format, and a flexible Collector architecture that can export traces to any backend.
This guide covers the fundamentals of distributed tracing with OpenTelemetry, from understanding traces and spans to implementing context propagation across HTTP, gRPC, and message queue boundaries in a real microservices environment.
Understanding Distributed Tracing: Core Concepts
Traces, Spans, and Context
A trace represents the end-to-end journey of a single request through your distributed system. Every trace has a unique 128-bit trace ID that remains constant as the request flows from service to service. This trace ID is the glue that links together the work done by different services into a coherent picture.
A span is the primary building block of a trace. Each span represents a single unit of work—starting a database query, making an HTTP call, processing a message from a queue, or executing a function. A span contains several key pieces of information: a name describing the operation, start and end timestamps, a set of key-value attributes (called span attributes), a status code (OK, ERROR, or UNSET), and optional events (timestamped annotations within the span's lifetime).
Spans are organized in a parent-child hierarchy that mirrors the call chain across services. When Service A calls Service B, Service A creates a child span for the outgoing HTTP request, and Service B creates a child span for the incoming request processing. Both spans share the same trace ID, but each has its own span ID. The parent-child relationship is established through context propagation—the mechanism that carries trace context across process boundaries.
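To make the parent-child structure concrete, here is a minimal, library-free sketch of the identifiers a span carries. The SpanRecord type and the startSpan and randomHex helpers are hypothetical illustrations, not OpenTelemetry SDK types, and the ID generator is for demonstration only.

```typescript
// Hypothetical, simplified span model (not the OpenTelemetry SDK's types).
interface SpanRecord {
  traceId: string;        // 32 hex chars (128 bits), shared by every span in the trace
  spanId: string;         // 16 hex chars (64 bits), unique per span
  parentSpanId?: string;  // undefined for the root span
  name: string;
}

// Demo-only ID generator; a real SDK uses a stronger random source.
function randomHex(length: number): string {
  let out = '';
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// A child inherits the parent's trace ID and records the parent's span ID.
function startSpan(name: string, parent?: SpanRecord): SpanRecord {
  return {
    traceId: parent ? parent.traceId : randomHex(32),
    spanId: randomHex(16),
    parentSpanId: parent ? parent.spanId : undefined,
    name,
  };
}

const gatewaySpan = startSpan('GET /checkout');                // root span in Service A
const clientSpan = startSpan('HTTP POST /cart', gatewaySpan);  // outgoing call from Service A
const serverSpan = startSpan('POST /cart', clientSpan);        // incoming request in Service B
```

All three records share one trace ID, and following parentSpanId from serverSpan leads back to the root, which is the information a backend needs to draw the waterfall.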
Span links provide an alternative to parent-child relationships for representing causal connections between spans in different traces. For example, a consumer span processing a message from a queue might link to the producer span that created the message, even though they are in separate traces.
Context Propagation
Context propagation is the most critical concept in distributed tracing. Without it, you would have isolated spans from each service with no way to connect them. Propagation works by serializing the trace context (trace ID, span ID, trace flags, and trace state) into a carrier format—typically HTTP headers—and including it in outgoing requests.
The W3C Trace Context standard defines two headers: traceparent (containing trace ID, span ID, and flags) and tracestate (vendor-specific data). OpenTelemetry defaults to W3C Trace Context but supports other propagation formats like B3 (used by Zipkin) and Jaeger for backward compatibility.
# W3C Trace Context headers
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
tracestate: congo=t61rcWkgMzE
The traceparent header format is version-traceId-spanId-traceFlags. The 01 in traceFlags indicates that the trace is sampled (should be recorded); 00 would indicate that it is not sampled.
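The header layout can be checked with a small parser. This is a simplified sketch that handles only the four-field layout described here; the TraceParent type and both function names are hypothetical, and production code would normally rely on the SDK's propagator instead.

```typescript
// Parsed form of a W3C traceparent header.
interface TraceParent {
  version: string;
  traceId: string;
  spanId: string;
  sampled: boolean;
}

// version (2 hex) - traceId (32 hex) - spanId (16 hex) - flags (2 hex), lowercase only.
const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceParent(header: string): TraceParent | null {
  const match = TRACEPARENT_RE.exec(header.trim());
  if (!match) return null;
  const [, version, traceId, spanId, flags] = match;
  // Version ff and all-zero trace or span IDs are invalid.
  if (version === 'ff' || /^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  // Bit 0 of the flags byte is the "sampled" flag.
  const sampled = (parseInt(flags, 16) & 0x01) === 0x01;
  return { version, traceId, spanId, sampled };
}

// Serializes only the sampled bit; any other flag bits are not preserved in this sketch.
function formatTraceParent(tp: TraceParent): string {
  return `${tp.version}-${tp.traceId}-${tp.spanId}-${tp.sampled ? '01' : '00'}`;
}
```

Parsing the example header above yields the trace ID 4bf92f3577b34da6a3ce929d0e0e4736, the span ID 00f067aa0ba902b7, and sampled set to true.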
Semantic Conventions
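OpenTelemetry defines semantic conventions (standardized attribute names and values) that ensure consistency across different instrumentation libraries and languages. For HTTP spans, earlier versions of the conventions specify attributes like http.method, http.url, http.status_code, and http.request_content_length; the stable HTTP conventions rename several of these (for example http.request.method, url.full, and http.response.status_code). For database spans, they specify db.system, db.statement, db.operation, and db.name, which newer releases have also revised (for example, db.statement became db.query.text). Check which version of the conventions your instrumentation libraries emit before building queries or dashboards on these names.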
Using semantic conventions is critical for interoperability. If your database spans use db.system: postgresql and your tracing backend is configured to display database system icons, you will automatically get the PostgreSQL icon. If you use a non-standard attribute name like database_engine, this automatic recognition breaks.
Architecture and Design Patterns
The OTel SDK Architecture
The OpenTelemetry SDK consists of several layers. The API layer defines the interfaces (Tracer, Span, Context, Propagator). The SDK layer provides the implementation—how spans are sampled, processed, and exported. The Contrib layer provides auto-instrumentation for popular libraries and frameworks. This separation allows you to swap implementations without changing your instrumentation code.
The TracerProvider is the entry point. It creates Tracers, which create Spans. Each Tracer is associated with a name (typically the library or module name) and a version. The TracerProvider also configures the pipeline: SpanProcessors receive completed spans, process them (batching, filtering, enrichment), and pass them to SpanExporters that send them to backends.
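The processor-to-exporter hand-off can be pictured with a small, library-free sketch. The FinishedSpan, Exporter, and BatchingProcessor names below are hypothetical stand-ins; the real SDK interfaces are asynchronous and carry more methods, so treat this as an illustration of the flow rather than the SDK's implementation.

```typescript
// Hypothetical minimal types, not the SDK's SpanProcessor/SpanExporter interfaces.
interface FinishedSpan { name: string; durationMs: number; }
interface Exporter { export(batch: FinishedSpan[]): void; }

class BatchingProcessor {
  private queue: FinishedSpan[] = [];
  private dropped = 0;
  private exporter: Exporter;
  private maxQueueSize: number;
  private maxExportBatchSize: number;

  constructor(exporter: Exporter, maxQueueSize: number, maxExportBatchSize: number) {
    this.exporter = exporter;
    this.maxQueueSize = maxQueueSize;
    this.maxExportBatchSize = maxExportBatchSize;
  }

  // Called when a span ends: buffer it, or drop it if the queue is full.
  onEnd(span: FinishedSpan): void {
    if (this.queue.length >= this.maxQueueSize) {
      this.dropped++;
      return;
    }
    this.queue.push(span);
  }

  // Called on a timer or at shutdown: drain the queue in bounded batches.
  flush(): void {
    while (this.queue.length > 0) {
      this.exporter.export(this.queue.splice(0, this.maxExportBatchSize));
    }
  }

  droppedCount(): number { return this.dropped; }
}

// An exporter that only remembers batch sizes, standing in for a network call.
const batchSizes: number[] = [];
const recordingExporter: Exporter = { export: (batch) => { batchSizes.push(batch.length); } };

const processor = new BatchingProcessor(recordingExporter, 5, 2);
for (let i = 0; i < 7; i++) {
  processor.onEnd({ name: `span-${i}`, durationMs: i });
}
processor.flush();
```

With a queue limit of 5 and a batch size of 2, seven finished spans result in two dropped spans and three export calls, which is the trade-off the queue and batch settings control.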
Auto-Instrumentation vs. Manual Instrumentation
Auto-instrumentation uses monkey-patching or bytecode manipulation to automatically create spans for common operations. For Node.js, the @opentelemetry/auto-instrumentations-node package instruments Express, Fastify, HTTP, PostgreSQL, Redis, MongoDB, and dozens of other libraries without any code changes. This provides a solid baseline of visibility with zero effort.
Manual instrumentation is needed for application-specific logic. When you have a complex business process that spans multiple function calls, you create custom spans to track the individual steps. This gives you fine-grained visibility into exactly where time is spent within your business logic.
Sampling Strategies
Collecting 100% of traces is impractical in high-throughput systems. A service handling 10,000 requests per second would generate millions of spans per minute. Sampling determines which traces to collect and which to drop.
Head-based sampling makes the decision at the root span (the first span in the trace). The decision propagates to all child spans through the trace flags. This is simple but has a blind spot: if a trace is dropped at the head, you cannot see if downstream services encountered errors.
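A head-based decision can be derived from the trace ID itself, so the root span's verdict is reproducible and downstream spans simply honor it. The scheme below is an illustrative assumption (it buckets on the last 8 hex characters), not the exact algorithm of any particular SDK sampler, and the function names are hypothetical.

```typescript
// Deterministic ratio sampling keyed on the trace ID.
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Interpret the last 8 hex chars as an unsigned 32-bit integer in [0, 2^32).
  const bucket = parseInt(traceId.slice(-8), 16);
  return bucket < ratio * 0x100000000;
}

// A non-root span honors the upstream decision; only a root span samples.
function decide(traceId: string, ratio: number, parentSampled?: boolean): boolean {
  return parentSampled !== undefined ? parentSampled : shouldSample(traceId, ratio);
}

// The decision travels downstream in the traceparent flags byte.
function traceFlags(sampled: boolean): string {
  return sampled ? '01' : '00';
}
```

Because a dropped trace sets the flag to 00 for every downstream service, an error three hops later is never recorded, which is the blind spot described above.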
Tail-based sampling collects all spans and makes the decision after the trace completes. This allows you to always keep traces with errors or high latency, while dropping a percentage of normal traces. The OTel Collector implements tail-based sampling, but it requires buffering all spans until the trace is complete, which adds memory overhead and latency.
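The buffering cost becomes clearer in a small sketch of the idea. The TailSampler class, its policy fields, and the injected random function are hypothetical and only illustrate the decision logic; the Collector's processor is configured declaratively and handles timeouts, eviction, and concurrency that are omitted here.

```typescript
interface BufferedSpan {
  traceId: string;
  name: string;
  durationMs: number;
  isError: boolean;
}

// Hypothetical policy: keep errors, keep slow traces, keep a fraction of the rest.
interface TailPolicy {
  latencyThresholdMs: number;
  baselineRatio: number;
}

class TailSampler {
  private pending = new Map<string, BufferedSpan[]>();
  private policy: TailPolicy;
  private random: () => number;

  constructor(policy: TailPolicy, random: () => number = Math.random) {
    this.policy = policy;
    this.random = random;
  }

  // Every span is held in memory until its trace is decided.
  add(span: BufferedSpan): void {
    const spans = this.pending.get(span.traceId) || [];
    spans.push(span);
    this.pending.set(span.traceId, spans);
  }

  // Called once the trace is considered complete, for example after a wait period.
  // Returns the spans to export, or an empty array if the trace is dropped.
  decide(traceId: string): BufferedSpan[] {
    const spans = this.pending.get(traceId) || [];
    this.pending.delete(traceId);
    const hasError = spans.some(s => s.isError);
    const isSlow = spans.some(s => s.durationMs >= this.policy.latencyThresholdMs);
    const keep = hasError || isSlow || this.random() < this.policy.baselineRatio;
    return keep ? spans : [];
  }

  pendingTraceCount(): number { return this.pending.size; }
}
```

The pending map is the memory overhead, and the wait before decide is called is the added latency before a kept trace reaches the backend.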
Step-by-Step Implementation
Setting Up the TracerProvider
Configure the OTel SDK with a TracerProvider that exports traces to a Collector:
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { BatchSpanProcessor, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { Resource } from '@opentelemetry/resources';
import { ATTR_SERVICE_NAME, ATTR_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';
import {
  CompositePropagator,
  W3CBaggagePropagator,
  W3CTraceContextPropagator,
} from '@opentelemetry/core';
import { registerInstrumentations } from '@opentelemetry/instrumentation';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { ExpressInstrumentation } from '@opentelemetry/instrumentation-express';
import { PgInstrumentation } from '@opentelemetry/instrumentation-pg';
import { RedisInstrumentation } from '@opentelemetry/instrumentation-redis-4';
import { trace } from '@opentelemetry/api';

// Written against the 1.x JavaScript SDK, which provides new Resource() and addSpanProcessor().
const provider = new NodeTracerProvider({
  resource: new Resource({
    [ATTR_SERVICE_NAME]: 'order-service',
    [ATTR_SERVICE_VERSION]: process.env.APP_VERSION || '1.0.0',
    'deployment.environment': process.env.NODE_ENV || 'development',
  }),
});

const exporter = new OTLPTraceExporter({
  url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4317',
});

// Register exactly one processor; adding both would export every span twice.
if (process.env.NODE_ENV === 'development') {
  // SimpleSpanProcessor exports each span as soon as it ends
  provider.addSpanProcessor(new SimpleSpanProcessor(exporter));
} else {
  // BatchSpanProcessor buffers spans for efficient export in production
  provider.addSpanProcessor(new BatchSpanProcessor(exporter, {
    maxQueueSize: 2048,
    maxExportBatchSize: 512,
    scheduledDelayMillis: 5000,
    exportTimeoutMillis: 30000,
  }));
}

// Propagate W3C Trace Context; the baggage propagator is only needed if you use Baggage.
provider.register({
  propagator: new CompositePropagator({
    propagators: [new W3CTraceContextPropagator(), new W3CBaggagePropagator()],
  }),
});

const ignoredPaths = ['/healthz', '/readyz', '/metrics'];

registerInstrumentations({
  instrumentations: [
    new HttpInstrumentation({
      // Skip health checks and metrics scrapes
      ignoreIncomingRequestHook: (request) => ignoredPaths.includes(request.url ?? ''),
    }),
    new ExpressInstrumentation(),
    new PgInstrumentation({ enhancedDatabaseReporting: true }),
    new RedisInstrumentation(),
  ],
});

const tracer = trace.getTracer('order-service');
Manual Span Creation for Business Logic
Create custom spans to trace specific business operations:
import { SpanStatusCode, trace } from '@opentelemetry/api';

// Order, OrderResult, inventoryService, and paymentService are application code defined elsewhere.
async function processOrder(order: Order): Promise<OrderResult> {
  const tracer = trace.getTracer('order-service');
  return tracer.startActiveSpan('processOrder', {
    attributes: {
      'order.id': order.id,
      'order.total': order.total,
      'order.item_count': order.items.length,
      'customer.id': order.customerId,
    },
  }, async (span) => {
    try {
      // Validate inventory
      const inventory = await tracer.startActiveSpan('validateInventory', async (invSpan) => {
        try {
          const result = await inventoryService.check(order.items);
          invSpan.setAttribute('inventory.all_available', result.allAvailable);
          invSpan.setAttribute('inventory.missing_items', result.missingItems.length);
          return result;
        } catch (error) {
          invSpan.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
          invSpan.recordException(error as Error);
          throw error;
        } finally {
          invSpan.end();
        }
      });

      if (!inventory.allAvailable) {
        span.setStatus({ code: SpanStatusCode.ERROR, message: 'Items out of stock' });
        return { success: false, reason: 'out_of_stock', missingItems: inventory.missingItems };
      }

      // Process payment
      const payment = await tracer.startActiveSpan('processPayment', async (paySpan) => {
        paySpan.setAttribute('payment.method', order.paymentMethod);
        paySpan.setAttribute('payment.amount', order.total);
        try {
          const result = await paymentService.charge(order);
          paySpan.setAttribute('payment.transaction_id', result.transactionId);
          return result;
        } catch (error) {
          paySpan.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
          // Record the exception here too, so the failing step carries the stack trace
          paySpan.recordException(error as Error);
          throw error;
        } finally {
          paySpan.end();
        }
      });

      span.setAttribute('order.payment_transaction_id', payment.transactionId);
      span.setStatus({ code: SpanStatusCode.OK });
      return { success: true, orderId: order.id, transactionId: payment.transactionId };
    } catch (error) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
      span.recordException(error as Error);
      throw error;
    } finally {
      span.end();
    }
  });
}
Cross-Service Context Propagation
When making HTTP requests to other services, the OTel HTTP instrumentation automatically propagates the trace context. For message queues whose client library is not covered by an instrumentation package, you need to propagate the context manually:
import { SpanStatusCode, propagation, context, trace } from '@opentelemetry/api';

// Order, Message, messageQueue, and sendNotification are application code defined elsewhere.

// Publishing a message with trace context
async function publishOrderEvent(order: Order, event: string) {
  const tracer = trace.getTracer('order-service');
  return tracer.startActiveSpan('publishOrderEvent', async (span) => {
    try {
      span.setAttribute('messaging.system', 'rabbitmq');
      span.setAttribute('messaging.destination', 'order-events');
      span.setAttribute('messaging.operation', 'publish');

      const headers: Record<string, string> = {};
      // Inject trace context into message headers
      propagation.inject(context.active(), headers);

      await messageQueue.publish('order-events', {
        type: event,
        data: order,
        _traceHeaders: headers, // Carried alongside the message payload
      });
    } catch (error) {
      span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
      span.recordException(error as Error);
      throw error;
    } finally {
      span.end(); // End the span even if publishing fails
    }
  });
}

// Consuming a message with trace context extraction
async function handleOrderEvent(message: Message) {
  // Extract trace context from message headers
  const parentContext = propagation.extract(context.active(), message._traceHeaders);
  const tracer = trace.getTracer('notification-service');

  // Return the inner promise so callers can await the handler and see its errors
  return context.with(parentContext, () =>
    tracer.startActiveSpan('handleOrderEvent', {
      attributes: {
        'messaging.system': 'rabbitmq',
        'messaging.destination': 'order-events',
        'messaging.operation': 'process',
      },
    }, async (span) => {
      try {
        span.setAttribute('event.type', message.type);
        await sendNotification(message.data);
      } catch (error) {
        span.setStatus({ code: SpanStatusCode.ERROR, message: (error as Error).message });
        span.recordException(error as Error);
        throw error;
      } finally {
        span.end();
      }
    }),
  );
}
Real-World Use Cases and Case Studies
Use Case 1: Identifying N+1 Query Problems
A team notices their order listing page takes 3 seconds to load. They enable distributed tracing and discover that the order service makes a single query to fetch 50 orders, then makes an individual query for each order's customer details—51 database queries instead of 2. The trace waterfall clearly shows 50 sequential database spans, each taking 20-40ms, all triggered from a single HTTP request span. The fix involves adding a JOIN or batch-loading pattern, reducing the page load from 3 seconds to 120ms.
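The difference between the two access patterns can be sketched without a real database. FakeDb below is a hypothetical in-memory stand-in that only counts round trips, and the Order and Customer shapes are assumptions made for the example.

```typescript
interface Order { id: number; customerId: number; }
interface Customer { id: number; name: string; }

// Hypothetical in-memory "database" that counts queries, standing in for a real driver.
class FakeDb {
  queryCount = 0;
  private customers: Customer[];

  constructor(customers: Customer[]) { this.customers = customers; }

  findCustomer(id: number): Customer | undefined {
    this.queryCount++; // one round trip per call
    return this.customers.find(c => c.id === id);
  }

  findCustomers(ids: number[]): Customer[] {
    this.queryCount++; // one round trip, e.g. WHERE id IN (...)
    return this.customers.filter(c => ids.indexOf(c.id) !== -1);
  }
}

// N+1 pattern: one customer lookup per order.
function loadNamesNaive(db: FakeDb, orders: Order[]): string[] {
  return orders.map(o => db.findCustomer(o.customerId)?.name ?? 'unknown');
}

// Batched pattern: one lookup for all distinct customer IDs, then join in memory.
function loadNamesBatched(db: FakeDb, orders: Order[]): string[] {
  const ids = orders.map(o => o.customerId).filter((id, i, all) => all.indexOf(id) === i);
  const nameById = new Map<number, string>();
  db.findCustomers(ids).forEach(c => nameById.set(c.id, c.name));
  return orders.map(o => nameById.get(o.customerId) ?? 'unknown');
}
```

In a trace, the naive version shows up as one database span per order under the request span, while the batched version produces a single database span regardless of page size.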
Use Case 2: Debugging Timeout Cascades
An intermittent timeout in the payment service causes a cascade of retries through the order pipeline. By examining traces, the team discovers that the payment service has a 5-second timeout, but the upstream order service has a 3-second timeout. When payment is slow, the order service times out first, retries, and creates duplicate payment requests. The trace makes this ordering dependency visible, and the team fixes it by making the order timeout longer than the payment timeout and implementing idempotency keys.
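Both parts of the fix can be sketched in a few lines. PaymentHandler and timeoutsAreOrdered are hypothetical names, and the in-memory map stands in for whatever durable store a real payment service would use to remember idempotency keys.

```typescript
interface ChargeResult { transactionId: string; amountCents: number; }

// Hypothetical payment handler that deduplicates on a caller-supplied idempotency key.
class PaymentHandler {
  private results = new Map<string, ChargeResult>();
  private nextId = 1;

  charge(idempotencyKey: string, amountCents: number): ChargeResult {
    const existing = this.results.get(idempotencyKey);
    if (existing) return existing; // a retry gets the original outcome, not a second charge
    const result: ChargeResult = { transactionId: `txn-${this.nextId++}`, amountCents };
    this.results.set(idempotencyKey, result);
    return result;
  }

  chargeCount(): number { return this.results.size; }
}

// The caller's timeout should exceed the callee's, so the caller does not give up
// (and retry) while the callee is still working on the first attempt.
function timeoutsAreOrdered(callerTimeoutMs: number, calleeTimeoutMs: number): boolean {
  return callerTimeoutMs > calleeTimeoutMs;
}
```

With the timeouts from this scenario, timeoutsAreOrdered(3000, 5000) is false, which is the misconfiguration the trace exposed; the idempotency key makes any remaining retries harmless.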
Use Case 3: Measuring Service Mesh Overhead
A team migrating to a service mesh (Istio) uses distributed traces to measure the overhead added by the mesh's sidecar proxies. By comparing span durations before and after the migration, they discover the mesh adds 5-15ms per hop. For requests that traverse 6 services, this adds 30-90ms—acceptable for most endpoints but problematic for a latency-sensitive search API. They configure Istio to bypass the mesh for the search path, maintaining their latency SLA.
Best Practices for Production
- Use semantic conventions for all attribute names: Follow the OTel semantic conventions for HTTP, database, messaging, and RPC attributes. This ensures your traces work with any tracing backend and are searchable in standardized ways.
- Always end spans in finally blocks: Unfinished spans leak memory and create confusing trace visualizations. Wrap span logic in try-finally to guarantee span.end() is called even if an exception occurs.
- Set span status for errors: When an operation fails, set the span status to ERROR and record the exception. This makes it easy to filter for failed traces in your tracing backend and ensures error-based sampling works correctly.
- Use BatchSpanProcessor in production: SimpleSpanProcessor exports each span immediately, which creates high network overhead. BatchSpanProcessor buffers spans and exports them in batches, reducing network calls by orders of magnitude.
- Implement graceful shutdown: When your application receives SIGTERM, call tracerProvider.shutdown() to flush any buffered spans. Without this, spans from in-flight requests are lost during deployments.
- Keep span names low cardinality: Span names should be operation names, not unique identifiers. Use GET /orders/:id instead of GET /orders/12345. High-cardinality span names make it difficult to aggregate and search traces.
- Use span links for async relationships: When processing messages from a queue, use span links to connect to the producing span rather than creating a parent-child relationship. This keeps each trace focused on a single request flow.
- Sample based on your needs: For development, sample 100%. For production, use tail-based sampling to keep all errors and high-latency traces while sampling 1-10% of normal traffic.
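The low-cardinality naming rule from the list above can be approximated with a small normalizer. The patterns used here (numeric segments and UUIDs) are illustrative assumptions; when your HTTP framework exposes the matched route template, prefer that over guessing from the path.

```typescript
// Path segments that look like identifiers are replaced with a placeholder.
const NUMERIC_ID = /^\d+$/;
const UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function lowCardinalitySpanName(method: string, path: string): string {
  const pathOnly = path.split('?')[0]; // drop the query string
  const segments = pathOnly.split('/').map(segment =>
    NUMERIC_ID.test(segment) || UUID.test(segment) ? ':id' : segment,
  );
  return `${method.toUpperCase()} ${segments.join('/')}`;
}
```

For example, lowCardinalitySpanName('get', '/orders/12345') produces GET /orders/:id, while the concrete order number can still be attached to the span as an attribute.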
Common Pitfalls and Solutions
| Pitfall | Impact | Solution |
|---|---|---|
| Forgetting to propagate context across async boundaries | Broken traces with missing spans | Always inject/extract context for message queues, event emitters, and async boundaries |
| Using high-cardinality values in span names or unbounded attribute payloads | Traces that are hard to aggregate; higher memory and storage cost | Keep span names templated, put identifiers such as order IDs in attributes, and cap attribute sizes |
| Not shutting down the SDK on application exit | Lost spans during deployments | Register SIGTERM handler that calls provider.shutdown() |
| Creating too many nested spans | Performance overhead, noisy traces | Instrument at the service and significant operation level, not every function |
| Missing error recording on failed spans | Incomplete traces during debugging | Always set ERROR status and call span.recordException() on failures |
| Mixing up propagation formats across services | Trace context not recognized | Standardize on W3C TraceContext across all services |
Performance Optimization
Reducing Span Export Overhead
Configure the BatchSpanProcessor to balance latency against overhead:
provider.addSpanProcessor(new BatchSpanProcessor(exporter, {
  maxQueueSize: 4096,         // Buffer up to 4096 spans before dropping
  maxExportBatchSize: 512,    // Send 512 spans per export batch
  scheduledDelayMillis: 5000, // Export every 5 seconds
  exportTimeoutMillis: 30000, // Give up after 30 seconds
}));
For extremely high-throughput services, consider using the OTel Collector as an intermediary. The Collector can batch spans from multiple service instances and export them efficiently to the tracing backend.
Sampling Configuration in the Collector
processors:
  tail_sampling:
    decision_wait: 30s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      - name: errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: slow-requests
        type: latency
        latency: { threshold_ms: 1000 }
      - name: probabilistic
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
Comparison with Alternatives
| Feature | OpenTelemetry | Jaeger | Zipkin | Datadog APM | New Relic |
|---|---|---|---|---|---|
| Vendor Neutral | Yes | Partial | Partial | No | No |
| Auto-instrumentation | Extensive (contrib) | Limited | Limited | SDK-based | SDK-based |
| Sampling | Head + tail (Collector) | Head-based | Head-based | Adaptive | Adaptive |
| Storage | Any OTLP-compatible | Elasticsearch, Cassandra | Elasticsearch | Proprietary | Proprietary |
| Context Propagation | W3C + B3 + Jaeger | B3 + Jaeger | B3 | W3C + Datadog | W3C + New Relic |
| Cost | Open-source | Open-source | Open-source | Per-host | Per-GB |
| Metrics + Logs + Traces | All three | Traces only | Traces only | All three | All three |
Advanced Patterns and Techniques
Span Events for Rich Annotations
Use span events to record significant moments within a span's lifetime:
tracer.startActiveSpan('processPayment', async (span) => {
  try {
    span.addEvent('payment.initiated', {
      'payment.provider': 'stripe',
      'payment.amount': 99.99,
    });
    // Stripe amounts are in the smallest currency unit: 9999 cents is $99.99
    const result = await stripe.charges.create({ amount: 9999, currency: 'usd' });
    span.addEvent('payment.completed', {
      'payment.charge_id': result.id,
      'payment.receipt_url': result.receipt_url,
    });
  } finally {
    span.end(); // End the span even if the charge throws
  }
});
Baggage for Cross-Cutting Data
OpenTelemetry Baggage propagates arbitrary key-value pairs across service boundaries. Use it for cross-cutting concerns like tenant IDs, feature flags, or A/B test assignments:
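import { propagation, context } from '@opentelemetry/api';

// Set baggage at the edge; each entry is an object with a value field
const bag = propagation.createBaggage({
  'tenant.id': { value: 'acme-corp' },
  'feature.new_checkout': { value: 'true' },
});
const ctx = propagation.setBaggage(context.active(), bag);

// Run work inside the new context so outgoing calls carry the baggage.
// Propagation requires a registered baggage propagator such as W3CBaggagePropagator.
context.with(ctx, () => {
  // Retrieve baggage here, or in any downstream service after extraction
  const activeBaggage = propagation.getBaggage(context.active());
  const tenantId = activeBaggage?.getEntry('tenant.id')?.value;
});
Testing Strategies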
Test your tracing implementation by verifying spans are created and exported correctly:
import { SpanStatusCode } from '@opentelemetry/api';
import { InMemorySpanExporter, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';

// provider is the TracerProvider configured earlier; testOrder and mockPaymentService are test fixtures.
const memoryExporter = new InMemorySpanExporter();
provider.addSpanProcessor(new SimpleSpanProcessor(memoryExporter));

describe('Order Processing Tracing', () => {
  beforeEach(() => memoryExporter.reset());

  it('should create spans for the full order flow', async () => {
    await processOrder(testOrder);
    const spans = memoryExporter.getFinishedSpans();
    // Spans are exported in the order they end, so children precede the root
    expect(spans.map(s => s.name)).toEqual([
      'validateInventory', 'processPayment', 'processOrder',
    ]);
    const rootSpan = spans.find(s => s.name === 'processOrder');
    expect(rootSpan?.attributes['order.id']).toBe(testOrder.id);
    expect(rootSpan?.status.code).toBe(SpanStatusCode.OK);
  });

  it('should record errors in spans', async () => {
    mockPaymentService.charge.mockRejectedValue(new Error('Card declined'));
    await expect(processOrder(testOrder)).rejects.toThrow();
    const spans = memoryExporter.getFinishedSpans();
    const paymentSpan = spans.find(s => s.name === 'processPayment');
    expect(paymentSpan?.status.code).toBe(SpanStatusCode.ERROR);
    // The root span records the exception that propagated from the payment step
    const rootSpan = spans.find(s => s.name === 'processOrder');
    expect(rootSpan?.events.some(e => e.name === 'exception')).toBe(true);
  });
});
Future Outlook
OpenTelemetry continues to evolve rapidly. The recent addition of profiling as a fourth signal type (alongside traces, metrics, and logs) will enable correlating CPU and memory profiles with distributed traces. The OTel Collector's growing ecosystem of processors—including adaptive sampling, attribute transformation, and tail-based filtering—makes it increasingly capable as a central telemetry pipeline.
The convergence of OpenTelemetry with eBPF-based tools promises automatic, zero-code instrumentation for any application, regardless of language or framework. Projects like Grafana Beyla and Pixie use eBPF to intercept network traffic and system calls, generating traces without any SDK integration.
Conclusion
Distributed tracing with OpenTelemetry transforms how you understand and debug microservices architectures. By recording the complete journey of requests across service boundaries, you gain visibility into latency bottlenecks, error cascades, and service dependencies that are invisible with traditional monitoring.
Key takeaways:
- Traces, spans, and context propagation are the three fundamental concepts to master
- Use W3C TraceContext for propagation to ensure vendor interoperability
- Auto-instrumentation provides a zero-effort baseline; add manual spans for business logic
- Always propagate context across async boundaries like message queues
- Sample intelligently—keep all errors and slow traces, sample normal traffic
- Use semantic conventions for attribute names to maximize interoperability
- Test your tracing implementation as rigorously as your business logic
Start with auto-instrumentation for your HTTP framework and database driver, then incrementally add manual spans for critical business flows. The investment in distributed tracing pays dividends during every debugging session.