Introduction
Modern distributed systems face a fundamental challenge: coordinating long-running business processes across multiple services while handling failures gracefully. A payment flow might involve checking inventory, charging a credit card, updating order status, sending a confirmation email, and notifying a warehouse—each step calling a different service. If the email service goes down after the payment succeeds, what happens? If the warehouse notification fails, do you refund the payment? These questions lead developers into the treacherous world of distributed transactions, compensating actions, and idempotency concerns.
Temporal.io solves this problem by providing durable workflow execution. Instead of orchestrating services with fragile message queues and retry logic, you write workflows as regular code: functions that call other functions. Temporal ensures that these workflows execute to completion, even if servers crash, networks partition, or processes restart. If a workflow is interrupted at step three of five, Temporal resumes it from step three when the system recovers. No data is lost and no step is skipped. Steps that already completed are not run again, although an activity that was in flight when the failure hit may be retried, which is why activities should be idempotent.
The core innovation of Temporal is "workflow as code." Unlike traditional workflow engines that use XML, YAML, or visual designers to define workflows, Temporal lets you write workflows in Go, Java, TypeScript, Python, or PHP. This means you get the full power of your programming language—loops, conditionals, error handling, type checking, and testing frameworks—combined with the durability guarantees of a workflow engine.
This guide covers everything from basic workflow concepts to advanced patterns like sagas, child workflows, and production deployment. We will explore the architecture that makes Temporal unique, walk through real-world implementations, and discuss the trade-offs that come with durable execution.
Understanding Temporal: Core Concepts
Workflows and Activities
Temporal separates two types of code: workflows and activities. Workflows are deterministic functions that define the business logic—the sequence of steps, the branching conditions, and the error handling. Activities are non-deterministic functions that perform side effects—calling APIs, reading databases, sending emails, or accessing external systems.
This separation is critical. Temporal replays workflow code to recover state after failures. If workflow code is non-deterministic (e.g., it calls a random number generator or checks the current time), replay produces different results, breaking the system. By moving all side effects into activities, workflows remain deterministic while activities can do anything.
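To see why a replayed decision must not depend on an unseeded random source, consider the toy sketch below. It is not Temporal's code: `mulberry32` is a small, well-known seeded PRNG chosen for illustration, and `chooseRoute` is a hypothetical workflow decision. Because both the original run and the replay seed the generator with the same per-execution value, every branch comes out the same. The TypeScript SDK gets the same property by replacing `Math.random()` with a seeded generator inside its workflow sandbox.

```typescript
// Toy illustration of replay-safe randomness (not Temporal's implementation).
// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical workflow decision that branches on random values
function chooseRoute(random: () => number): string[] {
  const route: string[] = [];
  for (let i = 0; i < 5; i++) {
    route.push(random() < 0.5 ? 'fast-lane' : 'manual-review');
  }
  return route;
}

// The original run and the replay seed from the same per-execution value,
// so every branch decision matches.
const runSeed = 42;
const originalRun = chooseRoute(mulberry32(runSeed));
const replayRun = chooseRoute(mulberry32(runSeed));
console.log(originalRun.join() === replayRun.join()); // true
```

If the replay line used plain Node's unseeded `Math.random` instead, the two routes would differ in most runs. That divergence is exactly what makes a replay fail with a nondeterminism error.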
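```typescript
// workflows.ts - Deterministic workflow code
import { proxyActivities, condition, defineSignal, defineQuery, setHandler } from '@temporalio/workflow';
import type * as activities from './activities';
import type { OrderResult, Payment } from './activities';

const { getOrder, chargePayment, refundPayment, sendEmail, updateInventory, notifyWarehouse } =
  proxyActivities<typeof activities>({
    startToCloseTimeout: '5 minutes',
    retry: {
      maximumAttempts: 3,
      initialInterval: '1 second',
      backoffCoefficient: 2,
    },
  });

// Sent by a reviewer when the fraud check passes
export const approveFraudCheckSignal = defineSignal('approveFraudCheck');
// Lets clients ask which step the order is on
export const getStatusQuery = defineQuery<string>('getStatus');

export async function processOrder(orderId: string): Promise<OrderResult> {
  let step = 'loading_order';
  let fraudCheckComplete = false;
  setHandler(getStatusQuery, () => step);
  setHandler(approveFraudCheckSignal, () => {
    fraudCheckComplete = true;
  });

  // Loading the order is I/O, so it runs as an activity
  const order = await getOrder(orderId);

  // Step 1: Check and reserve inventory
  step = 'reserving_inventory';
  const reserved = await updateInventory(order.items, 'reserve');
  if (!reserved) {
    return { status: 'failed', reason: 'out_of_stock' };
  }

  // Step 2: Charge payment, releasing the reservation if it fails
  step = 'charging_payment';
  let payment: Payment;
  try {
    payment = await chargePayment(order.customerId, order.total);
  } catch {
    await updateInventory(order.items, 'release');
    return { status: 'failed', reason: 'payment_failed' };
  }

  // Step 3: Wait up to 24 hours for the fraud check signal (async human review).
  // condition() resolves to false if the timeout elapses first.
  step = 'awaiting_fraud_check';
  const approved = await condition(() => fraudCheckComplete, '24 hours');
  if (!approved) {
    // Compensate: refund payment and release inventory
    await refundPayment(payment.id);
    await updateInventory(order.items, 'release');
    return { status: 'failed', reason: 'fraud_rejected' };
  }

  try {
    // Step 4: Send confirmation email
    step = 'sending_confirmation';
    await sendEmail(order.customerId, 'order_confirmed', { orderId });
    // Step 5: Notify warehouse for shipping
    step = 'notifying_warehouse';
    await notifyWarehouse(order);
    step = 'completed';
    return { status: 'completed', orderId };
  } catch (error) {
    // Compensate if a later step still fails after its retries
    await refundPayment(payment.id);
    await updateInventory(order.items, 'release');
    throw error;
  }
}

// activities.ts - Non-deterministic side effects
import { Context } from '@temporalio/activity';
// Hypothetical service clients for orders, email, inventory, and the warehouse
import { orderService, emailService, inventoryService, warehouseService } from './services';

export interface OrderItem { productId: string; quantity: number }
export interface Order { id: string; customerId: string; total: number; items: OrderItem[] }
export interface Payment { id: string }
export interface OrderResult { status: 'completed' | 'failed'; orderId?: string; reason?: string }

// Stable across retries of the same activity, so Stripe deduplicates repeated requests
function stripeHeaders(): Record<string, string> {
  const { workflowExecution, activityId } = Context.current().info;
  return {
    Authorization: `Bearer ${process.env.STRIPE_KEY}`,
    'Content-Type': 'application/x-www-form-urlencoded', // Stripe takes form-encoded bodies
    'Idempotency-Key': `${workflowExecution.runId}-${activityId}`,
  };
}

export async function getOrder(orderId: string): Promise<Order> {
  return orderService.get(orderId);
}

export async function chargePayment(customerId: string, amount: number): Promise<Payment> {
  const response = await fetch('https://api.stripe.com/v1/charges', {
    method: 'POST',
    headers: stripeHeaders(),
    body: new URLSearchParams({
      customer: customerId,
      amount: String(Math.round(amount * 100)), // smallest currency unit (cents)
      currency: 'usd', // assumed currency
    }),
  });
  if (!response.ok) {
    throw new Error(`Payment failed: ${response.statusText}`);
  }
  return (await response.json()) as Payment;
}

export async function refundPayment(paymentId: string): Promise<void> {
  const response = await fetch('https://api.stripe.com/v1/refunds', {
    method: 'POST',
    headers: stripeHeaders(),
    body: new URLSearchParams({ charge: paymentId }),
  });
  if (!response.ok) {
    throw new Error(`Refund failed: ${response.statusText}`);
  }
}

export async function sendEmail(
  userId: string,
  template: string,
  data: Record<string, unknown>,
): Promise<void> {
  await emailService.send({ userId, template, data });
}

export async function updateInventory(
  items: OrderItem[],
  action: 'reserve' | 'release',
): Promise<boolean> {
  for (const item of items) {
    await inventoryService.update(item.productId, action, item.quantity);
  }
  return true;
}

export async function notifyWarehouse(order: Order): Promise<void> {
  await warehouseService.notify(order);
}
```
Workflow Execution Model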
Temporal workflows execute through a replay-based model. When a workflow starts, Temporal records every event (activities scheduled and their results, timers, signals received) in an event history. If the workflow worker crashes, a new worker picks up the workflow and replays the event history to reconstruct the workflow state. The workflow code runs again, but instead of re-executing activities that already completed, it reads their results from the history.
This means workflow code must be deterministic: given the same input and the same history, it must make the same decisions when replayed. Non-deterministic operations (random numbers, current time, UUIDs) must go through Temporal-provided APIs that return the same values on replay. In the TypeScript SDK, for example, the workflow sandbox replaces `Date` and `Math.random()` with deterministic versions, and `@temporalio/workflow` exports a `uuid4()` helper.
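The replay mechanism can be modeled in a few lines. The sketch below is a simplified, synchronous toy, not Temporal's implementation; `ToyReplayer`, `activity`, and `orderWorkflow` are hypothetical names. Each step's result is appended to a history. When a new "worker" re-runs the workflow against that history, recorded steps return their saved results instead of executing again, and a step whose name does not match the history is reported as nondeterminism. A crash between a side effect and its recording would still re-run that step, which is why activities need to be idempotent.

```typescript
// Toy model of replay-based recovery (simplified; not Temporal's implementation).
type HistoryEvent = { step: string; result: string };

class ToyReplayer {
  private cursor = 0;

  constructor(private readonly history: HistoryEvent[]) {}

  // Returns the recorded result if this step already ran; otherwise runs and records it.
  step(name: string, run: () => string): string {
    const recorded = this.history[this.cursor];
    if (recorded !== undefined) {
      if (recorded.step !== name) {
        // The code asked for a different step than the history holds.
        throw new Error(`Nondeterminism: history has "${recorded.step}", code asked for "${name}"`);
      }
      this.cursor++;
      return recorded.result;
    }
    const result = run();
    this.history.push({ step: name, result });
    this.cursor++;
    return result;
  }
}

// Counts how many times each side effect really executes
const executions: Record<string, number> = {};
function activity(name: string): string {
  executions[name] = (executions[name] ?? 0) + 1;
  return `${name}:ok`;
}

// Hypothetical three-step workflow; `crashAfter` simulates a worker dying mid-run
function orderWorkflow(ctx: ToyReplayer, crashAfter?: string): string[] {
  const results: string[] = [];
  for (const name of ['reserve', 'charge', 'email']) {
    results.push(ctx.step(name, () => activity(name)));
    if (name === crashAfter) throw new Error('worker crashed');
  }
  return results;
}

const eventHistory: HistoryEvent[] = [];
try {
  orderWorkflow(new ToyReplayer(eventHistory), 'charge'); // first worker crashes after "charge"
} catch {
  // eventHistory now holds the reserve and charge results
}
const finalResults = orderWorkflow(new ToyReplayer(eventHistory)); // a new worker replays
console.log(finalResults); // [ 'reserve:ok', 'charge:ok', 'email:ok' ]
console.log(executions); // { reserve: 1, charge: 1, email: 1 }
```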
Architecture and Design Patterns
Worker Setup
Workers are processes that execute workflows and activities. They poll Temporal for tasks and execute the corresponding code:
```typescript
// worker.ts
import { Worker } from '@temporalio/worker';
import * as activities from './activities';

async function run() {
  const worker = await Worker.create({
    workflowsPath: require.resolve('./workflows'),
    activities,
    taskQueue: 'order-processing',
    maxConcurrentWorkflowTaskExecutions: 100,
    maxConcurrentActivityTaskExecutions: 50,
  });
  await worker.run();
}

run().catch((err) => {
  console.error('Worker failed:', err);
  process.exit(1);
});
```
Starting Workflows
Clients start workflows and interact with running workflows:
```typescript
// client.ts
import { Connection, Client } from '@temporalio/client';
import { processOrder } from './workflows';

// Connects to a Temporal server on localhost:7233 by default
export async function createClient(): Promise<Client> {
  const connection = await Connection.connect();
  return new Client({ connection });
}

export async function startOrderProcessing(client: Client, orderId: string) {
  const handle = await client.workflow.start(processOrder, {
    args: [orderId],
    taskQueue: 'order-processing',
    workflowId: `order-${orderId}`,
    // Workflow runs for up to 30 days
    workflowExecutionTimeout: '30 days',
  });
  console.log(`Started workflow: ${handle.workflowId}`);
  // Wait for the result
  const result = await handle.result();
  console.log('Order result:', result);
  return result;
}

// Query a running workflow
export async function getOrderStatus(client: Client, orderId: string) {
  const handle = client.workflow.getHandle(`order-${orderId}`);
  return handle.query<string>('getStatus');
}

// Signal a running workflow
export async function approveOrder(client: Client, orderId: string) {
  const handle = client.workflow.getHandle(`order-${orderId}`);
  await handle.signal('approveFraudCheck');
}
```
Saga Pattern
The saga pattern handles distributed transactions by defining compensating actions for each step:
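```typescript
// workflows/saga.ts
import { proxyActivities, log } from '@temporalio/workflow';
// Hypothetical banking activities; debitAccount/creditAccount accept an optional
// reference transaction ID when used as compensations
import type * as activities from '../activities/banking';

const { debitAccount, creditAccount, recordTransfer, deleteTransfer } =
  proxyActivities<typeof activities>({ startToCloseTimeout: '1 minute' });

export interface TransferResult {
  status: 'completed' | 'failed';
  transferId?: string;
  reason?: string;
}

export async function transferMoney(
  fromAccount: string,
  toAccount: string,
  amount: number,
): Promise<TransferResult> {
  // The TypeScript SDK has no built-in Saga class, so keep a stack of compensations
  const compensations: Array<() => Promise<unknown>> = [];
  try {
    // Step 1: Debit source account
    const debitTx = await debitAccount(fromAccount, amount);
    compensations.push(() => creditAccount(fromAccount, amount, debitTx.id));
    // Step 2: Credit destination account
    const creditTx = await creditAccount(toAccount, amount);
    compensations.push(() => debitAccount(toAccount, amount, creditTx.id));
    // Step 3: Record the transfer
    const record = await recordTransfer(fromAccount, toAccount, amount);
    compensations.push(() => deleteTransfer(record.id));
    return { status: 'completed', transferId: record.id };
  } catch (error) {
    // Execute all compensations in reverse order, continuing past individual failures
    for (const compensate of compensations.reverse()) {
      try {
        await compensate();
      } catch (compError) {
        log.error('Compensation failed', { error: String(compError) });
      }
    }
    return { status: 'failed', reason: error instanceof Error ? error.message : String(error) };
  }
}
```
Step-by-Step Implementation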
Setting Up a Temporal Project
```shell
# Install dependencies
npm init -y
npm install @temporalio/client @temporalio/worker @temporalio/workflow @temporalio/activity @temporalio/common
npm install --save-dev typescript @types/node @temporalio/testing
# Initialize TypeScript
npx tsc --init
```
Defining a Complete Workflow
Build a user onboarding workflow that handles multi-step processes with retries:
```typescript
// workflows/onboarding.ts
import {
  proxyActivities,
  condition,
  setHandler,
  defineQuery,
  defineSignal,
  log,
} from '@temporalio/workflow';
import type * as activities from '../activities/onboarding';

const {
  createUserAccount,
  sendVerificationEmail,
  setupDefaultWorkspace,
  assignDefaultRole,
  sendWelcomeEmail,
  notifyAdmin,
} = proxyActivities<typeof activities>({
  startToCloseTimeout: '10 minutes',
  retry: {
    maximumAttempts: 3,
    initialInterval: '1 second',
    backoffCoefficient: 2,
  },
});

interface OnboardingStatus {
  step: string;
  completed: string[];
  failed?: string;
}

interface OnboardingResult {
  status: 'completed' | 'cancelled' | 'timed_out';
  userId?: string;
  workspaceId?: string;
}

export const statusQuery = defineQuery<OnboardingStatus>('getStatus');
export const cancelSignal = defineSignal('cancel');
// Sent (e.g., by the web app) when the user clicks the verification link
export const emailVerifiedSignal = defineSignal('emailVerified');

export async function onboardUser(
  userId: string,
  email: string,
  plan: string,
): Promise<OnboardingResult> {
  const status: OnboardingStatus = { step: 'starting', completed: [] };
  // Expose query handler for external status checks
  setHandler(statusQuery, () => status);
  // Expose signal handlers for cancellation and email verification
  let cancelled = false;
  let emailVerified = false;
  setHandler(cancelSignal, () => { cancelled = true; });
  setHandler(emailVerifiedSignal, () => { emailVerified = true; });

  // Step 1: Create user account
  status.step = 'creating_account';
  await createUserAccount(userId, email, plan);
  status.completed.push('account');
  if (cancelled) {
    log.info('Onboarding cancelled after account creation');
    return { status: 'cancelled' };
  }

  // Step 2: Send verification email
  status.step = 'sending_verification';
  await sendVerificationEmail(userId, email);
  status.completed.push('verification');

  // Step 3: Wait for email verification (up to 7 days); false means the timer won
  status.step = 'waiting_verification';
  const verified = await condition(() => emailVerified, '7 days');
  if (!verified) {
    status.step = 'timed_out';
    log.warn('Email verification timed out', { userId });
    return { status: 'timed_out' };
  }

  // Step 4: Set up workspace
  status.step = 'setting_up_workspace';
  const workspace = await setupDefaultWorkspace(userId, plan);
  status.completed.push('workspace');

  // Step 5: Assign role
  status.step = 'assigning_role';
  await assignDefaultRole(userId, plan);
  status.completed.push('role');

  // Step 6: Send welcome email
  status.step = 'sending_welcome';
  await sendWelcomeEmail(userId, workspace.id);
  status.completed.push('welcome');

  // Step 7: Notify admin
  status.step = 'notifying_admin';
  await notifyAdmin(userId, email, plan);
  status.completed.push('admin_notification');

  status.step = 'completed';
  log.info('User onboarding completed', { userId });
  return { status: 'completed', userId, workspaceId: workspace.id };
}
```
Child Workflows
Break complex workflows into reusable child workflows:
```typescript
// workflows/parent.ts
import { executeChild, proxyActivities } from '@temporalio/workflow';
import { processOrder } from '../workflows';
// Hypothetical activities module for batch notifications
import type * as activities from '../activities/batch';

const { notifyAdmin } = proxyActivities<typeof activities>({ startToCloseTimeout: '1 minute' });

export async function processBatchOrders(orderIds: string[]) {
  // Process orders in parallel using child workflows
  const results = await Promise.allSettled(
    orderIds.map((orderId) =>
      executeChild(processOrder, {
        args: [orderId],
        taskQueue: 'order-processing',
        workflowId: `order-${orderId}`,
      }),
    ),
  );
  const succeeded = results.filter((r) => r.status === 'fulfilled');
  const failed = results.filter((r): r is PromiseRejectedResult => r.status === 'rejected');
  if (failed.length > 0) {
    // Handle partial failure; stringify errors so they serialize as activity arguments
    await notifyAdmin({
      type: 'batch_partial_failure',
      total: orderIds.length,
      failed: failed.length,
      errors: failed.map((f) => String(f.reason)),
    });
  }
  return {
    total: orderIds.length,
    succeeded: succeeded.length,
    failed: failed.length,
  };
}
```
Real-World Use Cases and Case Studies
Use Case 1: Payment Processing
Payment processing workflows handle the complex lifecycle of a payment: authorization, capture, settlement, and refund. Temporal records each completed step in the workflow history, so a finished step is not re-run after a failure, even if the payment gateway is temporarily unavailable; pairing this with idempotency keys on gateway calls keeps a retried request from charging the customer twice. The workflow retries failed steps with exponential backoff and escalates to manual review if retries are exhausted.
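The retry behavior described here can be sketched as a pure function plus a loop. This is a simplified model, assuming Temporal's documented retry semantics: the first retry waits `initialInterval`, each later wait is multiplied by `backoffCoefficient`, every wait is capped by `maximumInterval` (which defaults to 100 times `initialInterval`), and `maximumAttempts` counts the first attempt. The simulated clock, `callWithRetries`, and the `needs_manual_review` outcome are hypothetical stand-ins for the escalation step; a real workflow would let Temporal schedule the retries and catch the final `ActivityFailure` to escalate.

```typescript
// Simplified model of an exponential-backoff retry policy (not Temporal's code).
interface RetryPolicy {
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumAttempts: number; // total attempts, including the first one
  maximumIntervalMs?: number; // cap on any single wait; defaults to 100x initial
}

// Wait before retry n (1-based): initial * coefficient^(n - 1), capped
function retryDelays(policy: RetryPolicy): number[] {
  const cap = policy.maximumIntervalMs ?? 100 * policy.initialIntervalMs;
  const delays: number[] = [];
  for (let retry = 1; retry < policy.maximumAttempts; retry++) {
    delays.push(Math.min(policy.initialIntervalMs * policy.backoffCoefficient ** (retry - 1), cap));
  }
  return delays;
}

type Outcome<T> =
  | { status: 'succeeded'; value: T; attempts: number }
  | { status: 'needs_manual_review'; attempts: number; lastError: string };

// Calls fn until it succeeds or attempts run out; waits are recorded, not slept
function callWithRetries<T>(
  fn: (attempt: number) => T,
  policy: RetryPolicy,
  waits: number[] = [],
): Outcome<T> {
  const delays = retryDelays(policy);
  let lastError = '';
  for (let attempt = 1; attempt <= policy.maximumAttempts; attempt++) {
    try {
      return { status: 'succeeded', value: fn(attempt), attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      if (attempt < policy.maximumAttempts) waits.push(delays[attempt - 1]);
    }
  }
  return { status: 'needs_manual_review', attempts: policy.maximumAttempts, lastError };
}

// Same policy as processOrder: 3 attempts, 1 s initial interval, coefficient 2
const policy: RetryPolicy = { initialIntervalMs: 1000, backoffCoefficient: 2, maximumAttempts: 3 };
console.log(retryDelays(policy)); // [ 1000, 2000 ]

// A gateway that is down for two attempts, then recovers
const waits: number[] = [];
const flaky = callWithRetries((attempt) => {
  if (attempt < 3) throw new Error('gateway 503');
  return 'ch_123';
}, policy, waits);
console.log(flaky.status, waits); // succeeded [ 1000, 2000 ]
```

With a larger coefficient the cap matters: `backoffCoefficient: 10` over five attempts yields waits of 1 s, 10 s, 100 s, and then 100 s again, because the default cap is 100 times the initial interval.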
Use Case 2: Order Fulfillment
E-commerce order fulfillment involves multiple services: inventory, payment, shipping, and notifications. Temporal coordinates these services as a single workflow, handling partial failures with compensating actions. If the shipping service fails after payment succeeds, the workflow retries shipping before considering a refund.
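The retry-then-compensate flow described above can be reduced to a small, synchronous toy. This is an illustration of the saga idea, not a Temporal API: `runSaga`, `SagaStep`, and the step names are hypothetical, each step gets a fixed number of attempts, and failures of the compensations themselves are left out for brevity. When shipping still fails after its attempts, the completed steps are undone in reverse order: the refund runs before the inventory release.

```typescript
// Toy saga runner (a simplified model, not a Temporal API).
interface SagaStep {
  name: string;
  action: () => void;
  compensate: () => void;
  maxAttempts?: number; // attempts before the step counts as failed (default 1)
}

function runSaga(steps: SagaStep[], events: string[]): 'completed' | 'compensated' {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    const maxAttempts = step.maxAttempts ?? 1;
    let succeeded = false;
    for (let attempt = 1; attempt <= maxAttempts && !succeeded; attempt++) {
      try {
        step.action();
        succeeded = true;
        events.push(`${step.name}:ok`);
      } catch {
        events.push(`${step.name}:failed#${attempt}`);
      }
    }
    if (!succeeded) {
      // Undo the steps that did complete, most recent first
      for (const done of completed.slice().reverse()) {
        done.compensate();
        events.push(`${done.name}:compensated`);
      }
      return 'compensated';
    }
    completed.push(step);
  }
  return 'completed';
}

// Order fulfillment where shipping keeps failing after payment succeeded
const events: string[] = [];
const outcome = runSaga(
  [
    { name: 'reserve_inventory', action: () => {}, compensate: () => {} },
    { name: 'charge_payment', action: () => {}, compensate: () => {} },
    {
      name: 'ship',
      action: () => { throw new Error('carrier API unavailable'); },
      compensate: () => {},
      maxAttempts: 3,
    },
  ],
  events,
);
console.log(outcome); // compensated
console.log(events.join(' -> '));
```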
Use Case 3: Data Pipeline Orchestration
ETL pipelines with multiple stages benefit from Temporal's durability. Each stage (extract, transform, load) is an activity with retry logic. If a stage fails, Temporal retries from that stage, not from the beginning. This is more efficient than restarting the entire pipeline.
Use Case 4: User Onboarding
Multi-step onboarding flows (account creation, email verification, workspace setup, role assignment) are natural workflows. Temporal handles the asynchronous nature of email verification (waiting hours or days for the user to click a link) while maintaining the overall flow state.
Best Practices for Production
- Keep workflows deterministic: Keep I/O, environment reads, and third-party random or ID libraries (such as the `uuid` npm package) out of workflow code. The TypeScript SDK's workflow sandbox replaces `Date` and `Math.random()` with deterministic versions and exports `uuid4()` for replay-safe IDs; other SDKs provide equivalents such as `workflow.Now()` in Go. Use Temporal's `sleep()` for durable timers.
- Set appropriate timeouts: Configure `startToCloseTimeout` for activities based on expected execution time. Use `scheduleToCloseTimeout` to limit the total time, including retries and scheduling delays.
- Use idempotent activities: Activities should be idempotent because Temporal may retry them. Use idempotency keys or database constraints to prevent duplicate side effects.
- Version your workflows: When you change workflow logic, use Temporal's patching API (`patched()` and `deprecatePatch()` in the TypeScript SDK) to stay compatible with executions that are already running. This prevents replay failures.
- Monitor workflow execution: Use Temporal's Web UI or Prometheus metrics to track workflow execution times, failure rates, and activity latencies. Set alerts for workflows that run longer than expected.
- Use task queues for isolation: Separate different types of workflows onto different task queues. This prevents a surge of one workflow type from starving workers of another type.
- Test workflows thoroughly: Use Temporal's test framework (`@temporalio/testing` in TypeScript) to run workflows in a test environment. Test failure scenarios by mocking activities to throw errors.
- Limit workflow history size: Long-running workflows with many activities can accumulate large histories. Use `continueAsNew` to start a new workflow execution with a fresh history.
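Why idempotency keys matter is easiest to see with a retry whose first attempt actually succeeded but whose response was lost. The sketch below is a toy: `ToyPaymentGateway`, `chargeWithLostResponse`, and the key format are hypothetical, loosely modeled on how payment APIs such as Stripe accept an `Idempotency-Key` header. Without a key, the retry charges the customer twice; with a key that stays the same across retries, the gateway returns the original result instead of charging again.

```typescript
// Toy payment gateway that deduplicates requests by idempotency key
// (a simplified model of what payment providers offer; not a real API).
class ToyPaymentGateway {
  readonly charges: { id: string; amount: number }[] = [];
  private readonly responsesByKey: Record<string, string> = {};

  charge(amount: number, idempotencyKey?: string): string {
    if (idempotencyKey !== undefined) {
      const previous = this.responsesByKey[idempotencyKey];
      if (previous !== undefined) return previous; // repeated request: no new charge
    }
    const id = `ch_${this.charges.length + 1}`;
    this.charges.push({ id, amount });
    if (idempotencyKey !== undefined) this.responsesByKey[idempotencyKey] = id;
    return id;
  }
}

// Simulates an activity attempt whose response is lost (e.g., a timeout after the
// gateway already charged), followed by an automatic retry.
function chargeWithLostResponse(gateway: ToyPaymentGateway, amount: number, key?: string): string {
  gateway.charge(amount, key); // first attempt: charged, but the worker never sees the reply
  return gateway.charge(amount, key); // retry
}

// Hypothetical key: stable across retries, e.g. built from the workflow run ID and activity ID
const key = 'run-7f3a:activity-2';

const naive = new ToyPaymentGateway();
chargeWithLostResponse(naive, 4200);
const safe = new ToyPaymentGateway();
const chargeId = chargeWithLostResponse(safe, 4200, key);

console.log(naive.charges.length); // 2 (customer charged twice)
console.log(safe.charges.length, chargeId); // 1 ch_1
```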
Common Pitfalls and Solutions
| Pitfall | Impact | Solution |
|---|---|---|
| Non-deterministic code in workflows | Nondeterminism errors on replay; affected executions stop making progress | Use Temporal APIs for time/random, move side effects to activities |
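| Missing activity timeouts | Hung activities go undetected and workflows stall | Set startToCloseTimeout (the TypeScript SDK requires it or scheduleToCloseTimeout), plus heartbeatTimeout for long-running activities |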
| Non-idempotent activities | Duplicate side effects on retry | Use idempotency keys or database constraints |
| Not versioning workflow changes | Replay failures for running workflows | Use Temporal's patching API |
| Overly large workflow histories | Performance degradation | Use continueAsNew for long-running workflows |
| Wrong task queue configuration | Workflows not picked up by workers | Ensure worker and workflow use the same task queue |
| Not handling activity failures | The workflow fails without undoing the steps that already completed | Wrap activity calls in try/catch with compensation |
Performance Optimization
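```typescript
import { continueAsNew, executeChild, proxyActivities, sleep } from '@temporalio/workflow';
import type { OrderResult } from '../activities';
import { processOrder } from '../workflows';
// Hypothetical monitoring activities
import type * as monitoring from '../activities/monitoring';

const { checkHealth, alertOnCall } = proxyActivities<typeof monitoring>({
  startToCloseTimeout: '1 minute',
});

// Use child-workflow batching for bulk operations
export async function processBulkOrders(orderIds: string[]): Promise<OrderResult[]> {
  // Process in batches of 10 to avoid overwhelming downstream services
  const batchSize = 10;
  const results: OrderResult[] = [];
  for (let i = 0; i < orderIds.length; i += batchSize) {
    const batch = orderIds.slice(i, i + batchSize);
    const batchResults = await Promise.all(
      batch.map((id) => executeChild(processOrder, { args: [id], workflowId: `order-${id}` })),
    );
    results.push(...batchResults);
  }
  return results;
}

// Use continueAsNew for long-running workflows
export async function monitoringWorkflow(serviceId: string): Promise<void> {
  for (let i = 0; i < 1000; i++) {
    const health = await checkHealth(serviceId);
    if (health.status === 'unhealthy') {
      await alertOnCall(serviceId, health);
    }
    await sleep('5 minutes');
  }
  // Continue as new so the next run starts with a fresh event history
  await continueAsNew<typeof monitoringWorkflow>(serviceId);
}
```
Comparison with Alternatives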
| Feature | Temporal | AWS Step Functions | Apache Airflow | Cadence | AWS SWF |
|---|---|---|---|---|---|
| Language | Go, Java, TS, Python, PHP | JSON/YAML | Python | Go, Java | Java |
| Workflow Definition | Code | State Machine | DAG | Code | Code |
| Durability | Full | Full | Partial | Full | Full |
| Replay-Based | Yes | No | No | Yes | Yes |
| Testing | Unit tests | Limited | Limited | Unit tests | Limited |
| Self-Hosted | Yes | No (AWS only) | Yes | Yes | No (AWS only) |
| Scalability | High | High | Medium | High | High |
| Community | Large | Large | Large | Small | Legacy |
| Versioning | Built-in | Manual | Manual | Built-in | Manual |
Advanced Patterns and Techniques
Workflow Signals and Queries
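```typescript
import { defineQuery, defineSignal, proxyActivities, setHandler } from '@temporalio/workflow';
// Hypothetical activities module providing processTask
import type * as activities from '../activities/tasks';

const { processTask } = proxyActivities<typeof activities>({ startToCloseTimeout: '10 minutes' });

export interface Task { id: string }
export interface Input { tasks: Task[] }
export type UpdateData = Record<string, string>;
export interface Status {
  phase: 'running' | 'done';
  progress: number;
  total: number;
  currentTask?: string;
  lastUpdate?: UpdateData;
}

// Define signals and queries (exported so clients can reference them)
export const updateSignal = defineSignal<[UpdateData]>('update');
export const statusQuery = defineQuery<Status>('status');
export const cancelSignal = defineSignal('cancel');

export async function longRunningWorkflow(input: Input) {
  const status: Status = { phase: 'running', progress: 0, total: input.tasks.length };
  let cancelled = false;
  setHandler(updateSignal, (data) => {
    status.lastUpdate = data;
  });
  setHandler(statusQuery, () => status);
  setHandler(cancelSignal, () => { cancelled = true; });
  for (const task of input.tasks) {
    if (cancelled) break;
    status.currentTask = task.id;
    await processTask(task);
    status.progress++;
  }
  status.phase = 'done';
  return { completed: status.progress === status.total, processed: status.progress };
}

// Client-side interaction (assumes a connected `client`)
const handle = await client.workflow.start(longRunningWorkflow, { ... });
// Query status
const status = await handle.query(statusQuery);
console.log(`Progress: ${status.progress}/${status.total}`);
// Send signal
await handle.signal(updateSignal, { field: 'value' });
// Cancel via the custom signal (handle.cancel() requests Temporal's built-in cancellation instead)
await handle.signal(cancelSignal);
```
Activity Heartbeating for Long Activities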
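```typescript
// activities/longRunning.ts
import { Context } from '@temporalio/activity';
// Hypothetical data-access helpers
import { loadDataset, processRecord } from '../data';

// Heartbeats only help if the caller sets heartbeatTimeout in proxyActivities
// (e.g. heartbeatTimeout: '30 seconds'), so a stalled worker is detected quickly.
export async function processLargeDataset(datasetId: string) {
  const ctx = Context.current();
  const dataset = await loadDataset(datasetId);
  const total = dataset.records.length;
  // On a retry, resume after the last record reported in a heartbeat
  const last = ctx.info.heartbeatDetails as { progress: number } | undefined;
  const start = last ? last.progress + 1 : 0;
  for (let i = start; i < total; i++) {
    await processRecord(dataset.records[i]);
    // Send heartbeat every 100 records
    if (i % 100 === 0) {
      ctx.heartbeat({
        progress: i,
        total,
        percentage: Math.round((i / total) * 100),
      });
    }
  }
  return { processed: total };
}
```
Testing Strategies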
```typescript
// tests/workflows.test.ts
import { TestWorkflowEnvironment } from '@temporalio/testing';
import { Worker } from '@temporalio/worker';
import { ApplicationFailure } from '@temporalio/common';
import { processOrder } from '../workflows';

// Starting the dev server and bundling workflows can take several seconds
jest.setTimeout(60_000);

// Mock activities shared by both tests; individual tests override entries
const baseActivities = {
  getOrder: async () => ({ id: 'order-123', customerId: 'cus_123', total: 42, items: [] }),
  chargePayment: async () => ({ id: 'pay_123' }),
  refundPayment: async () => {},
  sendEmail: async () => {},
  updateInventory: async () => true,
  notifyWarehouse: async () => {},
};

describe('processOrder', () => {
  let testEnv: TestWorkflowEnvironment;

  beforeAll(async () => {
    testEnv = await TestWorkflowEnvironment.createLocal();
  });

  afterAll(async () => {
    await testEnv?.teardown();
  });

  it('processes a valid order successfully', async () => {
    const { client, nativeConnection } = testEnv;
    const worker = await Worker.create({
      connection: nativeConnection,
      taskQueue: 'test',
      workflowsPath: require.resolve('../workflows'),
      activities: baseActivities,
    });
    await worker.runUntil(async () => {
      const handle = await client.workflow.start(processOrder, {
        taskQueue: 'test',
        workflowId: 'test-order-1',
        args: ['order-123'],
      });
      // Approve the fraud check so the workflow does not wait 24 hours
      await handle.signal('approveFraudCheck');
      const result = await handle.result();
      expect(result.status).toBe('completed');
    });
  });

  it('handles payment failure with compensation', async () => {
    const { client, nativeConnection } = testEnv;
    const worker = await Worker.create({
      connection: nativeConnection,
      taskQueue: 'test',
      workflowsPath: require.resolve('../workflows'),
      activities: {
        ...baseActivities,
        // Non-retryable, so the test does not wait through retry backoff
        chargePayment: async () => {
          throw ApplicationFailure.nonRetryable('Card declined');
        },
      },
    });
    await worker.runUntil(async () => {
      const result = await client.workflow.execute(processOrder, {
        taskQueue: 'test',
        workflowId: 'test-order-2',
        args: ['order-456'],
      });
      expect(result.status).toBe('failed');
      expect(result.reason).toBe('payment_failed');
    });
  });
});
```
Future Outlook
Temporal continues to grow as a leading durable workflow platform. The company has raised more than $100M in funding and is expanding its cloud offering with improved monitoring, debugging, and deployment tools.
The TypeScript SDK is maturing rapidly, with improved type inference, better developer ergonomics, and integration with popular frameworks like NestJS and Express. The community is building reusable workflow patterns and shared activity libraries.
The broader trend toward microservices and distributed systems creates a growing need for workflow orchestration. As applications become more distributed, the complexity of coordinating services increases, and Temporal's durable execution model becomes increasingly valuable.
Conclusion
Temporal.io transforms how you build distributed systems:
- Workflow as code eliminates fragile orchestration: Instead of managing state machines, message queues, and retry logic, you write regular functions. Temporal handles durability, retries, and failure recovery automatically.
- The replay-based execution model provides strong guarantees: Each completed step is recorded once in the workflow history, so a workflow runs to completion across server crashes and network partitions, and only one execution per workflow ID can be open at a time. Activities are still executed at least once, so they need to be idempotent.
- The activity model cleanly separates concerns: Deterministic workflow logic stays in workflows, while non-deterministic side effects live in activities. This separation makes code testable and maintainable.
- Built-in primitives for complex scenarios: Child workflows, signals, queries, durable timers, and versioning are first-class features, and patterns such as sagas are straightforward to build on top of them in ordinary code.
- The developer experience is excellent: Write workflows in your preferred language, test them with standard test frameworks, and debug them with the Temporal Web UI. The core programming model is familiar, although the determinism and versioning rules take some practice.
- It scales from simple to complex: A basic workflow with two activities works the same as a complex workflow with hundreds of activities, child workflows, and human-in-the-loop steps. The API does not change; only the workflow code grows.
If you are building distributed systems with long-running processes, Temporal is one of the most robust solutions available. The durability guarantees it provides eliminate an entire class of failure modes, and the programming model makes complex workflows nearly as simple to write and maintain as regular functions.