
GraphQL Federation: Building a Distributed Graph

Implement Apollo Federation: subgraphs, supergraph, entity resolution, and schema composition.

Tags: GraphQL, Federation, Apollo, Microservices

By MinhVo

Introduction

As organizations grow, so does the complexity of their data graph. A single monolithic GraphQL server becomes a bottleneck when dozens of teams need to contribute to the schema. GraphQL Federation, introduced by Apollo, solves this by allowing multiple teams to own and deploy their own subgraphs while presenting a unified supergraph to clients.

Federation enables each team to work independently on their domain—users, products, orders, reviews—without coordinating schema changes across the entire organization. The gateway composes these subgraphs into a single schema, resolves cross-service references, and routes queries to the appropriate services.

This architecture mirrors the organizational principle of Conway's Law: the structure of your software system tends to reflect the structure of your organization. Federation gives each team full ownership of their portion of the graph while maintaining a cohesive API surface for consumers.

[Figure: Federation architecture overview]

Understanding Federation: Core Concepts

Subgraphs and the Supergraph

A subgraph is an independently deployable GraphQL service that owns a portion of the overall schema. Each subgraph defines its own types, queries, mutations, and entity references. The supergraph is the composed result of all subgraphs—it's what clients see and query against.

Consider an e-commerce platform. The Users subgraph owns User and Account types. The Products subgraph owns Product, Category, and Inventory. The Orders subgraph owns Order, OrderItem, and Payment. Each team deploys their subgraph independently, and the gateway merges them into a single coherent schema.

The composition process validates that subgraphs are compatible—no conflicting type definitions, no missing entity references, and no ambiguous field ownership. When composition fails, the gateway rejects the new schema and keeps the previous working version.
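
To make the idea of a composition check concrete, here is a toy sketch of a single rule: a non-key field defined by more than one subgraph must be marked @shareable in all of them. This is not Apollo's composition algorithm, which validates far more; the `SubgraphField` shape and `findOwnershipConflicts` function are hypothetical names invented for the example.

```typescript
// Toy model of one composition rule; real composition checks many more.
interface SubgraphField {
  subgraph: string;   // name of the subgraph defining the field
  type: string;       // parent type, e.g. "User"
  field: string;      // field name, e.g. "email"
  shareable: boolean; // whether this definition carries @shareable
  isKey: boolean;     // key fields may legitimately appear in several subgraphs
}

function findOwnershipConflicts(fields: SubgraphField[]): string[] {
  // Group every definition by its "Type.field" coordinate.
  const byCoordinate = new Map<string, SubgraphField[]>();
  for (const f of fields) {
    const coordinate = `${f.type}.${f.field}`;
    const group = byCoordinate.get(coordinate) ?? [];
    group.push(f);
    byCoordinate.set(coordinate, group);
  }

  const conflicts: string[] = [];
  byCoordinate.forEach((defs, coordinate) => {
    const nonKey = defs.filter((d) => !d.isKey);
    // A field resolved by several subgraphs must be @shareable in each one.
    if (nonKey.length > 1 && nonKey.some((d) => !d.shareable)) {
      const names = nonKey.map((d) => d.subgraph).join(", ");
      conflicts.push(`${coordinate} is defined in ${names} without @shareable`);
    }
  });
  return conflicts;
}

const conflicts = findOwnershipConflicts([
  { subgraph: "users", type: "User", field: "id", shareable: false, isKey: true },
  { subgraph: "orders", type: "User", field: "id", shareable: false, isKey: true },
  { subgraph: "users", type: "User", field: "email", shareable: false, isKey: false },
  { subgraph: "orders", type: "User", field: "email", shareable: false, isKey: false },
]);
console.log(conflicts);
```

A CI job could run a check like this and block a deploy when the returned list is non-empty. In a real pipeline that role is played by the composition tooling itself.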

Entity Resolution

Entities are the cornerstone of federation. An entity is a type that can be uniquely identified by a key (like User with id) and can be referenced across subgraphs. When the Orders subgraph needs to include user information in an order, it references the User entity from the Users subgraph using the @key directive.

The gateway resolves entity references by making parallel requests to the owning subgraphs. When a query spans multiple subgraphs, the gateway creates an execution plan that fetches data from each service and stitches the results together.

Entity resolution follows a two-phase process:

  1. Fetch the root entity from the subgraph that owns the query field
  2. Resolve referenced entities by sending their key fields to the owning subgraphs, where __resolveReference loads each one

This means a single client query might result in multiple downstream service calls, which is why performance optimization (batching, caching) is critical.
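
The two phases can be sketched with in-memory stand-ins for the subgraphs. This is a simplified illustration, assuming made-up data and hypothetical names (`ordersSubgraph`, `usersSubgraph`, `resolveOrderWithUser`); a real gateway sends GraphQL requests over HTTP and uses the `_entities` query for the second phase.

```typescript
// In-memory stand-ins for two subgraphs (hypothetical data and names).
type Representation = { __typename: "User"; id: string };
interface User { id: string; name: string }
interface OrderRoot { id: string; total: number; user: Representation }

const ordersSubgraph = {
  // Phase 1: the subgraph that owns Query.order returns the order plus
  // a key-only representation of the User it references.
  order(id: string): OrderRoot {
    return { id, total: 42.5, user: { __typename: "User", id: "u1" } };
  },
};

let entityCalls = 0;
const usersSubgraph = {
  // Phase 2: the owning subgraph resolves a batch of representations,
  // mirroring what an _entities request does.
  entities(representations: Representation[]): User[] {
    entityCalls += 1;
    const names: Record<string, string> = { u1: "Ada" };
    return representations.map((r) => ({ id: r.id, name: names[r.id] ?? "unknown" }));
  },
};

function resolveOrderWithUser(orderId: string) {
  const order = ordersSubgraph.order(orderId);         // phase 1: root fetch
  const [user] = usersSubgraph.entities([order.user]); // phase 2: entity fetch
  return { ...order, user };                           // merge into one result
}

const result = resolveOrderWithUser("o1");
console.log(result.user.name, entityCalls);
```

Even in this tiny case one client query costs two downstream calls, which is where the batching and caching concerns come from.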

Directives in Federation

Federation introduces several directives that define how types and fields relate across subgraphs:

# In Users subgraph
type User @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
}
 
# In Orders subgraph
extend type User @key(fields: "id") {
  id: ID! @external
  orders: [Order!]!
}
 
type Order {
  id: ID!
  user: User!
  items: [OrderItem!]!
  total: Float!
}

The @key directive defines how to look up an entity. The @external directive marks fields that exist in another subgraph but are needed for resolution. The @requires directive specifies additional fields needed from the external service to compute a derived field.

Additional federation directives include:

  • @provides: Tells the gateway that a subgraph can supply specific fields of an entity it doesn't own, reducing downstream calls
  • @shareable (Federation v2): Allows multiple subgraphs to resolve the same field
  • @override (Federation v2): Transfers field ownership between subgraphs during migration
  • @inaccessible (Federation v2): Hides a field from the composed supergraph schema
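
The schema above does not show @requires in use, so here is a minimal sketch of how it pairs with a resolver. The Shipping subgraph, the `weight` and `shippingEstimate` fields, and the rate are all hypothetical, chosen only to illustrate the directive.

```typescript
// Hypothetical Shipping subgraph: the field names below are illustrative.
const shippingTypeDefs = /* GraphQL */ `
  type Product @key(fields: "id") {
    id: ID!
    weight: Float @external
    shippingEstimate: Float @requires(fields: "weight")
  }
`;

// Because of @requires, the gateway first fetches `weight` from the subgraph
// that resolves it and includes it in the representation sent here.
interface ProductRepresentation {
  id: string;
  weight?: number | null;
}

const shippingResolvers = {
  Product: {
    shippingEstimate(product: ProductRepresentation): number | null {
      if (product.weight == null) return null;
      const ratePerKg = 0.5; // made-up rate for the example
      return product.weight * ratePerKg;
    },
  },
};

console.log(shippingTypeDefs.trim().split("\n").length, "lines of SDL");
console.log(shippingResolvers.Product.shippingEstimate({ id: "p1", weight: 4 }));
```

The resolver never queries another service for `weight`; it relies on the gateway having supplied it, which is the whole point of declaring the dependency in the schema.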

Architecture and Design Patterns

The Gateway Architecture

The gateway sits between clients and subgraphs. It receives client queries, creates an execution plan, routes requests to the appropriate subgraphs, and assembles the results. This architecture allows clients to query a single endpoint while data is distributed across multiple services.

import { ApolloServer } from "@apollo/server";
import { ApolloGateway, IntrospectAndCompose } from "@apollo/gateway";
 
const gateway = new ApolloGateway({
  supergraphSdl: new IntrospectAndCompose({
    subgraphs: [
      { name: "users", url: "http://localhost:4001/graphql" },
      { name: "products", url: "http://localhost:4002/graphql" },
      { name: "orders", url: "http://localhost:4003/graphql" },
    ],
  }),
});
 
const server = new ApolloServer({
  gateway,
});

The gateway performs several critical functions:

  • Query planning: Analyzes the incoming query and determines which subgraphs to contact
  • Execution orchestration: Sends parallel or sequential requests to subgraphs based on dependencies
  • Response merging: Combines partial responses from multiple subgraphs into a single coherent result
  • Error propagation: Aggregates and normalizes errors from different subgraphs
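
As a rough illustration of query planning, the sketch below groups requested fields into one fetch per subgraph. It is a deliberately naive model with a hypothetical `fieldOwners` map and `planFetches` function; the actual query planner works on the parsed operation and the supergraph schema, and also orders fetches by dependency.

```typescript
// Hypothetical ownership map: "Type.field" -> subgraph name.
const fieldOwners: Record<string, string> = {
  "Query.order": "orders",
  "Order.total": "orders",
  "Order.user": "orders",
  "User.name": "users",
  "User.email": "users",
};

interface FetchStep {
  subgraph: string;
  fields: string[];
}

// Group requested fields into one fetch per subgraph, in first-seen order.
function planFetches(requestedFields: string[]): FetchStep[] {
  const steps: FetchStep[] = [];
  for (const field of requestedFields) {
    const subgraph = fieldOwners[field];
    if (subgraph === undefined) {
      throw new Error(`No subgraph resolves ${field}`);
    }
    const existing = steps.find((s) => s.subgraph === subgraph);
    if (existing) {
      existing.fields.push(field);
    } else {
      steps.push({ subgraph, fields: [field] });
    }
  }
  return steps;
}

const plan = planFetches(["Query.order", "Order.total", "Order.user", "User.name"]);
console.log(JSON.stringify(plan));
```

Here the plan contains two steps: one fetch to `orders` for the root fields, then one to `users` for the referenced entity.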

Entity Reference Pattern

When a subgraph needs to reference an entity from another subgraph, it implements the __resolveReference function. This function receives the entity's key and returns the full entity data:

const resolvers = {
  User: {
    __resolveReference(reference: { id: string }, context: { dataSources: any }) {
      return context.dataSources.usersAPI.getUserById(reference.id);
    },
    orders(user: { id: string }, _args: unknown, { dataSources }: { dataSources: any }) {
      return dataSources.ordersAPI.getOrdersByUserId(user.id);
    },
  },
};

The gateway does not call __resolveReference directly. It sends the entity's key fields to this subgraph in an _entities query, and the subgraph library invokes __resolveReference for each one. The function receives the key fields specified in the @key directive and must return enough data to satisfy the fields being requested.

Schema Composition Strategy

Designing subgraph boundaries requires careful consideration of data ownership, team structure, and query patterns. A common strategy is to align subgraphs with bounded contexts from domain-driven design:

| Subgraph | Owned Entities | Key Relationships |
|----------|----------------|-------------------|
| Users | User, Account, Profile | → Orders (via User.orders) |
| Products | Product, Category, Inventory | → Reviews (via Product.reviews) |
| Orders | Order, OrderItem, Payment | → User, Product (entity refs) |
| Reviews | Review, Rating | → User, Product (entity refs) |

Each entity should have exactly one owning subgraph—the subgraph that defines its @key and primary fields. Other subgraphs can extend the entity with additional fields but cannot redefine its core properties.

[Figure: Subgraph boundary diagram]

Step-by-Step Implementation

Setting Up a Subgraph

Each subgraph is a standard Apollo Server instance with federation-specific extensions. Install the federation packages and define your subgraph schema:

npm install @apollo/subgraph @apollo/server graphql graphql-tag

import { ApolloServer } from "@apollo/server";
import { buildSubgraphSchema } from "@apollo/subgraph";
import { gql } from "graphql-tag";
 
const typeDefs = gql`
  extend schema
    @link(url: "https://specs.apollo.dev/federation/v2.0",
          import: ["@key", "@external", "@requires"])
 
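  # Scalar and input types referenced below (input fields are illustrative)
  scalar DateTime
 
  input CreateUserInput {
    name: String!
    email: String!
    avatar: String
  }
 
  input UpdateUserInput {
    name: String
    email: String
    avatar: String
  }
 
  type User @key(fields: "id") {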
    id: ID!
    name: String!
    email: String!
    avatar: String
    createdAt: DateTime!
  }
 
  type Query {
    user(id: ID!): User
    users(limit: Int, offset: Int): [User!]!
  }
 
  type Mutation {
    createUser(input: CreateUserInput!): User!
    updateUser(id: ID!, input: UpdateUserInput!): User!
  }
`;
 
const resolvers = {
  Query: {
    user: (_, { id }, { dataSources }) => dataSources.usersAPI.getUser(id),
    users: (_, args, { dataSources }) => dataSources.usersAPI.getUsers(args),
  },
  Mutation: {
    createUser: (_, { input }, { dataSources }) =>
      dataSources.usersAPI.createUser(input),
    updateUser: (_, { id, input }, { dataSources }) =>
      dataSources.usersAPI.updateUser(id, input),
  },
  User: {
    __resolveReference(user, { dataSources }) {
      return dataSources.usersAPI.getUser(user.id);
    },
  },
};
 
const server = new ApolloServer({
  schema: buildSubgraphSchema({ typeDefs, resolvers }),
});

Extending Types Across Subgraphs

When the Orders subgraph needs to add an orders field to the User type, it uses extend type:

// Orders subgraph schema
const typeDefs = gql`
  extend schema
    @link(url: "https://specs.apollo.dev/federation/v2.0",
          import: ["@key", "@external"])
 
  scalar DateTime
 
  # Status values are illustrative
  enum OrderStatus {
    PENDING
    SHIPPED
    DELIVERED
  }
 
  type Order @key(fields: "id") {
    id: ID!
    user: User!
    items: [OrderItem!]!
    total: Float!
    status: OrderStatus!
    createdAt: DateTime!
  }
 
  type OrderItem {
    product: Product!
    quantity: Int!
    price: Float!
  }
 
  extend type User @key(fields: "id") {
    id: ID! @external
    orders(limit: Int, status: OrderStatus): [Order!]!
  }
 
  extend type Product @key(fields: "id") {
    id: ID! @external
    orderCount: Int!
  }
 
  type Query {
    order(id: ID!): Order
    orders(userId: ID, limit: Int): [Order!]!
  }
`;

Setting Up the Gateway

The gateway composes all subgraphs into a single schema. In development, use IntrospectAndCompose for automatic schema discovery. In production, use a managed supergraph from Apollo Studio:

import { ApolloServer } from "@apollo/server";
import { startStandaloneServer } from "@apollo/server/standalone";
import {
  ApolloGateway,
  IntrospectAndCompose,
  RemoteGraphQLDataSource,
} from "@apollo/gateway";
 
const gateway = new ApolloGateway({
  supergraphSdl: new IntrospectAndCompose({
    subgraphs: [
      { name: "users", url: process.env.USERS_SERVICE_URL },
      { name: "products", url: process.env.PRODUCTS_SERVICE_URL },
      { name: "orders", url: process.env.ORDERS_SERVICE_URL },
      { name: "reviews", url: process.env.REVIEWS_SERVICE_URL },
    ],
  }),
  buildService({ url }) {
    return new RemoteGraphQLDataSource({
      url,
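      willSendRequest({ request, context }) {
        // Context is empty while the gateway introspects subgraphs at startup,
        // so only forward the header when a token is present
        if (context.authToken) {
          request.http.headers.set("authorization", context.authToken);
        }
      },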
    });
  },
});
 
const server = new ApolloServer({ gateway });
 
const { url } = await startStandaloneServer(server, {
  listen: { port: 4000 },
  context: async ({ req }) => ({
    authToken: req.headers.authorization,
  }),
});

Implementing Entity Batching

When the gateway needs to resolve multiple entities of the same type, it batches them into a single request using the _entities query. This is critical for avoiding N+1 query problems:

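import DataLoader from "dataloader";
 
const resolvers = {
  User: {
    // Each reference goes through the loader, so lookups made while
    // resolving one _entities request collapse into a single batch
    __resolveReference(user, { dataSources }) {
      return dataSources.usersAPI.getUser(user.id);
    },
  },
};
 
// In your data source (create one instance per request so the
// loader's cache is never shared between users)
class UsersAPI {
  constructor(db) {
    this.db = db;
    this.loader = new DataLoader(async (ids) => {
      const users = await this.db.users.findByIds(ids);
      // DataLoader expects results in the same order as the keys
      const byId = new Map(users.map((u) => [u.id, u]));
      return ids.map((id) => byId.get(id) ?? null);
    });
  }
 
  getUser(id) {
    return this.loader.load(id);
  }
}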
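
To show why routing lookups through a loader removes N+1 behavior, here is a minimal batching loader written from scratch. It is a teaching sketch, not the `dataloader` package: `TinyBatchLoader` is a hypothetical class, and it omits caching and per-key error handling.

```typescript
// Minimal batching loader: coalesces all load() calls made in the same
// tick into one call to the batch function.
class TinyBatchLoader<K, V> {
  private queue: {
    key: K;
    resolve: (value: V) => void;
    reject: (err: unknown) => void;
  }[] = [];
  private batchFn: (keys: K[]) => Promise<V[]>;

  constructor(batchFn: (keys: K[]) => Promise<V[]>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first load in a tick schedules a single flush as a microtask.
      if (this.queue.length === 1) {
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const pending = this.queue;
    this.queue = [];
    try {
      // The batch function must return values in the same order as the keys.
      const values = await this.batchFn(pending.map((p) => p.key));
      pending.forEach((p, i) => p.resolve(values[i]));
    } catch (err) {
      pending.forEach((p) => p.reject(err));
    }
  }
}

let batchCalls = 0;
const userLoader = new TinyBatchLoader<string, string>(async (ids) => {
  batchCalls += 1; // stands in for one database round trip
  return ids.map((id) => `user-${id}`);
});

const loaded = Promise.all([
  userLoader.load("1"),
  userLoader.load("2"),
  userLoader.load("3"),
]);
loaded.then((users) => console.log(users, "batches:", batchCalls));
```

Three separate `load` calls result in a single invocation of the batch function, which is the behavior a subgraph wants when the gateway asks for many entities at once.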

[Figure: Entity resolution flow]

Federation v2: What Changed

Federation v2 introduced significant improvements over v1:

Simplified Composition

In v1, extend type was required whenever a subgraph referenced a type from another subgraph. In v2, types can be referenced directly without extend, making schemas cleaner and more intuitive:

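# Federation v2 - no extend needed
type Order @key(fields: "id") {
  id: ID!
  userId: ID!
  user: User!  # References Users subgraph directly
}
 
# Key-only stub so this subgraph can return references to User
type User @key(fields: "id", resolvable: false) {
  id: ID!
}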

Shareable Fields

The @shareable directive allows multiple subgraphs to resolve the same field. This is useful for commonly accessed fields like name or email that might be cached differently by different services:

# Users subgraph
type User @key(fields: "id") {
  id: ID!
  name: String! @shareable
  email: String! @shareable
}
 
# Profiles subgraph
type User @key(fields: "id") {
  id: ID!
  name: String! @shareable
  profileUrl: String!
}

Progressive Migration with @override

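The @override directive enables zero-downtime field migration between subgraphs. It is applied in the subgraph that is taking over the field, and its from argument names the subgraph that is giving the field up (the subgraph names below are examples):

# Original subgraph, named "users" (being migrated from)
type User @key(fields: "id") {
  id: ID!
  email: String!
}
 
# New subgraph, named "users-v2" (being migrated to)
type User @key(fields: "id") {
  id: ID!
  email: String! @override(from: "users")
}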

Real-World Use Cases

E-Commerce Platform

A large e-commerce company uses federation to divide their graph across 15 teams. The Catalog team owns product data, the Cart team handles shopping carts, and the Payments team manages transactions. Each team deploys independently multiple times per day, and the gateway automatically picks up schema changes through managed federation.

The key benefit is that the frontend team can query { user { name orders { items { product { name price } } } } } without knowing which services own which data. The gateway handles all routing and resolution.

Financial Services

A bank uses federation to separate customer data, account management, and transaction processing into distinct subgraphs. Strict access controls at the subgraph level ensure that only authorized requests can access sensitive financial data, while the gateway handles authentication and routing.

Each subgraph has its own rate limiting and caching policies. The transactions subgraph might have a short cache TTL (seconds) while the customer profile subgraph caches for minutes.

Media Streaming Platform

A streaming service uses federation to combine content metadata, user profiles, viewing history, and recommendation engines into a single graph. The recommendation subgraph can query user viewing history through entity resolution without direct database access, maintaining clean service boundaries.

Best Practices for Production

  1. Align subgraphs with team boundaries: Each team should own their subgraph completely. Avoid splitting a single domain across multiple teams.

  2. Use Federation v2: Federation v2 simplifies schema composition and removes the need for extend type in many cases. The composition rules are more flexible and error messages are clearer.

  3. Implement entity batching: Always use DataLoader for resolving multiple entities to avoid N+1 query problems. This is the single most impactful performance optimization.

  4. Use a schema registry: Store and version your supergraph schema in a registry (Apollo GraphOS) for change management, rollback capabilities, and schema analytics.

  5. Monitor gateway performance: Track query planning time, subgraph response times, and error rates. The gateway is a critical path component—if it goes down, the entire API is unavailable.

  6. Implement circuit breakers: When a subgraph is unavailable, use circuit breakers to prevent cascading failures. Return partial data with errors rather than failing the entire query.

  7. Cache entity lookups: DataLoader only memoizes within a single request. For entities that are read often and change rarely, consider a shared cache such as Redis for cross-request entity caching.

  8. Test schema composition: Run composition checks in CI to catch schema conflicts before deployment. Use rover subgraph check to validate changes against the production schema.

  9. Version your subgraph schemas: Use semantic versioning for subgraph schemas and maintain backward compatibility. Deprecate fields before removing them.

  10. Document entity ownership: Maintain clear documentation of which subgraph owns which entity and what keys are used for resolution.
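
The circuit breaker recommendation in the list above can be sketched as a small class that wraps calls to a subgraph. This is a minimal illustration under assumed thresholds; `CircuitBreaker` and its parameters are hypothetical and not part of any Apollo package.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetAfterMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now; // injectable clock, which keeps the breaker testable
  }

  async call<T>(request: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetAfterMs) {
        // Fail fast instead of piling more load onto a struggling subgraph.
        throw new Error("circuit open: subgraph call skipped");
      }
      this.state = "half-open"; // cool-down elapsed, allow one trial request
    }
    try {
      const result = await request();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}

// Example configuration: open after 3 consecutive failures, retry after 30 s.
const reviewsBreaker = new CircuitBreaker(3, 30000);
console.log(reviewsBreaker.getState());
```

When the breaker throws, the caller can resolve the affected fields to null and attach an error, so the client still receives the parts of the response that other subgraphs served.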

Common Pitfalls and Solutions

| Pitfall | Impact | Solution |
|---------|--------|----------|
| Circular entity references | Infinite resolution loops | Design clear entity ownership boundaries |
| N+1 queries in entity resolution | Severe performance degradation | Use DataLoader for batching |
| Schema composition conflicts | Deployment failures | Run composition checks in CI |
| Missing @key directives | Entity not resolvable across subgraphs | Always define keys for shared entities |
| Over-fetching across subgraphs | Latency increases | Use @requires to minimize data transfer |
| Tight coupling between subgraphs | Reduced team autonomy | Minimize cross-subgraph entity references |
| Gateway as single point of failure | Total API outage | Deploy multiple gateway instances behind a load balancer |
| Stale schema in gateway | Runtime errors | Use managed federation with automatic schema delivery |

Performance Optimization

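Federation adds latency through the gateway's query planning and multi-service routing. Optimize by minimizing the number of subgraph hops, using @provides to pre-fetch commonly needed fields, and caching query plans at the gateway so that repeated operations skip the planning step:

import { ApolloGateway } from "@apollo/gateway";
import { InMemoryLRUCache } from "@apollo/utils.keyvaluecache";
 
// supergraphSdl is the composed schema (or IntrospectAndCompose instance)
// from the gateway setup shown earlier. This assumes a gateway version
// that accepts a custom query plan cache under queryPlannerConfig.
const gateway = new ApolloGateway({
  supergraphSdl,
  queryPlannerConfig: {
    cache: new InMemoryLRUCache({ maxSize: 1000 }),
  },
});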

Additional performance strategies:

  • Persisted queries: Send query hashes instead of full query strings to reduce parsing overhead
  • Automatic persisted queries (APQ): Apollo's mechanism for caching query documents
  • Response caching: Use CDN-level caching for queries with predictable results
  • Query complexity analysis: Reject overly expensive queries before they reach subgraphs
  • Subgraph-level caching: Each subgraph can implement its own caching strategy independent of the gateway
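
Query complexity analysis from the list above can start as simply as a depth limit. The sketch below measures nesting depth on a simplified selection tree; the `Selection` type and both functions are hypothetical, and a real implementation would walk the parsed GraphQL document instead.

```typescript
// Simplified selection tree; a production check would walk the GraphQL AST.
interface Selection {
  name: string;
  children?: Selection[];
}

// Depth of a selection: 1 for a leaf, otherwise 1 + the deepest child.
function selectionDepth(selection: Selection): number {
  const children = selection.children ?? [];
  if (children.length === 0) return 1;
  return 1 + Math.max(...children.map((child) => selectionDepth(child)));
}

// Throws when the query nests deeper than the configured limit.
function assertWithinDepth(root: Selection, maxDepth: number): void {
  const depth = selectionDepth(root);
  if (depth > maxDepth) {
    throw new Error(`Query depth ${depth} exceeds limit ${maxDepth}`);
  }
}

// Tree for: { user { name orders { items { product { name price } } } } }
const userQuery: Selection = {
  name: "user",
  children: [
    { name: "name" },
    {
      name: "orders",
      children: [
        {
          name: "items",
          children: [
            { name: "product", children: [{ name: "name" }, { name: "price" }] },
          ],
        },
      ],
    },
  ],
};

console.log(selectionDepth(userQuery)); // 5
assertWithinDepth(userQuery, 6); // passes silently
```

Depth is a crude proxy for cost. Weighting list fields more heavily than scalars would give a closer estimate of how much work a query causes in the subgraphs.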

Debugging and Observability

Federation introduces complexity that requires dedicated observability tooling. When a query spans multiple subgraphs, tracing the execution path is critical for diagnosing latency issues and errors.

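Apollo Studio provides distributed tracing that shows the gateway's query plan, individual subgraph response times, and entity resolution paths. Each trace shows which subgraphs were contacted, how many entities were resolved, and where bottlenecks exist. For self-hosted setups, OpenTelemetry instrumentation can capture comparable tracing data. Apollo's federated traces are produced by the subgraphs rather than the gateway: each subgraph enables the inline trace plugin, which attaches timing data to its responses when the gateway asks for it:

import { ApolloServer } from "@apollo/server";
import { ApolloServerPluginInlineTrace } from "@apollo/server/plugin/inlineTrace";
import { buildSubgraphSchema } from "@apollo/subgraph";
 
// Subgraph server: typeDefs and resolvers are the subgraph's own
const server = new ApolloServer({
  schema: buildSubgraphSchema({ typeDefs, resolvers }),
  plugins: [ApolloServerPluginInlineTrace()],
});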

Common debugging scenarios include slow entity resolution (usually caused by missing DataLoader batching), schema composition failures (typically from conflicting type definitions), and authorization errors (often from missing auth header propagation through the gateway's willSendRequest hook). Set up alerts on gateway latency percentiles (p50, p95, p99) and subgraph error rates to catch issues before they impact clients.

Structured logging at the gateway level should include the query hash, operation name, execution plan complexity score, total subgraph calls, and per-subgraph latency. This data feeds into dashboards that reveal patterns like consistently slow queries that need optimization or subgraphs that need scaling.
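
As a small illustration of the metrics described above, the sketch below defines a log record shape and computes latency percentiles with the nearest-rank method. The `GatewayLogRecord` field names and the sample values are assumptions made for the example, not a format defined by Apollo.

```typescript
// Hypothetical shape for one structured gateway log line.
interface GatewayLogRecord {
  queryHash: string;
  operationName: string;
  subgraphCalls: number;
  subgraphLatencyMs: Record<string, number>;
}

// Nearest-rank percentile over a window of latency samples (milliseconds).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

const record: GatewayLogRecord = {
  queryHash: "abc123", // placeholder value
  operationName: "GetOrder",
  subgraphCalls: 2,
  subgraphLatencyMs: { orders: 12, users: 18 },
};
console.log(JSON.stringify(record));

// Ten made-up request latencies, one of them a slow outlier.
const latenciesMs = [12, 15, 11, 240, 18, 14, 13, 17, 16, 19];
console.log(percentile(latenciesMs, 50)); // 15
console.log(percentile(latenciesMs, 95)); // 240
```

The median looks healthy while p95 exposes the outlier, which is why alerts are better placed on tail percentiles than on averages.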

Comparison with Alternatives

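| Feature | Apollo Federation | Schema Stitching | Monolithic GraphQL |
|---------|-------------------|------------------|--------------------|
| Team autonomy | High | Moderate | Low |
| Deployment independence | Yes | Partial | No |
| Entity resolution | Built-in | Manual | N/A |
| Performance overhead | Moderate | Low | None |
| Complexity | High | Moderate | Low |
| Schema ownership | Distributed | Centralized | Centralized |
| Type safety across services | Strong | Weak | Strong |
| Tooling ecosystem | Mature (GraphOS) | Limited | Mature |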

Schema stitching requires manual resolution logic and doesn't have the same level of tooling support. Monolithic GraphQL is simpler but doesn't scale organizationally. Federation strikes a balance between organizational scalability and technical complexity.

Advanced Patterns

Custom Directives in Federation

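Extend federation with custom directives for cross-cutting concerns. In Federation v2, custom directives are dropped from the composed supergraph unless the subgraph registers them with @composeDirective:

directive @cacheControl(maxAge: Int!, scope: CacheScope) on FIELD_DEFINITION
 
enum CacheScope {
  PUBLIC
  PRIVATE
}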
 
type User @key(fields: "id") {
  id: ID!
  name: String! @cacheControl(maxAge: 3600)
  email: String! @cacheControl(maxAge: 300)
}

Progressive Schema Migration

Use federation to gradually migrate from a monolithic schema to distributed subgraphs. Start by extracting one entity at a time and running both the monolith and the new subgraph simultaneously:

const gateway = new ApolloGateway({
  supergraphSdl: new IntrospectAndCompose({
    subgraphs: [
      { name: "legacy", url: "http://localhost:4000/graphql" },
      { name: "users", url: "http://localhost:4001/graphql" },
    ],
  }),
});

This pattern allows you to migrate incrementally. Start with a read-only subgraph that mirrors the monolith's User type, then gradually move write operations. Once the migration is complete, remove the User type from the legacy subgraph.

Federated Subscriptions

Federated subscriptions are supported by the Apollo Router (from Federation 2.4 onward) rather than by the Node.js @apollo/gateway library. No special directive is needed: a subgraph defines a Subscription type as usual, and the router forwards subscription operations to that subgraph and streams results to clients:

type Subscription {
  orderStatusChanged(orderId: ID!): Order!
}

Testing Strategies

Test federation by composing test subgraphs and verifying entity resolution:

import { buildSubgraphSchema } from "@apollo/subgraph";
import { ApolloServer } from "@apollo/server";
 
describe("Federation entity resolution", () => {
  // Shared by both tests; typeDefs and resolvers come from the subgraph under test
  let server;
 
  beforeAll(() => {
    server = new ApolloServer({
      schema: buildSubgraphSchema({ typeDefs, resolvers }),
    });
  });
 
  it("resolves User entity across subgraphs", async () => {
    const response = await server.executeOperation({
      query: `{ order(id: "1") { id user { name email } } }`,
    });
 
    expect(response.body.singleResult.data.order.user.name).toBeDefined();
  });
 
  it("handles missing entities gracefully", async () => {
    const response = await server.executeOperation({
      query: `{ order(id: "999") { id user { name } } }`,
    });
 
    expect(response.body.singleResult.errors).toBeDefined();
  });
});

Apollo's own fixtures are published as apollo-federation-integration-testsuite and can serve as a reference for composition and resolution testing.

Future Outlook

GraphQL Federation continues to evolve with Federation v2 bringing simpler composition rules and better error handling. The Apollo GraphOS platform provides managed federation with schema proposals, checks, and analytics. Emerging patterns include edge-deployed gateways for reduced latency, AI-assisted schema design for optimizing query patterns, and tighter integration with event-driven architectures for real-time data propagation.

The GraphQL working group is also exploring standard federation specifications that would allow interoperability between different federation implementations (Apollo, Cosmo, Hive), reducing vendor lock-in.

Conclusion

GraphQL Federation enables organizations to scale their GraphQL APIs across multiple teams and services while maintaining a unified client experience. The key takeaways are:

  1. Align subgraph boundaries with team ownership and domain boundaries—this is the most important architectural decision
  2. Use @key directives to define entity resolution across subgraphs
  3. Implement entity batching with DataLoader to avoid N+1 queries—the most common performance pitfall
  4. Use a schema registry for composition checks and change management
  5. Monitor gateway performance and implement circuit breakers for resilience
  6. Migrate to Federation v2 for simpler composition and more flexible schema design

Start with a well-defined domain model, identify natural entity boundaries, and extract subgraphs incrementally. Federation's power lies in enabling autonomous teams to evolve their schemas independently while delivering a cohesive API to clients.