
AI SDK by Vercel: Building AI-Powered React Applications

Build AI features with Vercel AI SDK: streaming, tool calling, structured output, and chat UI.

Tags: AI SDK, Vercel, React, AI, LLM

By MinhVo

Introduction

Building AI-powered user interfaces is harder than it should be. You need to handle streaming responses, manage conversation state, implement tool calling, parse structured output, manage loading states, and gracefully handle errors — all while providing a smooth user experience. The Vercel AI SDK solves these problems with a unified, framework-agnostic toolkit that makes building AI features as straightforward as building any other React component. Since its release, it has become the standard way to integrate LLMs into Next.js and React applications, with over 1.5 million weekly npm downloads and adoption by companies like Notion, Linear, and Vercel itself.


The AI SDK's key innovation is abstracting the differences between LLM providers behind a unified API. Whether you're using OpenAI, Anthropic, Google, or open-source models through Ollama, the same code works. Switching providers requires changing one line of code, not rewriting your entire integration. This provider-agnostic approach eliminates vendor lock-in and makes it easy to use the best model for each task — GPT-4o for reasoning, Claude for long context, Gemini for multimodal inputs, or Llama for local inference.

The SDK is built on three pillars: AI SDK Core (server-side functions for generating text, structured data, embeddings, and tool calls), AI SDK UI (hooks such as useChat for building chat and generative interfaces), and AI SDK RSC (experimental utilities for streaming React Server Components). Together, these components cover the full spectrum of AI application development, from simple text generation to complex agentic workflows with streaming UI.

Understanding the AI SDK: Core Concepts

Provider Abstraction

The AI SDK uses a provider system that normalizes the differences between LLM APIs. Each provider (OpenAI, Anthropic, Google, Mistral, Cohere, Amazon Bedrock, etc.) implements a standard interface, so your application code doesn't change when you switch models or providers. Providers are installed as separate packages, keeping your bundle size minimal.

import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { mistral } from '@ai-sdk/mistral';
import { ollama } from 'ollama-ai-provider'; // community provider

// Every factory returns a model object with the same interface,
// so any of these can be passed as `model` to streamText or generateText.
const gpt = openai('gpt-4o');                            // OpenAI
const claude = anthropic('claude-3-5-sonnet-20241022');  // Anthropic
const gemini = google('gemini-2.0-flash');               // Google
const mistralLarge = mistral('mistral-large-latest');    // Mistral
const llama = ollama('llama3.1');                        // Local via Ollama

You can also use custom or community providers, or connect to any OpenAI-compatible API (such as Together AI or Fireworks) by passing a custom baseURL and apiKey to createOpenAI from @ai-sdk/openai. Azure OpenAI has its own dedicated provider, @ai-sdk/azure.
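To make the idea concrete, here is a minimal sketch of the adapter pattern that a provider abstraction like this relies on. It is not the SDK's internal code. The `ChatModel` interface, the two fake providers, and `summarize` are all hypothetical. They show how calling code can stay identical when the provider behind it changes.

```typescript
// Hypothetical common interface that every provider adapter implements.
interface ChatModel {
  readonly provider: string;
  generate(prompt: string): Promise<string>;
}

// Builds a fake provider factory. A real adapter would translate the prompt
// into that provider's request format and call its HTTP API.
function createEchoProvider(name: string, prefix: string) {
  return (modelId: string): ChatModel => ({
    provider: name,
    async generate(prompt: string) {
      return `${prefix}[${modelId}] ${prompt}`;
    },
  });
}

const fakeOpenAI = createEchoProvider('openai', 'OA');
const fakeAnthropic = createEchoProvider('anthropic', 'AN');

// Application code depends only on the interface, never on a concrete provider.
async function summarize(model: ChatModel, text: string): Promise<string> {
  return model.generate(`Summarize: ${text}`);
}
```

Switching from `fakeOpenAI('gpt-4o')` to `fakeAnthropic('claude')` changes only the argument passed to `summarize`, which is the same one-line switch the SDK offers with its real providers.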

Streaming

The AI SDK is designed around streaming. streamText emits tokens as the model produces them, and the UI hooks render them as they arrive, enabling real-time UI updates. This makes AI interfaces feel responsive even for long responses: users see output immediately rather than waiting for the complete response. The SDK handles chunking and parsing of the stream for you, so you rarely need to manage low-level streaming details.
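On the client, consuming a stream comes down to reading chunks as they arrive and updating state after each one. The hooks do this for you. The sketch below shows the underlying idea using only the standard Web Streams API, available in modern browsers and Node 18+. It streams plain text, not the SDK's own wire protocol. `streamFrom` is a hypothetical stand-in for a network response body.

```typescript
// Hypothetical stand-in for a fetch() response body: emits text chunks as bytes.
function streamFrom(chunks: string[]): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(encoder.encode(chunk));
      controller.close();
    },
  });
}

// Reads the stream chunk by chunk, calling onUpdate with the text received so far.
// In a UI, onUpdate would set React state so each new token renders immediately.
async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onUpdate: (textSoFar: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let text = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters that are split across chunks intact
    text += decoder.decode(value, { stream: true });
    onUpdate(text);
  }
  text += decoder.decode(); // flush any bytes still buffered in the decoder
  return text;
}
```

With a real request you would pass `response.body` from `fetch` instead of `streamFrom(...)`.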

Server Actions Integration

The SDK integrates seamlessly with Next.js Server Actions and Route Handlers. You define AI functions on the server and call them from client components using the SDK's React hooks. This keeps API keys secure on the server while providing a smooth client-side experience. The SDK also supports Edge Runtime for lower latency, and can run on any Node.js server outside of Next.js.

Type Safety

The SDK is written in TypeScript with full type inference. Tool definitions, structured output schemas, and message types are all strongly typed, catching errors at compile time rather than runtime. Zod schemas used for structured output are converted to JSON Schema for the LLM, while TypeScript infers the matching types for your application code from the same schema.


Architecture and Design Patterns

The Server Action Pattern

Define AI functions as Next.js Server Actions that stream their results. The client calls the action directly and reads the stream as it arrives, for example with the createStreamableValue (server) and readStreamableValue (client) helpers from ai/rsc. This pattern works well with the Next.js App Router and React Server Components, keeping AI logic on the server. The useChat family of hooks, by contrast, expects an API endpoint, so it pairs with the Route Handler pattern.

The Route Handler Pattern

Create dedicated API routes for AI operations. This provides more control over request/response handling and enables middleware for authentication, rate limiting, and logging. Route handlers are ideal when you need to support multiple client types (web, mobile, CLI) or when you want to expose AI functionality as a REST API.
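As an example of the middleware a Route Handler makes easy to add, here is a minimal in-memory fixed-window rate limiter. It is a hypothetical sketch: the `RateLimiter` class and its limits are illustrative. Because it keeps counts in process memory, it will not work across multiple serverless instances. That case needs a shared store such as Redis.

```typescript
// Minimal fixed-window rate limiter keyed by user ID or IP (hypothetical sketch).
class RateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if the key is over its limit.
  check(key: string, now: number = Date.now()): boolean {
    const entry = this.windows.get(key);
    // Start a fresh window if this key has none or its window has expired.
    if (!entry || now - entry.start >= this.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

In a Route Handler, you would call `limiter.check(ip)` before `streamText` and return a `429` response when it returns false.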

The Hook Pattern

Use the SDK's React hooks (useChat, useCompletion, useObject) to manage AI state in client components. These hooks handle loading states, streaming updates, error handling, and message history automatically. Each returns its streamed result (messages, completion, or object) alongside a consistent set of fields such as isLoading, error, and stop, which makes them predictable and composable.

The Schema-First Pattern

Define your data schemas using Zod, and use them both for type safety in your application and for structured output from the LLM. Because the SDK validates the model's output against the same schema, data that reaches your code matches your application's types. Schemas, including any .describe() annotations, also tell the LLM what it should produce, which tends to improve output quality.

Step-by-Step Implementation

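Setting Up the AI SDK

The examples in this post use the AI SDK 4.x API. AI SDK 5 renamed several of the APIs shown here (for example, parameters became inputSchema, maxSteps was replaced by stopWhen, and toDataStreamResponse became toUIMessageStreamResponse), so check the official migration guide if you are on a newer version.

npm install ai @ai-sdk/openai @ai-sdk/react zod
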
// app/api/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
 
export const maxDuration = 30;
 
export async function POST(req: Request) {
  const { messages } = await req.json();
 
  const result = streamText({
    model: openai('gpt-4o'),
    system: 'You are a helpful assistant. Be concise and accurate.',
    messages,
  });
 
  return result.toDataStreamResponse();
}

Building a Chat Interface

The useChat hook is the primary way to build conversational UIs. It manages the full message lifecycle — appending user messages, streaming assistant responses, handling tool calls, and maintaining conversation history. The hook communicates with your API endpoint via a streaming protocol that supports text chunks, tool invocations, and structured data.

// app/chat/page.tsx
'use client';
 
import { useChat } from '@ai-sdk/react';
 
export default function ChatPage() {
  const { messages, input, handleInputChange, handleSubmit, isLoading, error, stop, reload } = useChat({
    api: '/api/chat',
    onError: (err) => console.error('Chat error:', err),
    onFinish: (message) => console.log('Finished:', message),
  });
 
  return (
    <div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
      <div className="flex-1 overflow-y-auto space-y-4 mb-4">
        {messages.map((message) => (
          <div
            key={message.id}
            className={`p-3 rounded-lg ${
              message.role === 'user' ? 'bg-blue-100 ml-auto' : 'bg-gray-100'
            } max-w-[80%]`}
          >
            <p className="text-sm font-semibold mb-1">
              {message.role === 'user' ? 'You' : 'AI'}
            </p>
            <div className="whitespace-pre-wrap">{message.content}</div>
          </div>
        ))}
        {isLoading && (
          <div className="bg-gray-100 p-3 rounded-lg animate-pulse">
            Thinking...
          </div>
        )}
        {error && (
          <div className="bg-red-100 p-3 rounded-lg text-red-700">
            Error: {error.message}
          </div>
        )}
      </div>
 
      <form onSubmit={handleSubmit} className="flex gap-2">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Type a message..."
          className="flex-1 p-2 border rounded-lg"
          disabled={isLoading}
        />
        <button
          type="submit"
          disabled={isLoading}
          className="px-4 py-2 bg-blue-500 text-white rounded-lg disabled:opacity-50"
        >
          Send
        </button>
      </form>
    </div>
  );
}

Implementing Tool Calling

Tool calling lets the LLM invoke functions you define (here, on the server) to fetch data, perform calculations, or trigger actions. The SDK handles the round-trip: the LLM requests a tool call with arguments, the SDK validates them against the tool's Zod schema and runs your execute function, and, when maxSteps is greater than 1, sends the result back to the LLM so it can keep reasoning. This lets the LLM chain multiple tool calls together.

// app/api/chat-with-tools/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText, tool } from 'ai';
import { z } from 'zod';
 
export async function POST(req: Request) {
  const { messages } = await req.json();
 
  const result = streamText({
    model: openai('gpt-4o'),
    system: 'You are a helpful assistant with access to tools.',
    messages,
    tools: {
      getWeather: tool({
        description: 'Get current weather for a location',
        parameters: z.object({
          location: z.string().describe('City name'),
          units: z.enum(['celsius', 'fahrenheit']).default('celsius'),
        }),
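        execute: async ({ location, units }) => {
          // Placeholder endpoint: replace with your weather provider's real API.
          // encodeURIComponent keeps city names with spaces or accents valid in the URL.
          const response = await fetch(
            `https://api.weather.com/v1/current?q=${encodeURIComponent(location)}&units=${units}`
          );
          // Return the failure as data so the model can explain it to the user.
          if (!response.ok) return { error: `Weather lookup failed (HTTP ${response.status})` };
          return response.json();
        },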
      }),
      searchProducts: tool({
        description: 'Search the product catalog',
        parameters: z.object({
          query: z.string(),
          category: z.string().optional(),
          maxPrice: z.number().optional(),
        }),
        execute: async ({ query, category, maxPrice }) => {
          // In production, query your database
          return {
            products: [
              { name: `Result for "${query}"`, price: 29.99, inStock: true },
            ],
            total: 1,
          };
        },
      }),
    },
    maxSteps: 5, // Allow multi-step tool use
  });
 
  return result.toDataStreamResponse();
}
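Conceptually, maxSteps turns a single model call into a loop. The loop calls the model and runs any tool it asks for. It then appends the result to the conversation and calls the model again. This repeats until the model answers in plain text or the step limit is reached. The sketch below shows that loop with a scripted fake model so it runs without an API key. `ModelTurn`, `fakeModel`, `runWithTools`, and the canned weather data are hypothetical and simplified. They illustrate the control flow, not the SDK's internal implementation.

```typescript
// Simplified, hypothetical shapes for one model turn and the conversation.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { text: string } | { toolCall: ToolCall };
type Message = { role: 'user' | 'assistant' | 'tool'; content: string };
type Tools = Record<string, (args: Record<string, unknown>) => Promise<unknown>>;

async function runWithTools(
  model: (history: Message[]) => Promise<ModelTurn>,
  tools: Tools,
  history: Message[],
  maxSteps: number,
): Promise<{ text: string; steps: number }> {
  for (let step = 1; step <= maxSteps; step++) {
    const turn = await model(history);
    // A plain-text answer ends the loop.
    if ('text' in turn) return { text: turn.text, steps: step };

    // Otherwise run the requested tool and feed its result back to the model.
    const { name, args } = turn.toolCall;
    const execute = tools[name];
    if (!execute) throw new Error(`Unknown tool: ${name}`);
    const result = await execute(args);
    history = [
      ...history,
      { role: 'assistant', content: `call ${name} ${JSON.stringify(args)}` },
      { role: 'tool', content: JSON.stringify(result) },
    ];
  }
  // Step limit reached without a final text answer.
  return { text: '', steps: maxSteps };
}

// Scripted fake model: requests the weather first, then answers from the tool result.
async function fakeModel(history: Message[]): Promise<ModelTurn> {
  const toolMsg = history.find((m) => m.role === 'tool');
  if (!toolMsg) return { toolCall: { name: 'getWeather', args: { location: 'Paris' } } };
  const { tempC } = JSON.parse(toolMsg.content) as { tempC: number };
  return { text: `It is ${tempC}°C in Paris.` };
}

const tools: Tools = {
  // Hypothetical tool that returns canned data instead of calling a weather API.
  getWeather: async () => ({ tempC: 31 }),
};
```

With `maxSteps` set to 1, the loop stops after the tool call and never produces a text answer. This is why the route above sets it higher.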

Structured Output with AI SDK

The generateObject function asks the LLM for output that matches a Zod schema and validates the response against it. This is invaluable for data extraction, classification, form filling, and any scenario where you need reliable, typed output rather than freeform text. Where the provider supports it, the SDK uses the model's native structured-output or JSON mode. In every case the result is validated, and generateObject throws an error rather than returning data that doesn't match the schema.

// app/api/analyze/route.ts
import { openai } from '@ai-sdk/openai';
import { generateObject } from 'ai';
import { z } from 'zod';
 
const sentimentSchema = z.object({
  sentiment: z.enum(['positive', 'negative', 'neutral', 'mixed']),
  confidence: z.number().min(0).max(1),
  keyPoints: z.array(z.object({
    text: z.string(),
    sentiment: z.enum(['positive', 'negative']),
  })),
  summary: z.string(),
});
 
export async function POST(req: Request) {
  const { text } = await req.json();
 
  const { object } = await generateObject({
    model: openai('gpt-4o'),
    schema: sentimentSchema,
    prompt: `Analyze the sentiment of this text:\n\n${text}`,
  });
 
  return Response.json(object);
}
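generateObject performs this validation for you. To show what such a check involves, here is a hand-written TypeScript type guard covering part of the sentiment schema above. It is written without Zod so it runs on its own, and it omits keyPoints for brevity. `parseSentiment` is hypothetical. It illustrates why a malformed response surfaces as an error instead of bad data reaching your code.

```typescript
type Sentiment = {
  sentiment: 'positive' | 'negative' | 'neutral' | 'mixed';
  confidence: number;
  summary: string;
};

const LABELS = ['positive', 'negative', 'neutral', 'mixed'] as const;

// Parses raw model text and checks it field by field, throwing on any mismatch.
function parseSentiment(raw: string): Sentiment {
  const data: unknown = JSON.parse(raw); // throws on malformed JSON
  if (typeof data !== 'object' || data === null) throw new Error('Expected a JSON object');
  const { sentiment, confidence, summary } = data as Record<string, unknown>;
  if (typeof sentiment !== 'string' || !(LABELS as readonly string[]).includes(sentiment)) {
    throw new Error('Invalid sentiment label');
  }
  if (typeof confidence !== 'number' || confidence < 0 || confidence > 1) {
    throw new Error('Confidence must be a number between 0 and 1');
  }
  if (typeof summary !== 'string') throw new Error('Summary must be a string');
  return { sentiment: sentiment as Sentiment['sentiment'], confidence, summary };
}
```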

Multi-Modal Chat with Image Understanding

The SDK supports multimodal inputs including images, audio, and video (depending on the provider). You can pass image URLs, base64-encoded images, or file buffers as part of the message content array. This enables building applications that can analyze screenshots, read documents, describe photos, or extract data from charts.

// app/api/vision/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
 
export async function POST(req: Request) {
  const { messages, imageUrl } = await req.json();
 
  const result = streamText({
    model: openai('gpt-4o'),
    messages: [
      ...messages,
      {
        role: 'user',
        content: [
          { type: 'text', text: 'Analyze this image in detail.' },
          { type: 'image', image: imageUrl },
        ],
      },
    ],
  });
 
  return result.toDataStreamResponse();
}
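When the image lives on your server rather than at a public URL, you can pass its bytes as a base64 string instead. The helpers below are a hypothetical sketch that uses only built-in APIs, relying on `btoa`, which is global in browsers and Node 16+. They encode bytes and guess the MIME type from the file's leading "magic" bytes. Reading the file itself, for example with `fs.readFile`, is left to your environment.

```typescript
// Encodes raw image bytes as base64 using the global btoa.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = '';
  // Convert in slices to avoid call-stack limits when spreading large images.
  const sliceSize = 0x8000;
  for (let i = 0; i < bytes.length; i += sliceSize) {
    binary += String.fromCharCode(...Array.from(bytes.subarray(i, i + sliceSize)));
  }
  return btoa(binary);
}

// Guesses the MIME type from the leading magic bytes (PNG and JPEG only).
function sniffImageType(bytes: Uint8Array): 'image/png' | 'image/jpeg' | undefined {
  if (bytes[0] === 0x89 && bytes[1] === 0x50 && bytes[2] === 0x4e && bytes[3] === 0x47) return 'image/png';
  if (bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) return 'image/jpeg';
  return undefined;
}
```

The resulting base64 string can go in the `image` field of the content part in place of `imageUrl`.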

Generating Embeddings

The SDK provides embed and embedMany functions for generating vector embeddings with any provider that supports them. These are essential for building semantic search, RAG pipelines, recommendation engines, and clustering applications.

import { openai } from '@ai-sdk/openai';
import { embed, embedMany } from 'ai';
 
// Single embedding
const { embedding } = await embed({
  model: openai.embedding('text-embedding-3-small'),
  value: 'What is the Vercel AI SDK?',
});
 
// Batch embeddings (more efficient)
const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-small'),
  values: ['First document', 'Second document', 'Third document'],
});


Real-World Use Cases

Customer Support Chatbot

Build a customer support chatbot that uses RAG to answer questions from your knowledge base. The AI SDK handles streaming responses, tool calling for knowledge base search, and structured output for ticket creation. Use useChat with message persistence to maintain conversation history across page reloads, and implement guardrails to prevent the bot from answering off-topic questions.

Code Assistant

Create an AI code assistant that generates, reviews, and explains code. Use tool calling to execute code in a sandbox, search documentation, and interact with your codebase. The streaming UI shows code generation in real-time. Add syntax highlighting with a library like Shiki or Prism, and implement diff views to show code changes clearly.

Content Generation Platform

Build a platform for generating blog posts, social media content, and marketing copy. Use structured output to ensure generated content matches your brand guidelines, and streaming to show content as it's generated. Implement generateObject for metadata extraction (tags, summaries, SEO titles) and streamText for the main content body.

Data Analysis Dashboard

Create a natural language interface for data analysis. Users ask questions about their data, and the AI generates SQL queries, runs them, and presents results with visualizations — all streamed in real-time. Use tool calling to expose database query functions, chart generation, and data export as available actions for the LLM.

Semantic Search

Build semantic search that understands user intent, not just keywords. Use the SDK's embed function to vectorize your documents, store embeddings in a vector database (Pinecone, pgvector, Upstash), and use generateText with retrieved context to produce natural language answers instead of raw search results.
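The retrieval step in that pipeline is a nearest-neighbor search over embeddings. A vector database does this at scale, but the underlying measure is usually cosine similarity, sketched below in plain TypeScript for a small in-memory corpus. The SDK also exports a `cosineSimilarity` helper from `ai`. `Doc`, `cosine`, `topK`, and `buildPrompt` here are hypothetical. The toy three-dimensional vectors stand in for real embeddings; text-embedding-3-small returns 1,536 dimensions by default.

```typescript
type Doc = { id: string; text: string; embedding: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k documents most similar to the query embedding, best match first.
function topK(query: number[], docs: Doc[], k: number): Doc[] {
  return docs
    .map((doc) => ({ doc, score: cosine(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.doc);
}

// Places the retrieved passages in the prompt so the model answers from them.
function buildPrompt(question: string, context: Doc[]): string {
  const sources = context.map((d, i) => `[${i + 1}] ${d.text}`).join('\n');
  return `Answer using only the sources below.\n\n${sources}\n\nQuestion: ${question}`;
}
```

In practice, the query vector would come from `embed` and the prompt would be passed to `generateText` or `streamText`.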

Best Practices for Production

  1. Use streaming by default — Streaming provides a dramatically better user experience. Users see output immediately rather than waiting for the complete response. Time-to-first-token is one of the most important UX metrics for AI applications.

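  2. Implement proper error handling — Handle network errors, rate limits, and model errors gracefully. Show user-friendly error messages and provide retry options. Wrap generateText and generateObject calls in try/catch. streamText reports errors through the stream instead of throwing, so handle them with its onError callback. The SDK already retries failed model calls with exponential backoff (the maxRetries option, default 2). Add your own backoff for other transient failures.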

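  3. Keep API keys server-side — Never expose API keys in client code. Use Server Actions or Route Handlers so that model calls run on the server. The SDK's recommended patterns keep provider initialization on the server, but nothing stops you from importing a provider into a client component. Keep keys in server-only environment variables (in Next.js, avoid the NEXT_PUBLIC_ prefix).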

  4. Use structured output for data — When you need typed data from the LLM, use generateObject with Zod schemas instead of parsing text responses. This replaces hand-written JSON parsing with schema validation and gives you typed results at compile time.

  5. Set appropriate timeouts — AI requests can take 10-30 seconds or more depending on the model and prompt complexity. Configure maxDuration on your Route Handlers, and make sure any client-side timeouts are long enough that responses aren't cut off prematurely.

  6. Implement rate limiting — Protect your API endpoints from abuse. Rate limit by user, IP, or API key to control costs. Use middleware or a service like Upstash Ratelimit for serverless-friendly rate limiting.

  7. Cache common responses — For frequently asked questions or repeated queries, cache responses to reduce API costs and latency. Use semantic caching (matching similar queries) for maximum effectiveness.

  8. Monitor costs and usage — Track token usage, API costs, and response times using the onFinish callback, which receives the token usage for each request. Set up alerts for unusual usage patterns to catch runaway costs early.

  9. Choose the right model for each task — Use smaller, faster models (GPT-4o-mini, Claude Haiku) for simple tasks like classification or extraction. Reserve larger models for complex reasoning, code generation, or multi-step analysis. On those simple tasks, this can reduce costs by 80-90% with little or no loss in quality.

  10. Handle abort signals — Implement cancellation for when users stop generation or navigate away. The hooks expose a stop function that aborts the client request. On the server, pass the incoming request's signal to streamText (abortSignal: req.signal) so the model call is cancelled too.
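The exponential backoff mentioned in item 2 can be sketched in a few lines. The SDK's own generation functions already retry failed model calls, so a wrapper like this is mainly useful around your own calls, such as the fetch inside a tool. `withRetry`, its options, and the `isTransient` check are hypothetical.

```typescript
type RetryOptions = {
  retries: number;                          // extra attempts after the first one
  baseDelayMs: number;                      // upper bound of the first delay; doubles per retry
  isTransient?: (err: unknown) => boolean;  // which errors are worth retrying
  sleep?: (ms: number) => Promise<void>;    // injectable so tests don't actually wait
};

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const { retries, baseDelayMs, isTransient = () => true, sleep = defaultSleep } = opts;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on permanent errors or once the retry budget is spent.
      if (attempt >= retries || !isTransient(err)) throw err;
      // Exponential backoff with full jitter: wait a random time in [0, base * 2^attempt).
      const cap = baseDelayMs * 2 ** attempt;
      await sleep(Math.random() * cap);
    }
  }
}
```

Jitter spreads retries from many clients over time, so they don't all hit a rate-limited API at the same moment.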

Common Pitfalls and Solutions

| Pitfall | Impact | Solution |
|---|---|---|
| Exposing API keys in client code | Security breach, unauthorized usage | Keep keys server-side, use Server Actions |
| No error handling | Broken UX on API failures | Implement error boundaries and retry logic |
| Ignoring streaming | Poor perceived performance | Use streamText and toDataStreamResponse |
| No rate limiting | Cost overruns, abuse | Implement rate limiting per user/IP |
| Wrong model for the task | Poor quality or high cost | Match model to task complexity |
| No timeout handling | Hanging requests | Set appropriate maxDuration |
| Ignoring token limits | Truncated responses | Monitor and manage context window size |
| No message persistence | Lost conversations on reload | Store messages in a database |
| Sending full history every request | Growing costs, latency | Implement message windowing or summarization |
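For the last pitfall in the table, the simplest fix is a sliding window. Always keep the system prompt, then keep as many of the most recent messages as fit within a token budget. The sketch below estimates tokens at roughly four characters per token. That is a common rule of thumb for English text, not a real tokenizer. `windowMessages` and `estimateTokens` are hypothetical helpers.

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Rough heuristic (~4 characters per token for English); use a real tokenizer for accuracy.
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

// Keeps all system messages plus the newest messages that fit within maxTokens.
function windowMessages(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  let budget = maxTokens - system.reduce((sum, m) => sum + estimateTokens(m.content), 0);

  const kept: ChatMessage[] = [];
  // Walk backwards from the newest message and stop at the first one that doesn't fit.
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.role === 'system') continue;
    const cost = estimateTokens(m.content);
    if (cost > budget) break;
    budget -= cost;
    kept.unshift(m);
  }
  return [...system, ...kept];
}
```

A window can start with an assistant turn. Depending on the provider, you may want to drop a leading assistant message so the window begins with a user message.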

Debugging AI SDK Issues

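When issues arise, check these common sources: API key configuration, model availability, message format compatibility, and network connectivity. For deeper visibility, enable the SDK's OpenTelemetry integration by passing experimental_telemetry: { isEnabled: true } to your AI functions. You also need an OpenTelemetry exporter registered, for example via @vercel/otel in Next.js. With that in place, each call is recorded as spans that include the prompt, response, token usage, and timing.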

Performance Optimization

Optimize AI SDK performance by choosing the right model for each task (use smaller, faster models for simple tasks), implementing response caching for repeated queries, and using streaming to reduce perceived latency. Use the Edge Runtime for lower cold-start times on Vercel, and implement connection pooling for database-backed applications.
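A simple starting point for response caching is an exact-match cache keyed on the model and the normalized prompt, with a time-to-live so answers don't go stale. This is a hypothetical in-memory sketch; `ResponseCache` is not an SDK API. Semantic caching would replace the string key with an embedding-similarity lookup. A multi-instance deployment would need a shared store.

```typescript
class ResponseCache {
  private readonly entries = new Map<string, { value: string; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Normalizes case and whitespace so trivially different prompts share an entry.
  private key(model: string, prompt: string): string {
    return `${model}::${prompt.trim().toLowerCase().replace(/\s+/g, ' ')}`;
  }

  get(model: string, prompt: string, now: number = Date.now()): string | undefined {
    const k = this.key(model, prompt);
    const entry = this.entries.get(k);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(k); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(model: string, prompt: string, value: string, now: number = Date.now()): void {
    this.entries.set(this.key(model, prompt), { value, expiresAt: now + this.ttlMs });
  }
}
```

In a Route Handler, you would check `cache.get` before calling the model. You would then store the final text with `cache.set` from the `onFinish` callback.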

For high-traffic applications, implement request queuing so traffic spikes don't exceed provider rate limits. Use the SDK's built-in abort functionality to cancel unnecessary requests when users navigate away. Consider implementing a token budget system that tracks usage per user and switches to cheaper models when approaching limits.
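The token budget idea can be sketched as a small per-user ledger. It records each request's total token count, for example usage.totalTokens from the onFinish callback in AI SDK 4. It then picks a cheaper model once a user nears their limit. The `TokenBudget` class, the model IDs, and the 20% threshold are hypothetical and illustrative.

```typescript
class TokenBudget {
  private readonly used = new Map<string, number>();
  private readonly limit: number;

  constructor(limitPerUser: number) {
    this.limit = limitPerUser;
  }

  // Adds a finished request's token count to the user's running total.
  record(userId: string, totalTokens: number): void {
    this.used.set(userId, (this.used.get(userId) ?? 0) + totalTokens);
  }

  remaining(userId: string): number {
    return Math.max(0, this.limit - (this.used.get(userId) ?? 0));
  }

  // Chooses a model ID based on how much budget is left; null means refuse the request.
  pickModel(userId: string): 'gpt-4o' | 'gpt-4o-mini' | null {
    const left = this.remaining(userId);
    if (left === 0) return null;                        // budget exhausted
    if (left < this.limit * 0.2) return 'gpt-4o-mini';  // running low: downgrade
    return 'gpt-4o';
  }
}
```

The chosen ID would be passed to `openai(...)`. A `null` result would map to a `429` response.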

Comparison with Alternatives

FeatureVercel AI SDKLangChain.jsCustom Integration
Provider Abstraction★★★★★★★★★★★★
React Integration★★★★★★★★★★
Streaming Support★★★★★★★★★★★★
Type Safety★★★★★★★★★★★★
Learning CurveLowMediumHigh
Bundle SizeSmallLargeMinimal
Best ForReact/Next.js appsComplex chainsFull control

LangChain.js offers more pre-built chains and integrations for complex AI workflows, but its larger bundle size and steeper learning curve make it less ideal for straightforward React applications. Custom integrations give maximum control but require significant boilerplate for streaming, error handling, and provider abstraction. The AI SDK strikes the best balance for most React and Next.js projects.

Advanced Patterns

Multi-Agent Orchestration

Use the AI SDK to orchestrate multiple specialized agents. Each agent handles a specific domain (search, code, analysis), and a router agent decides which to invoke based on the user's request. The maxSteps parameter in streamText enables multi-step reasoning where the LLM can call tools, evaluate results, and decide on next actions autonomously.

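import { openai } from '@ai-sdk/openai';
import { streamText, tool } from 'ai';

// Runs inside a Route Handler, where `messages` comes from the request body.
// Tool definitions are elided; each would wrap a specialized agent.
const result = streamText({
  model: openai('gpt-4o'),
  system: `You are a router agent. Analyze the user's request and use the appropriate tool.`,
  tools: {
    researchAgent: tool({ /* ... */ }),
    codeAgent: tool({ /* ... */ }),
    dataAgent: tool({ /* ... */ }),
  },
  maxSteps: 10, // Allow complex multi-step reasoning
  messages,
});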

Real-Time Collaboration

Build collaborative AI experiences where multiple users interact with the same AI session. Use the SDK's streaming capabilities with WebSocket or Server-Sent Events for real-time updates. Combine useChat with a shared state store (like Liveblocks or PartyKit) to synchronize messages across clients.

Custom UI Components

Build custom UI components that render AI-specific content: code blocks with syntax highlighting, interactive charts from structured data, image galleries from generated images, and streaming markdown with progressive rendering. Use the SDK's message.parts array to render different content types (text, tool calls, tool results) with specialized components.
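Rendering by part type comes down to a switch over a discriminated union. The sketch below uses simplified, hypothetical part shapes and renders to strings so it runs anywhere. In a real component, each case would return JSX. Check the exact `parts` types for your SDK version, since they changed between major releases.

```typescript
// Simplified, hypothetical part shapes; the SDK's real types carry more fields.
type MessagePart =
  | { type: 'text'; text: string }
  | { type: 'tool-call'; toolName: string; args: unknown }
  | { type: 'tool-result'; toolName: string; result: unknown };

function renderPart(part: MessagePart): string {
  switch (part.type) {
    case 'text':
      return part.text;
    case 'tool-call':
      // A real UI might show a spinner while the tool runs.
      return `Calling ${part.toolName}(${JSON.stringify(part.args)})`;
    case 'tool-result':
      // A real UI might render a chart or card from the structured result.
      return `Result from ${part.toolName}: ${JSON.stringify(part.result)}`;
    default: {
      // Exhaustiveness check: a new part type without a case becomes a compile error.
      const unreachable: never = part;
      return unreachable;
    }
  }
}

const renderMessage = (parts: MessagePart[]) => parts.map(renderPart).join('\n');
```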

Future Outlook

The Vercel AI SDK is expanding beyond chat and text generation toward a single toolkit for the core building blocks of AI-powered applications. Recent additions include the ToolLoopAgent class for autonomous agent workflows and better support for multimodal inputs.

The most significant trend is AI-native UI patterns — interfaces designed specifically for AI interaction rather than adapting traditional UI patterns. This includes streaming-first layouts, progressive disclosure of AI reasoning, and interactive tool results that users can explore and modify. The SDK's architecture makes it straightforward to implement these patterns without fighting the framework.

Conclusion

The Vercel AI SDK is one of the most productive ways to build AI-powered React applications. Its unified provider abstraction, streaming-first design, and deep React integration eliminate the boilerplate and complexity of LLM integration, letting you focus on building great user experiences.

Key takeaways:

  1. The AI SDK abstracts LLM provider differences behind a unified, type-safe API
  2. Streaming is built-in and should be used by default for better UX
  3. Use useChat for conversational interfaces, useCompletion for text generation
  4. Implement tool calling with the tools parameter and Zod schemas
  5. Use generateObject for structured output that is validated against your schema
  6. Keep API keys server-side using Server Actions or Route Handlers
  7. Match models to tasks — use smaller models for simple operations, larger for complex reasoning

Start by building a simple chat interface using useChat and a Route Handler. Once comfortable, add tool calling for interactive capabilities and structured output for data extraction. The AI SDK's incremental adoption path means you can start simple and add complexity as needed.