AI Function Calling: Structured Output from LLMs

Extract structured data from LLMs: JSON schemas, Pydantic models, and Zod validation.

Tags: AI, Function Calling, Structured Output, LLM

By MinhVo

Introduction

One of the most persistent challenges in building production LLM applications is getting reliable, structured output. While models are excellent at generating natural language, most real-world applications need JSON objects, database records, API payloads, or typed data structures — not prose. Structured output solves this by constraining the model's generation to produce valid, schema-compliant JSON every time. Combined with function calling, structured output enables LLMs to return data that your application can directly parse, validate, and act upon without fragile regex parsing or string manipulation.

[Image: Structured data extraction from language models]

The evolution from free-text responses to structured output represents a maturation of LLM applications. Early chatbots returned paragraphs that developers had to parse with regular expressions and heuristics. Modern applications use structured output to extract typed data with guaranteed schemas — addresses, order details, sentiment classifications, entity extractions — that flow directly into databases, APIs, and UI components without manual parsing.

OpenAI introduced JSON Mode in late 2023, followed by Structured Outputs with strict schema compliance in August 2024. Anthropic, Google, and open-source inference servers have followed with their own implementations. Today, structured output is a first-class feature in every major LLM API, and understanding how to use it effectively is essential for building reliable AI applications.

Understanding Structured Output: Core Concepts

Schema-Constrained Generation

Structured output works by constraining the model's token generation so that only tokens valid under a JSON Schema can be produced. During generation, the model's probability distribution is filtered to zero out tokens that would violate the schema: closing a bracket prematurely, inserting a string where a number is expected, or omitting a required field. The result is JSON that parses and matches the schema whenever generation runs to completion. A refusal, or a response cut off by the token limit, can still leave you without a usable object, so check for both.
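To make the filtering step concrete, here is a toy sketch in plain Python. It is not how any provider actually implements constrained decoding; the four-token vocabulary, the logit values, and the "allowed next tokens" set are all invented for illustration.

```python
import math

def mask_logits(logits: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Set the logit of every disallowed token to -inf so it can never be sampled."""
    return {tok: (score if tok in allowed else -math.inf) for tok, score in logits.items()}

def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Convert logits to probabilities; a -inf logit becomes exactly 0."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

# Hypothetical decoder state: the output so far is `{"age": ` and the schema
# says "age" is an integer, so only digit tokens are legal next.
raw_logits = {'"': 2.5, "4": 1.0, "7": 0.5, "}": 0.1}
allowed_next = {"4", "7"}

probs = softmax(mask_logits(raw_logits, allowed_next))
best = max(probs, key=probs.get)
print(best, probs)
```

Without the mask, the quote character would have been the most likely token and the output would have started a string where a number belongs. With the mask, all probability mass is redistributed over the tokens the schema permits.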

JSON Schema as the Contract

The JSON Schema standard defines the structure, types, constraints, and required fields for your data. The model generates output that conforms to this schema exactly. You define the schema once, and the model consistently produces matching output — no prompt engineering tricks needed.
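As a rough illustration of what "conforming to the contract" means, the sketch below checks a flat object against a tiny subset of JSON Schema (required fields, primitive types, and no extra keys). The `PERSON_SCHEMA` and the `violations` helper are hypothetical and exist only for this example; a real project would normally rely on a full validator library rather than hand-rolled checks.

```python
import json

# Hypothetical schema used only for this illustration.
PERSON_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
    "additionalProperties": False,
}

_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def violations(data: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the object conforms."""
    problems = []
    props = schema["properties"]
    for field in schema.get("required", []):
        if field not in data:
            problems.append(f"missing required field: {field}")
    for field, value in data.items():
        if field not in props:
            if schema.get("additionalProperties") is False:
                problems.append(f"unexpected field: {field}")
            continue
        type_name = props[field]["type"]
        # bool is a subclass of int in Python, so only accept it for "boolean"
        is_bool = isinstance(value, bool)
        if type_name == "boolean":
            ok = is_bool
        else:
            ok = not is_bool and isinstance(value, _PY_TYPES[type_name])
        if not ok:
            problems.append(f"{field}: expected {type_name}")
    return problems

good = json.loads('{"name": "Ada", "age": 36}')
bad = json.loads('{"name": "Ada", "age": "36", "nickname": "A"}')
print(violations(good, PERSON_SCHEMA))
print(violations(bad, PERSON_SCHEMA))
```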

Response Format vs. Function Calling

Structured output comes in two flavors: response format constrains the entire model response to a schema (for pure data extraction tasks), while function calling uses schemas to define tool parameters (for agent-like behavior). Both describe the expected shape with JSON Schema, and where strict mode is available both can enforce it during generation, but they serve different use cases.
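The difference is easiest to see in the request payload. The sketch below builds both shapes from one schema, following the layout of OpenAI's Chat Completions API as I understand it; other providers use different field names (Anthropic's tools take `input_schema`, for example). The weather schema and the two helper functions are made up for illustration, and nothing here makes a network call.

```python
# Hypothetical schema shared by both request styles.
WEATHER_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city", "unit"],
    "additionalProperties": False,
}

def as_response_format(name: str, schema: dict) -> dict:
    """Response format: the whole assistant message must match the schema."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }

def as_tool(name: str, description: str, schema: dict) -> dict:
    """Function calling: the schema describes the arguments of one tool."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": schema,
            "strict": True,
        },
    }

response_format = as_response_format("weather_query", WEATHER_SCHEMA)
tool = as_tool("get_weather", "Look up the current weather", WEATHER_SCHEMA)
print(response_format["json_schema"]["name"], tool["function"]["name"])
```

In the first case you read the structured data from the message content. In the second, the model decides whether to call the tool and you read the data from the tool call's arguments.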

Type Safety End-to-End

The real power of structured output emerges when you connect schema definitions across the stack: define your data model in TypeScript/Python, generate the JSON Schema automatically, send it to the model, validate the response against the schema, and use the typed result in your application. This creates a type-safe pipeline from model output to application logic.

[Image: Schema-driven architecture]

Architecture and Design Patterns

The Schema-First Pattern

Define your data schema in your application code (using Zod, Pydantic, or TypeScript interfaces) and derive the JSON Schema from it. This ensures your model output always matches your application's type system.
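Libraries such as Pydantic and Zod handle the derivation for you, but the idea fits in a few lines of standard-library Python. This is a deliberately minimal sketch: the `Invoice` dataclass and the `schema_for` helper are hypothetical, and only flat models with primitive field types are handled.

```python
import json
from dataclasses import dataclass, fields

_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

@dataclass
class Invoice:
    vendor: str
    total: float
    paid: bool

def schema_for(model: type) -> dict:
    """Derive a JSON Schema from a flat dataclass with primitive fields."""
    return {
        "type": "object",
        "properties": {f.name: {"type": _JSON_TYPES[f.type]} for f in fields(model)},
        "required": [f.name for f in fields(model)],
        "additionalProperties": False,
    }

invoice_schema = schema_for(Invoice)

# A model response that follows the schema maps straight onto the dataclass.
raw = '{"vendor": "Acme Corp", "total": 120.5, "paid": false}'
invoice = Invoice(**json.loads(raw))
print(json.dumps(invoice_schema, indent=2))
print(invoice)
```

Because the schema is generated from the class, adding a field to `Invoice` changes the request and the parsed type together, so the two cannot drift apart.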

The Extraction Pattern

Use structured output to extract specific entities from unstructured text: names, dates, addresses, amounts, classifications. Define the extraction schema and let the model fill in the values.

The Classification Pattern

Define an enum of possible classifications and constrain the model to return one of them. This produces more reliable classifications than asking the model to respond with free text that you then parse.
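A small sketch of the classification pattern, assuming a hypothetical ticket `Priority` enum. The enum is the single source of truth: its values feed the schema sent to the model, and the same enum parses the reply, so an unexpected label fails loudly instead of slipping through.

```python
import json
from enum import Enum

class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

# The enum values become the only labels the schema allows.
CLASSIFICATION_SCHEMA = {
    "type": "object",
    "properties": {
        "priority": {"type": "string", "enum": [p.value for p in Priority]},
    },
    "required": ["priority"],
    "additionalProperties": False,
}

def parse_priority(raw_json: str) -> Priority:
    """Convert the model's JSON reply into the enum; raises ValueError on an unknown label."""
    return Priority(json.loads(raw_json)["priority"])

label = parse_priority('{"priority": "high"}')
print(label)
```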

The Multi-Field Pattern

Extract multiple related fields from a single piece of text. For example, from a product review, extract sentiment, key topics, rating, and summary — all in a single structured response.

Step-by-Step Implementation

OpenAI Structured Outputs with Zod (TypeScript)

import OpenAI from 'openai';
import { zodResponseFormat } from 'openai/helpers/zod';
import { z } from 'zod';
 
const openai = new OpenAI();
 
// Define your schema with Zod
const ContactInfo = z.object({
  name: z.string().describe('Full name of the person'),
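  // Plain string, not .email(): the system prompt asks for '' when a value is missing
  email: z.string().describe('Email address, or an empty string if missing'),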
  phone: z.string().describe('Phone number in any format'),
  company: z.string().describe('Company or organization name'),
  role: z.string().describe('Job title or role'),
  confidence: z.number().min(0).max(1).describe('Confidence score for extraction accuracy'),
});
 
type Contact = z.infer<typeof ContactInfo>;
 
async function extractContact(text: string): Promise<Contact> {
  const response = await openai.beta.chat.completions.parse({
    model: 'gpt-4o-2024-08-06',
    messages: [
      {
        role: 'system',
        content: 'Extract contact information from the provided text. If information is missing, use empty strings.',
      },
      { role: 'user', content: text },
    ],
    response_format: zodResponseFormat(ContactInfo, 'contact'),
  });
 
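  const message = response.choices[0].message;
  // parsed is null when the model refuses, so check instead of asserting with `!`
  if (!message.parsed) {
    throw new Error(message.refusal ?? 'No parsed output returned');
  }
  return message.parsed;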
}
 
// Usage
const contact = await extractContact(
  "Hi, I'm John Smith from Acme Corp. You can reach me at john@acme.com or 555-0123. I'm the VP of Engineering."
);
// { name: "John Smith", email: "john@acme.com", phone: "555-0123", 
//   company: "Acme Corp", role: "VP of Engineering", confidence: 0.95 }

Pydantic Models with OpenAI (Python)

from pydantic import BaseModel, Field
from openai import OpenAI
from typing import Literal
 
client = OpenAI()
 
class SentimentAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral", "mixed"] = Field(
        description="Overall sentiment of the text"
    )
    confidence: float = Field(ge=0, le=1, description="Confidence score")
    key_positive_points: list[str] = Field(
        default_factory=list, description="Positive aspects mentioned"
    )
    key_negative_points: list[str] = Field(
        default_factory=list, description="Negative aspects mentioned"
    )
    summary: str = Field(description="One-sentence summary of the review")
 
def analyze_sentiment(review: str) -> SentimentAnalysis:
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": "Analyze the sentiment of the product review."},
            {"role": "user", "content": review},
        ],
        response_format=SentimentAnalysis,
    )
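    message = completion.choices[0].message
    if message.parsed is None:
        # parsed is None when the model refuses to answer
        raise ValueError(message.refusal or "No parsed output returned")
    return message.parsed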
 
# Usage
result = analyze_sentiment("This laptop is amazing for coding but the battery life is terrible.")
# SentimentAnalysis(
#   sentiment='mixed', confidence=0.9,
#   key_positive_points=['amazing for coding'],
#   key_negative_points=['terrible battery life'],
#   summary='Great performance but poor battery life.'
# )

Anthropic Structured Output with Tool Use

Anthropic's approach to structured output uses tool use (function calling) rather than a dedicated response format parameter. Define a tool with your desired schema and force the model to call it:

import anthropic
 
client = anthropic.Anthropic()
 
def extract_entities(text: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=[{
            "name": "extract_entities",
            "description": "Extract named entities from text",
            "input_schema": {
                "type": "object",
                "properties": {
                    "people": {
                        "type": "array",
                        "items": {"type": "object", "properties": {
                            "name": {"type": "string"},
                            "role": {"type": "string"},
                            "organization": {"type": "string"},
                        }},
                    },
                    "locations": {"type": "array", "items": {"type": "string"}},
                    "dates": {"type": "array", "items": {"type": "string"}},
                    "organizations": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["people", "locations", "dates", "organizations"],
            },
        }],
        tool_choice={"type": "tool", "name": "extract_entities"},
        messages=[{"role": "user", "content": f"Extract entities from: {text}"}],
    )
 
    # The response contains a tool_use block with the structured data
    for block in response.content:
        if block.type == "tool_use":
            return block.input
    return {}

Google Gemini Structured Output

Google's Gemini API supports structured output through the response_mime_type and response_schema parameters:

import google.generativeai as genai
 
genai.configure(api_key="YOUR_API_KEY")
 
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "authors": {"type": "array", "items": {"type": "string"}},
        "summary": {"type": "string"},
        "key_findings": {"type": "array", "items": {"type": "string"}},
        "methodology": {"type": "string", "enum": ["experimental", "observational", "review", "meta-analysis"]},
        "confidence": {"type": "number"},
    },
    "required": ["title", "authors", "summary", "key_findings", "methodology"],
}
 
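# response_schema needs a Gemini 1.5 or later model; the older "gemini-pro" alias may reject it
model = genai.GenerativeModel("gemini-1.5-flash")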
response = model.generate_content(
    "Analyze this research paper abstract: ...",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema=schema,
    ),
)
# response.text contains valid JSON matching the schema

Multi-Level Nested Schemas

const OrderAnalysis = z.object({
  orderId: z.string(),
  status: z.enum(['processing', 'shipped', 'delivered', 'cancelled']),
  customer: z.object({
    name: z.string(),
    email: z.string().email(),
    loyaltyTier: z.enum(['bronze', 'silver', 'gold', 'platinum']),
  }),
  items: z.array(z.object({
    productName: z.string(),
    quantity: z.number().int().positive(),
    unitPrice: z.number().positive(),
    subtotal: z.number(),
  })),
  totalAmount: z.number(),
  estimatedDelivery: z.string().describe('ISO date string'),
  // nullable rather than optional: OpenAI strict mode requires every key to be present
  notes: z.string().nullable(),
});

[Image: Structured output pipeline]

Real-World Use Cases

Entity Extraction from Documents

Extract structured data from contracts, invoices, resumes, and reports. Define schemas for each document type and let the model extract relevant fields with high accuracy. This replaces weeks of custom NLP development with a single API call.

API Response Generation

Generate mock API responses from OpenAPI specifications. Define the response schema and let the model produce realistic test data that conforms to the API contract.

Database Record Creation

Parse unstructured notes, emails, or chat transcripts into structured database records. A support ticket email becomes a typed ticket object with priority, category, affected product, and description fields.

Content Classification and Tagging

Classify articles, social media posts, or customer feedback into predefined categories with structured confidence scores and metadata. Use enum constraints to ensure the model only returns valid categories.

Best Practices for Production

  1. Use strict mode when available — OpenAI's strict mode guarantees 100% schema compliance. Non-strict mode may occasionally produce invalid JSON despite the schema.

  2. Add descriptions to every field — The model uses field descriptions to understand what data to generate. "ISO 8601 date string" is better than just "date" for a date field.

  3. Keep schemas focused — Extract one concept per request. A schema that tries to extract sentiment, entities, summary, and translation simultaneously will be less accurate than separate, focused requests.

  4. Use enums for bounded values — When the output should be one of a fixed set of values, use enums. This eliminates the model inventing new categories or using inconsistent terminology.

  5. Handle optional fields explicitly — Decide whether missing fields should be null, empty strings, or omitted. Consistent handling prevents downstream errors.

  6. Validate at the application layer — Even with structured output, add application-level validation for business rules (e.g., total amount matches sum of line items).

  7. Cache schema definitions — If you're sending the same schema repeatedly, cache the schema object to avoid re-serialization overhead on every request.

  8. Monitor extraction accuracy — Log structured outputs and periodically audit them for accuracy. Track confidence scores to identify cases where the model is uncertain.
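The application-layer validation from item 6 could look something like the sketch below. It assumes an order dict shaped like the `OrderAnalysis` schema shown earlier (`items`, `quantity`, `unitPrice`, `subtotal`, `totalAmount`); the `check_order_totals` helper and the sample order are invented for illustration. `Decimal` is used so that currency arithmetic does not trip over float rounding.

```python
from decimal import Decimal

def check_order_totals(order: dict) -> list[str]:
    """Return business-rule violations the schema alone cannot express."""
    problems = []
    running = Decimal("0")
    for item in order["items"]:
        expected = Decimal(str(item["unitPrice"])) * item["quantity"]
        if Decimal(str(item["subtotal"])) != expected:
            problems.append(f"{item['productName']}: subtotal should be {expected}")
        running += expected
    if Decimal(str(order["totalAmount"])) != running:
        problems.append(f"totalAmount should be {running}")
    return problems

# Hypothetical extraction result: schema-valid, but is it arithmetically consistent?
order = {
    "items": [
        {"productName": "Keyboard", "quantity": 2, "unitPrice": 19.99, "subtotal": 39.98},
        {"productName": "Cable", "quantity": 1, "unitPrice": 5.0, "subtotal": 5.0},
    ],
    "totalAmount": 44.98,
}
print(check_order_totals(order))
print(check_order_totals({**order, "totalAmount": 50.0}))
```

A response can be perfectly schema-compliant and still fail this check, which is why the two layers of validation complement each other.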

Common Pitfalls and Solutions

| Pitfall | Impact | Solution |
| --- | --- | --- |
| Overly complex schemas | Lower accuracy, slower generation | Split into multiple focused requests |
| Missing field descriptions | Ambiguous or incorrect values | Add detailed descriptions with examples |
| Using non-strict mode | Occasional invalid JSON | Enable strict mode or add JSON validation |
| Too many enum values | Model selects wrong category | Group categories hierarchically |
| Ignoring confidence scores | Acting on uncertain extractions | Set confidence thresholds for automated actions |
| Not handling null/empty fields | NullPointerErrors downstream | Define nullable fields explicitly in schema |
| Schema too permissive | Inconsistent output format | Add format constraints (email, date, regex) |
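The confidence-threshold advice in the table might be wired up roughly like this. The two cutoffs are arbitrary placeholders, not recommended values, and the `route` helper is hypothetical. Keep in mind that a confidence score reported by the model itself is not a calibrated probability, so the cutoffs should be tuned against audited samples.

```python
def route(extraction: dict, auto_threshold: float = 0.9, review_threshold: float = 0.6) -> str:
    """Decide what to do with an extraction based on its self-reported confidence."""
    confidence = extraction["confidence"]
    if confidence >= auto_threshold:
        return "auto_apply"    # safe enough to act on without a human
    if confidence >= review_threshold:
        return "human_review"  # plausible, but queue it for a person to confirm
    return "reject"            # too uncertain to use at all

print(route({"name": "John Smith", "confidence": 0.95}))
print(route({"name": "J. S.", "confidence": 0.7}))
print(route({"name": "", "confidence": 0.2}))
```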

Debugging Extraction Issues

When structured output doesn't match expectations, the issue is usually in the schema design or field descriptions, not the model. Add examples in field descriptions, simplify complex objects, and test with diverse inputs to identify edge cases.

Performance Optimization

Structured output adds some overhead compared to free-text generation because the output must satisfy the schema at every token. For most applications the per-request cost is small, though a provider may take noticeably longer on the first request with a new schema while it processes and caches that schema. Keep schemas as small as possible and include only the fields you actually need.

For high-throughput applications, batch extraction requests and use parallel processing. Extract independent fields in separate requests to enable concurrent execution.

async function batchExtract(texts: string[]): Promise<Contact[]> {
  return Promise.all(texts.map(text => extractContact(text)));
}
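An unbounded fan-out like the one above sends every request at once, which can run into rate limits on large batches. One common remedy is to cap the number of in-flight requests. The sketch below shows the idea in Python with `asyncio.Semaphore`; `extract_contact` is a stand-in that sleeps instead of calling an API, and the concurrency limit of 3 is an arbitrary example value.

```python
import asyncio

async def extract_contact(text: str) -> dict:
    """Stand-in for a real extraction call (hypothetical); sleeps instead of hitting an API."""
    await asyncio.sleep(0.01)
    return {"source": text}

async def batch_extract(texts: list[str], max_concurrency: int = 5) -> list[dict]:
    """Run extractions concurrently, but never more than max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(text: str) -> dict:
        async with semaphore:
            return await extract_contact(text)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(bounded(text) for text in texts))

results = asyncio.run(batch_extract([f"note {i}" for i in range(12)], max_concurrency=3))
print(len(results), results[0])
```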

Comparison with Alternatives

| Approach | Reliability | Flexibility | Latency | Type Safety | Best For |
| --- | --- | --- | --- | --- | --- |
| Structured Output | Very High | Medium | Low | High | Production data extraction |
| JSON Mode | High | High | Low | Medium | General JSON responses |
| Prompt + Parse | Medium | Very High | Low | Low | One-off extractions |
| Function Calling | High | High | Medium | High | Agent workflows |
| Custom Fine-tuning | Very High | Low | Very Low | High | High-volume, fixed tasks |

Advanced Patterns

Discriminated Unions

Use discriminated unions to handle multiple response types from a single request:

const ResponseSchema = z.discriminatedUnion('type', [
  z.object({ type: z.literal('answer'), content: z.string() }),
  z.object({ type: z.literal('clarification'), question: z.string() }),
  z.object({ type: z.literal('tool_call'), tool: z.string(), args: z.record(z.string()) }),
]);

Recursive Schemas

For data with unknown depth (comment threads, file trees), use recursive schemas:

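// Recursive schemas need an explicit type annotation; it cannot be inferred from z.lazy
type CommentNode = { author: string; text: string; replies?: CommentNode[] };

const CommentSchema: z.ZodType<CommentNode> = z.lazy(() => z.object({
  author: z.string(),
  text: z.string(),
  replies: z.array(CommentSchema).optional(),
}));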

Progressive Extraction

For complex documents, extract progressively — first get the document type, then extract type-specific fields in a follow-up request. This improves accuracy by narrowing the model's focus at each step.
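A sketch of the two-step flow, with the model call injected as a plain function so the example runs offline. Everything here is hypothetical: the document types, the per-type schemas, and `fake_model`, which returns canned answers in place of a real structured-output request.

```python
from typing import Callable

# Hypothetical per-type schemas; a real system would carry richer definitions.
SCHEMAS_BY_TYPE = {
    "invoice": {
        "type": "object",
        "properties": {"vendor": {"type": "string"}, "total": {"type": "number"}},
        "required": ["vendor", "total"],
    },
    "resume": {
        "type": "object",
        "properties": {"name": {"type": "string"}, "years_experience": {"type": "integer"}},
        "required": ["name", "years_experience"],
    },
}

DOC_TYPE_SCHEMA = {
    "type": "object",
    "properties": {"doc_type": {"type": "string", "enum": sorted(SCHEMAS_BY_TYPE)}},
    "required": ["doc_type"],
}

def extract_progressively(text: str, call_model: Callable[[str, dict], dict]) -> dict:
    """Step 1: classify the document. Step 2: extract with the schema for that type."""
    doc_type = call_model(text, DOC_TYPE_SCHEMA)["doc_type"]
    extracted = call_model(text, SCHEMAS_BY_TYPE[doc_type])
    return {"doc_type": doc_type, "fields": extracted}

def fake_model(text: str, schema: dict) -> dict:
    """Canned stand-in for a structured-output API call."""
    if "doc_type" in schema["properties"]:
        return {"doc_type": "invoice"}
    return {"vendor": "Acme Corp", "total": 120.0}

result = extract_progressively("Invoice from Acme Corp, total due $120.00", fake_model)
print(result)
```

The second request only ever sees the small schema for one document type, which is the narrowing of focus the pattern relies on.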

Retry Strategies for Production Reliability

Even with structured output, network errors and rate limits can cause failures. Implement exponential backoff with jitter and distinguish between retryable errors (rate limits, timeouts) and permanent errors (invalid schema, content policy violations):

async function extractWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxRetries) throw error;
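      // catch variables are `unknown` in strict TypeScript, so narrow before reading status
      const status = (error as { status?: number }).status;
      if (status === 429 || (status !== undefined && status >= 500)) {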
        const delay = Math.min(1000 * 2 ** attempt, 10000);
        const jitter = Math.random() * 1000;
        await new Promise(resolve => setTimeout(resolve, delay + jitter));
        continue;
      }
      throw error; // Non-retryable error
    }
  }
  throw new Error('Unreachable');
}

Streaming with Structured Output

For real-time applications, combine streaming with structured output to progressively populate fields as the model generates them. OpenAI's streaming API emits the JSON as a sequence of text fragments, so accumulate them and parse the buffer rather than parsing each fragment on its own:

const stream = await openai.beta.chat.completions.stream({
  model: 'gpt-4o-2024-08-06',
  messages: [{ role: 'user', content: text }],
  response_format: zodResponseFormat(ContactInfo, 'contact'),
});
 
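let buffer = '';
for await (const chunk of stream) {
  // Each delta is a fragment of the JSON document, not valid JSON by itself
  buffer += chunk.choices[0]?.delta?.content ?? '';
  try {
    updateUI(JSON.parse(buffer)); // succeeds only once the buffer is complete JSON
  } catch {
    // Still incomplete: keep accumulating (a partial-JSON parser could render earlier)
  }
}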

Testing Strategies

describe('Structured Output', () => {
  it('should extract valid contact info', async () => {
    const contact = await extractContact("John Doe, john@example.com");
    expect(contact.name).toBe('John Doe');
    expect(contact.email).toBe('john@example.com');
    expect(typeof contact.confidence).toBe('number');
  });
 
  it('should handle missing fields gracefully', async () => {
    const contact = await extractContact("Just a name: John");
    expect(contact.name).toBe('John');
    expect(contact.email).toBe(''); // Empty string for missing fields
  });
 
  it('should produce valid schema-compliant output', async () => {
    // analyzeSentiment: assumed TypeScript port of the Python analyze_sentiment shown earlier
    const result = await analyzeSentiment("Great product!");
    expect(['positive', 'negative', 'neutral', 'mixed']).toContain(result.sentiment);
    expect(result.confidence).toBeGreaterThanOrEqual(0);
    expect(result.confidence).toBeLessThanOrEqual(1);
  });
});

Future Outlook

Structured output is evolving toward automatic schema generation — where the model infers the appropriate schema from natural language descriptions of the desired output format. Instead of writing JSON Schema, you'll describe what you want ("extract the person's name, age, and occupation") and the model will generate the schema itself.

The convergence of structured output with real-time streaming will enable progressive data extraction where fields are populated as the model generates them. UI components can update in real-time as each field is extracted, providing instant feedback.

Multi-modal structured output — extracting structured data from images, audio, and video — will extend these capabilities beyond text. Imagine scanning a receipt image and getting a fully structured expense record, or transcribing a meeting and extracting action items with assignees and deadlines.

Conclusion

Structured output bridges the gap between LLM intelligence and application reliability. By constraining model output to predefined schemas, you get the intelligence of a language model with the reliability of a typed API — every response is valid, parseable, and ready for your application to use.

Key takeaways:

  1. Use structured output to get guaranteed-valid, schema-compliant JSON from LLMs
  2. Define schemas in your application code (Zod, Pydantic) and derive JSON Schema from them
  3. Add detailed field descriptions — the model uses them to generate accurate values
  4. Use enums for bounded values and strict mode for guaranteed compliance
  5. Keep schemas focused — extract one concept per request for highest accuracy
  6. Validate at the application layer for business rules beyond schema constraints
  7. Monitor extraction accuracy and confidence scores in production

Start by identifying one place in your application where you parse LLM text output with regex or string manipulation. Replace that fragile parsing with structured output and a clean schema. The improvement in reliability and code cleanliness will convince you to adopt it everywhere.