Introduction
The leap from chatbots to AI agents happened when large language models gained the ability to call external functions. Instead of just generating text, an LLM can now decide which tool to use, construct the right arguments, execute the tool, and incorporate the results into its response. This capability—called function calling or tool use—transforms LLMs from passive text generators into active participants in software systems.
Building AI agents with function calling is fundamentally different from traditional API integration. The agent doesn't follow a predetermined script; it reasons about which tools to use based on the user's request. This introduces challenges around tool selection, error handling, multi-step reasoning, and safety. This guide covers the architecture, implementation patterns, and production considerations for building reliable AI agents.
Understanding Function Calling: Core Concepts
How Function Calling Works
Function calling follows a three-step cycle:
- Tool definition: You describe available tools to the LLM using a schema (name, description, parameters with types and constraints).
- Tool selection: The LLM analyzes the user's request and decides whether to call a tool. If it does, it returns a structured tool call with the tool name and arguments.
- Tool execution: Your application executes the tool, captures the result, and sends it back to the LLM. The LLM then incorporates the result into its response.
This cycle can repeat multiple times in a single conversation—the LLM might call several tools sequentially or in parallel to fulfill a complex request.
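To make the cycle concrete, here is a minimal sketch of one round trip with the model's reply hard-coded so it runs without any API access. The message shapes loosely follow the OpenAI-style chat format used later in this guide; `fakeModelReply`, `implementations`, and `executeToolCall` are hypothetical names introduced only for illustration.

```typescript
// One pass through the define -> select -> execute cycle.

type ToolCall = {
  id: string;
  function: { name: string; arguments: string }; // arguments arrive as a JSON string
};

type ToolMessage = { role: "tool"; tool_call_id: string; content: string };

// Step 1: tool definition (the schema the model would be shown).
const weatherSchema = {
  type: "function",
  function: {
    name: "get_weather",
    description: "Get current weather for a location.",
    parameters: {
      type: "object",
      properties: { location: { type: "string" } },
      required: ["location"],
    },
  },
};

// Step 2: tool selection. A real model would produce this; here it is canned.
const fakeModelReply: { tool_calls: ToolCall[] } = {
  tool_calls: [
    {
      id: "call_1",
      function: { name: weatherSchema.function.name, arguments: '{"location":"Tokyo, Japan"}' },
    },
  ],
};

// Hypothetical local implementations, keyed by tool name.
const implementations = new Map<string, (args: { location: string }) => string>([
  ["get_weather", (args) => JSON.stringify({ location: args.location, temperature: 22 })],
]);

// Step 3: tool execution. Parse the arguments, run the tool, and wrap the
// result in a message that references the originating call by id.
function executeToolCall(call: ToolCall): ToolMessage {
  const impl = implementations.get(call.function.name);
  if (!impl) {
    return { role: "tool", tool_call_id: call.id, content: JSON.stringify({ error: "Unknown tool" }) };
  }
  const args = JSON.parse(call.function.arguments) as { location: string };
  return { role: "tool", tool_call_id: call.id, content: impl(args) };
}

const toolMessages = fakeModelReply.tool_calls.map(executeToolCall);
console.log(toolMessages);
```

The `tool_call_id` on each result is what lets the model match results to requests when it issues several calls at once.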
Tool Schemas
Tools are described using JSON Schema. A well-written tool schema is critical—the LLM uses the description and parameter descriptions to decide when and how to call the tool:
const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather for a location. Use this when the user asks about weather conditions, temperature, or forecast.",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City name, e.g., 'San Francisco, CA' or 'Tokyo, Japan'",
          },
          units: {
            type: "string",
            enum: ["celsius", "fahrenheit"],
            description: "Temperature units. Defaults to celsius.",
          },
        },
        required: ["location"],
      },
    },
  },
];
Multi-Step Reasoning
Complex tasks require multiple tool calls. For example, "What's the weather in Paris and convert 100 EUR to USD?" requires two independent tool calls. The LLM can issue both calls in a single response (parallel tool calls), or chain them sequentially if one depends on another.
The Agent Loop
An agent operates in a loop:
User message → LLM decides action → Execute tool(s) → Feed results to LLM →
LLM decides next action → ... → LLM generates final response
The loop terminates when the LLM generates a response without any tool calls, indicating it has enough information to answer.
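The control flow above can be sketched without any provider SDK. In this minimal sketch, `runLoop`, `scriptedModel`, and the simplified message shapes are hypothetical stand-ins chosen for illustration; a real chat API requires the assistant turn to carry its tool calls in the history, which is reduced to plain text here.

```typescript
type ToolRequest = { id: string; name: string; args: Record<string, unknown> };
type ModelTurn = { content: string | null; toolCalls: ToolRequest[] };
type LoopMessage =
  | { role: "user" | "assistant"; content: string }
  | { role: "tool"; toolCallId: string; content: string };

type Model = (messages: LoopMessage[]) => ModelTurn;
type ToolFn = (args: Record<string, unknown>) => string;

function runLoop(
  model: Model,
  tools: Map<string, ToolFn>,
  userMessage: string,
  maxIterations = 5,
): string {
  const messages: LoopMessage[] = [{ role: "user", content: userMessage }];

  for (let i = 0; i < maxIterations; i++) {
    const turn = model(messages);

    // Termination: a turn with no tool calls is the final answer.
    if (turn.toolCalls.length === 0) {
      return turn.content ?? "";
    }
    messages.push({ role: "assistant", content: turn.content ?? "" });

    // Otherwise execute each requested tool and feed the result back.
    for (const call of turn.toolCalls) {
      const tool = tools.get(call.name);
      const content = tool
        ? tool(call.args)
        : JSON.stringify({ error: `Unknown tool: ${call.name}` });
      messages.push({ role: "tool", toolCallId: call.id, content });
    }
  }
  // Safety net for a model that never stops requesting tools.
  return "Stopped: iteration limit reached.";
}

// Scripted model: requests the weather until a tool result is in the history.
const scriptedModel: Model = (messages) => {
  const toolResult = messages.find((m) => m.role === "tool");
  if (!toolResult) {
    return {
      content: null,
      toolCalls: [{ id: "call_1", name: "get_weather", args: { location: "Paris" } }],
    };
  }
  return { content: `Tool said: ${toolResult.content}`, toolCalls: [] };
};

const loopTools = new Map<string, ToolFn>([
  ["get_weather", () => JSON.stringify({ temperature: 18 })],
]);

console.log(runLoop(scriptedModel, loopTools, "What's the weather in Paris?"));
// prints: Tool said: {"temperature":18}
```

The iteration cap matters as much as the termination check: without it, a model that keeps requesting tools would never return.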
Architecture and Design Patterns
Tool Registry Pattern
Centralize tool definitions in a registry that maps tool names to their implementations:
interface Tool {
  name: string;
  description: string;
  parameters: Record<string, any>; // JSON Schema describing the arguments
  execute: (args: Record<string, any>) => Promise<string>;
}

class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool) {
    this.tools.set(tool.name, tool);
  }

  // Build the tools array in the shape shown in the Tool Schemas section
  getSchema() {
    return Array.from(this.tools.values()).map((tool) => ({
      type: "function" as const, // keep the literal type rather than widening to string
      function: {
        name: tool.name,
        description: tool.description,
        parameters: tool.parameters,
      },
    }));
  }

  async execute(name: string, args: Record<string, any>): Promise<string> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    return tool.execute(args);
  }
}
Guardrails Pattern
Always validate tool arguments before execution. The LLM might generate malformed arguments, and malicious inputs could attempt prompt injection:
function withGuardrails(tool: Tool): Tool {
  return {
    ...tool,
    async execute(args) {
      // Validate required fields. In JSON Schema, `required` is an array of
      // property names on the object schema, not a flag on each property.
      const required: string[] = tool.parameters.required ?? [];
      for (const key of required) {
        if (!(key in args)) {
          return JSON.stringify({ error: `Missing required parameter: ${key}` });
        }
      }
      // Sanitize string inputs on a copy so the caller's object is not mutated
      const sanitized: Record<string, any> = { ...args };
      for (const [key, value] of Object.entries(sanitized)) {
        if (typeof value === "string") {
          sanitized[key] = value.replace(/[<>]/g, ""); // Basic XSS prevention
        }
      }
      try {
        return await tool.execute(sanitized);
      } catch (error) {
        const message = error instanceof Error ? error.message : String(error);
        return JSON.stringify({ error: `Tool execution failed: ${message}` });
      }
    },
  };
}
Conversation Context Management
Agents need to manage conversation history to maintain context across multiple turns. This includes system prompts, user messages, assistant responses, and tool call results:
interface ConversationState {
  messages: Message[];
  toolCalls: ToolCall[];
  totalTokens: number;
}

class ConversationManager {
  private state: ConversationState;
  private maxTokens: number;

  constructor(maxTokens: number = 8000) {
    this.state = { messages: [], toolCalls: [], totalTokens: 0 };
    this.maxTokens = maxTokens;
  }

  addMessage(message: Message) {
    this.state.messages.push(message);
    this.state.totalTokens += this.estimateTokens(message);
    // Trim the oldest messages if the context is too large
    while (this.state.totalTokens > this.maxTokens && this.state.messages.length > 2) {
      // Index 0 (assumed to be the system message) is kept; index 1 is the oldest other message
      const removed = this.state.messages.splice(1, 1)[0];
      this.state.totalTokens -= this.estimateTokens(removed);
    }
  }

  // Rough heuristic: about 4 characters per token for English text
  private estimateTokens(message: Message): number {
    return Math.ceil(JSON.stringify(message).length / 4);
  }
}
Step-by-Step Implementation
Basic Agent with OpenAI
// agent.ts
// Tool, ToolRegistry, and withGuardrails are the definitions from the sections above.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

interface AgentConfig {
  model: string;
  systemPrompt: string;
  maxIterations: number;
  tools: Tool[];
}

export class Agent {
  private config: AgentConfig;
  private registry: ToolRegistry;
  private messages: any[] = [];

  constructor(config: AgentConfig) {
    this.config = config;
    this.registry = new ToolRegistry();
    config.tools.forEach((tool) => this.registry.register(withGuardrails(tool)));
    this.messages.push({
      role: "system",
      content: config.systemPrompt,
    });
  }

  async run(userMessage: string): Promise<string> {
    this.messages.push({ role: "user", content: userMessage });
    for (let i = 0; i < this.config.maxIterations; i++) {
      const response = await openai.chat.completions.create({
        model: this.config.model,
        messages: this.messages,
        tools: this.registry.getSchema(),
        tool_choice: "auto",
      });
      const choice = response.choices[0];
      this.messages.push(choice.message);
      // If no tool calls, return the text response
      if (!choice.message.tool_calls || choice.message.tool_calls.length === 0) {
        return choice.message.content || "";
      }
      // Execute all tool calls
      for (const toolCall of choice.message.tool_calls) {
        let result: string;
        try {
          // Arguments arrive as a JSON string and may be malformed
          const args = JSON.parse(toolCall.function.arguments);
          result = await this.registry.execute(toolCall.function.name, args);
        } catch (error) {
          // Report the failure to the LLM instead of crashing the loop
          const message = error instanceof Error ? error.message : String(error);
          result = JSON.stringify({ error: message });
        }
        this.messages.push({
          role: "tool",
          tool_call_id: toolCall.id,
          content: result,
        });
      }
    }
    return "I've reached the maximum number of steps for this task.";
  }
}
Building Real Tools
// tools/weather.ts
const weatherTool: Tool = {
  name: "get_weather",
  description: "Get current weather for a location",
  parameters: {
    type: "object",
    properties: {
      location: { type: "string", description: "City name" },
      units: { type: "string", enum: ["celsius", "fahrenheit"] },
    },
    required: ["location"],
  },
  async execute(args) {
    const { location, units = "celsius" } = args;
    const response = await fetch(
      `https://api.weatherapi.com/v1/current.json?key=${process.env.WEATHER_API_KEY}&q=${encodeURIComponent(location)}`
    );
    // Surface HTTP failures as a structured error the LLM can read
    if (!response.ok) {
      return JSON.stringify({ error: `Weather API returned HTTP ${response.status}` });
    }
    const data = await response.json();
    return JSON.stringify({
      location: data.location.name,
      temperature: units === "fahrenheit" ? data.current.temp_f : data.current.temp_c,
      units,
      condition: data.current.condition.text,
      humidity: data.current.humidity,
      wind: data.current.wind_kph, // km/h
    });
  },
};
// tools/database.ts
// naturalLanguageToSQL and db are application-specific helpers assumed to exist elsewhere.
const databaseTool: Tool = {
  name: "query_database",
  description: "Query the product database. Use for product searches, inventory checks, and pricing.",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string", description: "Natural language query about products" },
      limit: { type: "number", description: "Max results to return (default 10)" },
    },
    required: ["query"],
  },
  async execute(args) {
    const { query, limit = 10 } = args;
    // Convert natural language to SQL using another LLM call
    const sql = await naturalLanguageToSQL(query);
    // The generated SQL is untrusted: run it on a read-only connection
    const results = await db.query(sql, { limit });
    return JSON.stringify(results);
  },
};
// tools/code_execution.ts
// runInSandbox is an application-specific helper assumed to exist elsewhere.
const codeExecutionTool: Tool = {
  name: "execute_code",
  description: "Execute JavaScript code in a sandboxed environment. Use for calculations, data transformations, and prototyping.",
  parameters: {
    type: "object",
    properties: {
      code: { type: "string", description: "JavaScript code to execute" },
    },
    required: ["code"],
  },
  async execute(args) {
    const { code } = args;
    // Use a sandboxed execution environment with a 5-second timeout
    const result = await runInSandbox(code, { timeout: 5000 });
    return JSON.stringify({ output: result.stdout, error: result.stderr });
  },
};
Multi-Agent Orchestration
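// orchestrator.ts
class AgentOrchestrator {
  private agents: Map<string, Agent>;

  constructor() {
    this.agents = new Map();
  }

  registerAgent(name: string, agent: Agent) {
    this.agents.set(name, agent);
  }

  async run(task: string): Promise<string> {
    // Use a planner agent to determine which specialized agents to invoke
    const planner = this.agents.get("planner");
    if (!planner) throw new Error("No planner agent registered");
    const available = [...this.agents.keys()].filter((name) => name !== "planner");
    const plan = await planner.run(
      `Analyze this task and create an execution plan: ${task}\n` +
        `Respond with only a JSON array of steps shaped like {"agent": "<name>"}, ` +
        `using these agents: ${available.join(", ")}`
    );
    // Parse the plan and invoke agents in order.
    // JSON.parse throws if the planner replies with anything other than JSON.
    const steps: { agent: string }[] = JSON.parse(plan);
    let previousResult = "";
    for (const step of steps) {
      const agent = this.agents.get(step.agent);
      if (!agent) throw new Error(`Unknown agent: ${step.agent}`);
      // Pass the original task plus the previous agent's output
      const input = previousResult ? `${task}\n\nPrevious result: ${previousResult}` : task;
      previousResult = await agent.run(input);
    }
    return previousResult;
  }
}

// Usage (plannerAgent, researchAgent, writerAgent, and reviewerAgent are Agent instances configured elsewhere)
const orchestrator = new AgentOrchestrator();
orchestrator.registerAgent("planner", plannerAgent);
orchestrator.registerAgent("researcher", researchAgent);
orchestrator.registerAgent("writer", writerAgent);
orchestrator.registerAgent("reviewer", reviewerAgent);
const result = await orchestrator.run("Write a technical blog post about WebAssembly");
Real-World Use Cases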
Customer Support Agent
A SaaS company built a support agent that can look up account details, check subscription status, reset passwords, and escalate to human agents. The agent has 8 tools and handles 70% of support tickets autonomously. Guardrails prevent it from modifying billing information without human approval.
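An approval requirement like the one described could be enforced with a small wrapper around tool execution. The following is a sketch under stated assumptions, not the company's implementation: `GatedTool`, `requiresApproval`, `Approver`, and the tool names are all hypothetical, and the approver is synchronous only to keep the example short (a real approval step would be asynchronous and involve a person).

```typescript
type GatedTool = {
  name: string;
  requiresApproval: boolean; // assumption: set by whoever registers the tool
  run: (args: Record<string, unknown>) => string;
};

// Returns true when a human has approved this specific call.
type Approver = (toolName: string, args: Record<string, unknown>) => boolean;

function executeWithApproval(
  tool: GatedTool,
  args: Record<string, unknown>,
  approve: Approver,
): string {
  if (tool.requiresApproval && !approve(tool.name, args)) {
    // Return a structured refusal so the model can explain or escalate.
    return JSON.stringify({ status: "pending_approval", tool: tool.name });
  }
  return tool.run(args);
}

// Hypothetical tools: one read-only, one that changes billing.
const lookupAccount: GatedTool = {
  name: "lookup_account",
  requiresApproval: false,
  run: () => JSON.stringify({ plan: "free" }),
};
const updateBilling: GatedTool = {
  name: "update_billing",
  requiresApproval: true,
  run: () => JSON.stringify({ status: "updated" }),
};

const denyAll: Approver = () => false;

console.log(executeWithApproval(lookupAccount, {}, denyAll));
// prints: {"plan":"free"}
console.log(executeWithApproval(updateBilling, { plan: "pro" }, denyAll));
// prints: {"status":"pending_approval","tool":"update_billing"}
```

Putting the check in the execution path, rather than in the system prompt, means the restriction holds even when the model ignores its instructions.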
Code Review Agent
A development team built a code review agent that reads pull requests, analyzes code quality, checks for security vulnerabilities, and posts review comments. The agent uses tools to read files, run linters, search the codebase for patterns, and post GitHub comments.
Data Analysis Agent
A data team built an agent that converts natural language questions into SQL queries, executes them, and generates visualizations. The agent has tools for querying databases, generating charts, and exporting reports. It includes guardrails that prevent destructive SQL operations (DROP, DELETE without WHERE).
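A guardrail against destructive SQL might start with a keyword screen like the sketch below. This is an illustration only, not the team's implementation: `isDestructiveSql` is a hypothetical helper, TRUNCATE is an addition beyond the operations named above, and splitting on semicolons is naive (it misreads semicolons inside string literals). A read-only database role is the more dependable control; a check like this is a second layer.

```typescript
// Flags DROP, TRUNCATE, and DELETE statements that have no WHERE clause.
function isDestructiveSql(sql: string): boolean {
  return sql
    .split(";") // naive statement split; see the caveat in the text
    .map((statement) => statement.trim())
    .filter((statement) => statement.length > 0)
    .some((statement) => {
      if (/^(drop|truncate)\b/i.test(statement)) return true;
      // DELETE with no WHERE clause removes every row in the table.
      return /^delete\b/i.test(statement) && !/\bwhere\b/i.test(statement);
    });
}

console.log(isDestructiveSql("SELECT * FROM products"));            // prints: false
console.log(isDestructiveSql("DELETE FROM products WHERE id = 1")); // prints: false
console.log(isDestructiveSql("DELETE FROM products"));              // prints: true
console.log(isDestructiveSql("SELECT 1; drop table products;"));    // prints: true
```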
Research Agent
A research team built an agent that searches the web, reads articles, extracts key information, and synthesizes findings into structured reports. The agent uses a planning step to break complex research questions into sub-questions, then dispatches them to specialized sub-agents.
Best Practices for Production
- Write clear, specific tool descriptions: The LLM uses descriptions to decide when to call a tool. Vague descriptions lead to incorrect tool selection. Include examples of when to use and when not to use each tool.
- Validate all tool arguments: Never trust LLM-generated arguments. Validate types, ranges, and formats before execution. Use JSON Schema validation libraries.
- Implement timeout and retry logic: Tool calls can hang or fail. Set timeouts (5-10 seconds for API calls) and implement exponential backoff for transient failures.
- Log every tool call: Record the tool name, arguments, result, and execution time. This is essential for debugging and auditing agent behavior.
- Set iteration limits: Agents can get stuck in loops. Set a maximum number of tool call iterations per request and return a graceful fallback.
- Use structured outputs: When the agent needs to return structured data, use JSON mode or function calling to enforce the schema, rather than parsing free-form text.
- Implement human-in-the-loop for high-stakes actions: For actions that modify data, send money, or affect users, require human approval before execution.
- Cache tool results: If a tool is called with the same arguments within a short time window, return the cached result instead of re-executing.
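The timeout and retry advice in the list above can be sketched as a pair of helpers. The names `backoffDelay`, `withTimeout`, and `callWithRetry`, along with the default values, are assumptions for illustration. This version retries on every error, whereas production code would retry only failures known to be transient, and the timeout stops waiting without cancelling the underlying work.

```typescript
// Exponential backoff: baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, baseMs = 200, maxMs = 5000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Rejects if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Runs `fn` up to retries + 1 times, waiting longer after each failure.
async function callWithRetry<T>(
  fn: () => Promise<T>,
  options: { retries?: number; timeoutMs?: number; baseMs?: number } = {},
): Promise<T> {
  const { retries = 3, timeoutMs = 5000, baseMs = 200 } = options;
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await withTimeout(fn(), timeoutMs);
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        const delay = backoffDelay(attempt, baseMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

console.log([0, 1, 2, 3].map((attempt) => backoffDelay(attempt)));
// prints: [ 200, 400, 800, 1600 ]
```

A tool's `execute` function could wrap its network call as `callWithRetry(() => fetch(url))` and convert a final failure into a structured error for the LLM.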
Common Pitfalls and Solutions
| Pitfall | Impact | Solution |
|---|---|---|
| Vague tool descriptions | Wrong tool selected | Write specific descriptions with examples and anti-examples |
| No argument validation | Injection attacks, errors | Validate all arguments with JSON Schema before execution |
| Infinite agent loops | Cost explosion, timeouts | Set max iterations and implement loop detection |
| Unbounded context | Token limit exceeded | Implement conversation trimming with sliding window |
| Tool execution errors crash agent | Poor user experience | Catch errors in tools and return structured error messages |
| No cost tracking | Budget overrun | Track token usage per request and set spending limits |
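The loop detection mentioned in the table can be as simple as counting identical calls within one request. The sketch below is one possible approach under assumptions: `LoopDetector` and its `maxRepeats` threshold are hypothetical, and keying on `JSON.stringify(args)` treats the same arguments in a different property order as different calls.

```typescript
// Tracks how often each (tool, arguments) pair is requested in one run.
class LoopDetector {
  private counts = new Map<string, number>();
  private maxRepeats: number;

  constructor(maxRepeats = 3) {
    this.maxRepeats = maxRepeats;
  }

  // Records a call and returns true when it exceeds the allowed repeats.
  record(toolName: string, args: Record<string, unknown>): boolean {
    const key = `${toolName}:${JSON.stringify(args)}`;
    const count = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, count);
    return count > this.maxRepeats;
  }
}

const detector = new LoopDetector(2);
console.log(detector.record("get_weather", { location: "Paris" })); // prints: false
console.log(detector.record("get_weather", { location: "Paris" })); // prints: false
console.log(detector.record("get_weather", { location: "Paris" })); // prints: true
console.log(detector.record("get_weather", { location: "Tokyo" })); // prints: false
```

When `record` returns true, the agent can skip execution and return a structured error telling the model that the call was already made, which complements the hard iteration cap.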
Performance Optimization
Parallel Tool Calls
When the LLM issues multiple independent tool calls, execute them in parallel:
// Execute tool calls in parallel.
// Promise.all preserves input order, so results line up with toolCalls.
const toolResults = await Promise.all(
  toolCalls.map(async (toolCall) => {
    const args = JSON.parse(toolCall.function.arguments);
    const result = await this.registry.execute(toolCall.function.name, args);
    return {
      role: "tool",
      tool_call_id: toolCall.id,
      content: result,
    };
  })
);
Tool Result Caching
const toolCache = new Map<string, { result: string; expiry: number }>();

async function executeWithCache(tool: Tool, args: Record<string, any>): Promise<string> {
  const cacheKey = `${tool.name}:${JSON.stringify(args)}`;
  const cached = toolCache.get(cacheKey);
  if (cached && cached.expiry > Date.now()) {
    return cached.result;
  }
  const result = await tool.execute(args);
  toolCache.set(cacheKey, { result, expiry: Date.now() + 60000 }); // 1 min cache
  return result;
}
Streaming Responses
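Stream the model's response to the user as it is generated for better perceived performance. The loop below prints text deltas only; a streamed tool call arrives as `delta.tool_calls` fragments that must be accumulated before the tool can be executed:
const stream = await openai.chat.completions.create({
  model: "gpt-4",
  messages,
  tools: registry.getSchema(),
  stream: true,
});
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
  if (delta?.content) {
    process.stdout.write(delta.content);
  }
}
Comparison with Alternatives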
| Approach | Flexibility | Reliability | Cost | Complexity |
|---|---|---|---|---|
| Function calling | High | Medium | Medium | Medium |
| ReAct framework | Very high | Medium | High | High |
| Fixed workflow | Low | High | Low | Low |
| Hybrid (workflow + agent) | High | High | Medium | High |
Advanced Patterns
Self-Correction
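When a tool call fails, return the error to the LLM as a structured result so it can retry with corrected arguments. The helper below first retries the same call a few times to absorb transient failures, then hands the error back:
async function executeWithRetry(tool: Tool, args: any, maxRetries: number = 2): Promise<string> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await tool.execute(args);
    } catch (error) {
      if (attempt === maxRetries) {
        const message = error instanceof Error ? error.message : String(error);
        // Returned as a tool result so the LLM sees the error and can adjust its approach
        return JSON.stringify({ error: `Failed after ${maxRetries + 1} attempts: ${message}` });
      }
      // Otherwise retry the same call; this only helps with transient failures
    }
  }
  return JSON.stringify({ error: "Unexpected error" });
}
Tool Composition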
Build complex tools by composing simple ones:
// searchTool and summarizeTool are existing Tool objects assumed to be defined elsewhere.
const searchAndSummarizeTool: Tool = {
  name: "search_and_summarize",
  description: "Search for information and provide a summary",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string" },
      maxSources: { type: "number" },
    },
    required: ["query"],
  },
  async execute(args) {
    const results = await searchTool.execute({ query: args.query, limit: args.maxSources || 3 });
    // Summarize every search hit in parallel
    const summaries = await Promise.all(
      JSON.parse(results).map((r: any) => summarizeTool.execute({ url: r.url }))
    );
    return JSON.stringify(summaries);
  },
};
Testing Strategies
import { Agent } from "../agent";
import { MockToolRegistry } from "./mocks";

// These tests call the live model unless the OpenAI client is mocked
// (for example with jest.mock("openai")), so results can vary between runs.
describe("Agent", () => {
  it("calls the correct tool based on user input", async () => {
    const registry = new MockToolRegistry();
    const mockWeather = jest.fn().mockResolvedValue(JSON.stringify({ temp: 22 }));
    registry.register({
      name: "get_weather",
      description: "Get weather",
      parameters: { type: "object", properties: { location: { type: "string" } }, required: ["location"] },
      execute: mockWeather,
    });
    const agent = new Agent({ model: "gpt-4", systemPrompt: "You are helpful.", maxIterations: 3, tools: registry.getAll() });
    await agent.run("What's the weather in Tokyo?");
    expect(mockWeather).toHaveBeenCalledWith(expect.objectContaining({ location: expect.stringContaining("Tokyo") }));
  });

  it("handles tool execution errors gracefully", async () => {
    const registry = new MockToolRegistry();
    registry.register({
      name: "failing_tool",
      description: "Always fails",
      parameters: { type: "object", properties: {} },
      execute: () => { throw new Error("Tool failed"); },
    });
    const agent = new Agent({ model: "gpt-4", systemPrompt: "You are helpful.", maxIterations: 3, tools: registry.getAll() });
    const result = await agent.run("Use the failing tool");
    // Agent should still return a response
    expect(result).toBeTruthy();
    expect(result).not.toContain("error"); // Should handle gracefully
  });

  it("respects iteration limits", async () => {
    const agent = new Agent({
      model: "gpt-4",
      systemPrompt: "Keep calling tools forever.",
      maxIterations: 3,
      tools: [{ name: "noop", description: "Does nothing", parameters: { type: "object", properties: {} }, execute: async () => "ok" }],
    });
    const result = await agent.run("Call noop 100 times");
    expect(result).toContain("maximum");
  });
});
Future Outlook
Function calling is evolving rapidly. OpenAI, Anthropic, and Google are all expanding their tool use APIs with features like parallel tool calls, streaming tool results, and structured outputs. The pattern is becoming standardized across providers, making it easier to build provider-agnostic agents.
Multi-agent frameworks (AutoGen, CrewAI, LangGraph) are maturing, providing orchestration patterns for complex workflows. These frameworks handle agent communication, state management, and error recovery, reducing the boilerplate needed for multi-agent systems.
The safety landscape is also evolving. As agents gain access to more powerful tools (code execution, file systems, external APIs), the need for robust guardrails, human oversight, and audit trails becomes critical. Expect standardized safety frameworks to emerge alongside the agent capabilities.
Conclusion
Building AI agents with function calling is one of the most impactful applications of LLMs. The key takeaways:
- Tool design is everything — Clear descriptions, proper parameter schemas, and robust error handling determine agent reliability
- The agent loop is simple but powerful — LLM decides → tool executes → result feeds back → repeat until done
- Guardrails are non-negotiable — Validate inputs, sanitize outputs, set iteration limits, and log everything
- Start with a single tool — Build one reliable tool, test it thoroughly, then expand the toolset
- Human-in-the-loop for high stakes — Require approval for actions that affect real systems or users
Begin by building an agent with one tool (e.g., a calculator or web search). Get the tool schema right, handle errors gracefully, and test with diverse inputs. Once that loop works reliably, adding more tools is straightforward. The hard part is never the code—it's designing tools that the LLM can use correctly and safely.