AI Agents: Building Autonomous Systems with LLMs

Introduction

The evolution from simple chatbots to autonomous AI agents represents one of the most significant shifts in software architecture since the advent of the internet. While a chatbot responds to individual messages, an AI agent can reason about complex goals, break them into sub-tasks, execute actions in the real world, learn from the results, and iteratively refine its approach until the goal is achieved. This capability transforms LLMs from text generators into genuine problem-solving systems.

The building blocks of AI agents — tool use, planning, memory, and multi-agent collaboration — have been refined through years of research and production deployment. Companies like OpenAI, Anthropic, Google, and a vibrant open-source community have created frameworks and APIs that make it possible for individual developers to build agents that would have required a research lab just a few years ago.

This guide provides a comprehensive technical deep dive into the architecture of AI agents. You will learn how agents reason and plan, how they use tools to interact with external systems, how they maintain memory across interactions, and how multiple agents can collaborate to solve problems that exceed any single agent's capabilities. Every concept is accompanied by practical TypeScript and Python code examples that you can adapt for your own projects.

What Is an AI Agent?

An AI agent is a system that uses an LLM as its reasoning engine to autonomously plan, execute, and iterate on tasks. Unlike a simple LLM application that maps input to output in a single step, an agent operates in a loop: it receives a goal, reasons about how to achieve it, takes actions (including calling tools), observes the results, and adjusts its plan accordingly.

The key distinction between an agent and a traditional application is agency — the ability to make decisions about what to do next based on the current state of the world. A weather app fetches data and displays it. A weather agent might check the forecast, compare it to your calendar, suggest rescheduling an outdoor meeting, draft an email to attendees, and ask for your approval before sending.

The concept of AI agents predates LLMs by decades. Classical AI agents were built on rule-based systems, expert systems, and planning algorithms. What has changed is the reasoning capability: LLMs provide a general-purpose reasoning engine that can handle ambiguity, natural language, and novel situations in ways that rule-based systems never could.

Core Concepts and Architecture

The Agent Loop

Every AI agent implements some version of the agent loop — a cycle of perception, reasoning, action, and observation. The loop continues until the agent determines it has achieved its goal or exhausted its options.

// Core agent loop implementation
interface AgentConfig {
  model: string;
  systemPrompt: string;
  tools: Tool[];
  maxIterations: number;
  temperature: number;
}
 
interface Tool {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
  execute: (args: Record<string, unknown>) => Promise<string>;
}
 
interface AgentStep {
  thought: string;
  action?: { tool: string; input: Record<string, unknown> };
  observation?: string;
}
 
class AIAgent {
  private config: AgentConfig;
  private history: AgentStep[] = [];
  private llm: LLMClient;
 
  constructor(config: AgentConfig) {
    this.config = config;
    this.llm = new LLMClient({ model: config.model, temperature: config.temperature });
  }
 
  async run(goal: string): Promise<string> {
    const messages: Message[] = [
      { role: 'system', content: this.config.systemPrompt },
      { role: 'user', content: goal },
    ];
 
    for (let i = 0; i < this.config.maxIterations; i++) {
      // Step 1: Reason about what to do
      const response = await this.llm.chat(messages, this.getToolSchemas());
 
      if (response.toolCalls && response.toolCalls.length > 0) {
        // Step 2: Execute tool calls
        for (const toolCall of response.toolCalls) {
          const tool = this.config.tools.find(t => t.name === toolCall.name);
          if (!tool) {
            messages.push({ role: 'tool', content: `Error: Tool ${toolCall.name} not found`, toolCallId: toolCall.id });
            continue;
          }
 
          const result = await tool.execute(toolCall.arguments);
          this.history.push({
            thought: response.content,
            action: { tool: toolCall.name, input: toolCall.arguments },
            observation: result,
          });
 
          messages.push({ role: 'assistant', content: response.content, toolCalls: [toolCall] });
          messages.push({ role: 'tool', content: result, toolCallId: toolCall.id });
        }
      } else {
        // Step 3: No tool calls = final answer
        return response.content;
      }
    }
 
    return 'Maximum iterations reached. Could not complete the task.';
  }
 
  private getToolSchemas(): ToolSchema[] {
    return this.config.tools.map(tool => ({
      name: tool.name,
      description: tool.description,
      parameters: tool.parameters,
    }));
  }
}

Tool Use: Giving Agents Superpowers

Tool use is what transforms an LLM from a text generator into an agent that can act on the world. Tools are functions that the agent can invoke to search the web, query databases, execute code, send emails, manage files, or perform any other action.

The key to effective tool use is clear tool descriptions. The LLM decides which tool to use based on the tool's name and description, so these must be precise and unambiguous. A poorly described tool will be misused or ignored.

// Defining tools for an AI agent
const searchTool: Tool = {
  name: 'web_search',
  description: 'Search the web for current information. Use this when you need up-to-date facts, news, or data that may not be in your training data.',
  parameters: {
    type: 'object',
    properties: {
      query: { type: 'string', description: 'The search query' },
      numResults: { type: 'number', description: 'Number of results to return (1-10)', default: 5 },
    },
    required: ['query'],
  },
  execute: async (args) => {
    const response = await fetch(`https://api.search.example.com/search`, {
      method: 'POST',
      headers: { 'Authorization': `Bearer ${process.env.SEARCH_API_KEY}` },
      body: JSON.stringify({ query: args.query, limit: args.numResults || 5 }),
    });
    const results = await response.json();
    return JSON.stringify(results.items.map((r: any) => ({
      title: r.title,
      snippet: r.snippet,
      url: r.url,
    })));
  },
};
 
const calculatorTool: Tool = {
  name: 'calculator',
  description: 'Evaluate mathematical expressions. Supports arithmetic, algebra, and basic calculus operations.',
  parameters: {
    type: 'object',
    properties: {
      expression: { type: 'string', description: 'The mathematical expression to evaluate' },
    },
    required: ['expression'],
  },
  execute: async (args) => {
    try {
      const result = evaluateMathExpression(args.expression as string);
      return `Result: ${result}`;
    } catch (error) {
      return `Error evaluating expression: ${error.message}`;
    }
  },
};
 
const codeExecutionTool: Tool = {
  name: 'execute_code',
  description: 'Execute Python code in a sandboxed environment. Use for data analysis, calculations, and transformations that are complex to express as math.',
  parameters: {
    type: 'object',
    properties: {
      code: { type: 'string', description: 'The Python code to execute' },
    },
    required: ['code'],
  },
  execute: async (args) => {
    const result = await runInSandbox(args.code as string);
    return `Output:\n${result.stdout}\n${result.stderr ? `Errors:\n${result.stderr}` : ''}`;
  },
};

How AI Agents Work Under the Hood

The ReAct Pattern

The ReAct (Reasoning and Acting) pattern is the foundational agent architecture. The agent alternates between reasoning (generating thoughts about what to do) and acting (executing tools to gather information or perform actions). Each observation feeds back into the next reasoning step.

// ReAct pattern implementation
async function reactAgent(goal: string, tools: Tool[]): Promise<string> {
  const scratchpad: string[] = [];
  const maxSteps = 10;
 
  for (let step = 0; step < maxSteps; step++) {
    const prompt = `
Goal: ${goal}
 
Previous steps:
${scratchpad.join('\n')}
 
Available tools: ${tools.map(t => `${t.name}: ${t.description}`).join('\n')}
 
Think step by step about what to do next. Then either:
1. Use a tool by responding with: Thought: ... Action: tool_name(input)
2. Provide your final answer by responding with: Thought: ... Answer: ...
`;
 
    const response = await llm.complete(prompt);
    scratchpad.push(response);
 
    const answerMatch = response.match(/Answer:\s*(.*)/s);
    if (answerMatch) return answerMatch[1].trim();
 
    const actionMatch = response.match(/Action:\s*(\w+)\((.*?)\)/);
    if (actionMatch) {
      const toolName = actionMatch[1];
      const toolInput = actionMatch[2];
      const tool = tools.find(t => t.name === toolName);
 
      if (tool) {
        const result = await tool.execute({ input: toolInput });
        scratchpad.push(`Observation: ${result}`);
      } else {
        scratchpad.push(`Observation: Error - tool "${toolName}" not found`);
      }
    }
  }
 
  return 'Could not complete the task within the step limit.';
}

The Plan-and-Execute Pattern

For complex tasks, planning before acting is more effective than the reactive ReAct approach. The Plan-and-Execute pattern separates planning (creating a high-level strategy) from execution (implementing each step), with the ability to re-plan when observations reveal new information.

// Plan-and-Execute agent
interface Plan {
  goal: string;
  steps: PlanStep[];
  currentStep: number;
}
 
interface PlanStep {
  id: number;
  description: string;
  status: 'pending' | 'in-progress' | 'completed' | 'failed';
  result?: string;
}
 
class PlanAndExecuteAgent {
  private planner: LLMClient;
  private executor: LLMClient;
  private tools: Tool[];
 
  constructor(tools: Tool[]) {
    this.planner = new LLMClient({ model: 'gpt-4', temperature: 0.2 });
    this.executor = new LLMClient({ model: 'gpt-4', temperature: 0 });
    this.tools = tools;
  }
 
  async run(goal: string): Promise<string> {
    // Phase 1: Create a plan
    const plan = await this.createPlan(goal);
    console.log('Plan:', plan.steps.map(s => s.description));
 
    // Phase 2: Execute each step
    for (const step of plan.steps) {
      step.status = 'in-progress';
      try {
        const result = await this.executeStep(step, plan);
        step.result = result;
        step.status = 'completed';
 
        // Phase 3: Re-plan if needed
        const needsReplan = await this.shouldReplan(plan, step);
        if (needsReplan) {
          const newPlan = await this.replan(plan, step);
          plan.steps = newPlan.steps;
          plan.currentStep = 0;
        }
      } catch (error) {
        step.status = 'failed';
        step.result = `Failed: ${error.message}`;
        // Re-plan after failure
        const newPlan = await this.replan(plan, step);
        plan.steps = newPlan.steps;
        plan.currentStep = 0;
      }
    }
 
    // Phase 4: Synthesize final answer
    return this.synthesize(plan);
  }
 
  private async createPlan(goal: string): Promise<Plan> {
    const response = await this.planner.complete(`
      Create a step-by-step plan to achieve this goal: ${goal}
 
      Available tools: ${this.tools.map(t => `${t.name}: ${t.description}`).join('\n')}
 
      Respond with a JSON array of steps: [{"description": "..."}]
    `);
    const steps = JSON.parse(response).map((s: any, i: number) => ({
      id: i + 1,
      description: s.description,
      status: 'pending' as const,
    }));
    return { goal, steps, currentStep: 0 };
  }
 
  private async executeStep(step: PlanStep, plan: Plan): Promise<string> {
    const completedSteps = plan.steps
      .filter(s => s.status === 'completed')
      .map(s => `${s.description}: ${s.result}`)
      .join('\n');
 
    const result = await this.executor.complete(`
      Goal: ${plan.goal}
      Completed steps so far:
      ${completedSteps}
 
      Current step: ${step.description}
      Available tools: ${this.tools.map(t => `${t.name}: ${t.description}`).join('\n')}
 
      Execute this step. Use tools as needed. Provide the result.
    `);
    return result;
  }
}

The Reflection Pattern

Reflection adds a self-evaluation step to the agent loop. After generating output, the agent reviews its own work, identifies improvements, and iterates until it is satisfied with the quality.

// Reflection agent
async function reflectionAgent(task: string, maxReflections: number = 3): Promise<string> {
  let output = await generateInitialResponse(task);
 
  for (let i = 0; i < maxReflections; i++) {
    const critique = await llm.complete(`
      Task: ${task}
      Current output: ${output}
 
      Critically evaluate this output. Identify specific issues:
      - Factual errors
      - Missing information
      - Logical inconsistencies
      - Areas that could be improved
 
      Respond with a list of specific improvements needed.
    `);
 
    const needsImprovement = !critique.toLowerCase().includes('no improvements needed');
 
    if (!needsImprovement) break;
 
    output = await llm.complete(`
      Task: ${task}
      Current output: ${output}
      Feedback: ${critique}
 
      Improve the output based on the feedback. Address every issue raised.
    `);
  }
 
  return output;
}

Practical Implementation Guide

Building a Complete Research Agent with LangChain

# Full research agent implementation
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain.tools import Tool
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
 
# Initialize tools
search = DuckDuckGoSearchRun()
wikipedia = WikipediaAPIWrapper()
 
tools = [
    Tool(name="search", func=search.run, description="Search the web for current information"),
    Tool(name="wikipedia", func=wikipedia.run, description="Search Wikipedia for encyclopedic information"),
]
 
# Create the agent
llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
 
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a thorough research agent. Your goal is to gather comprehensive,
    accurate information on any topic. Follow these principles:
    1. Search multiple sources to verify facts
    2. Note conflicting information and explain the discrepancy
    3. Always cite your sources
    4. Distinguish between established facts and speculation
    5. Provide structured, well-organized findings"""),
    MessagesPlaceholder(variable_name="chat_history", optional=True),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])
 
agent = create_openai_tools_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True, max_iterations=15)
 
# Use the agent
result = executor.invoke({
    "input": "Research the current state of quantum computing in 2025. Include recent breakthroughs, key players, and remaining challenges."
})

Building a Task-Planning Agent in TypeScript

import { ChatOpenAI } from '@langchain/openai';
import { AgentExecutor, createOpenAIFunctionsAgent } from 'langchain/agents';
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts';
import { DynamicStructuredTool } from '@langchain/core/tools';
import { z } from 'zod';
 
// Define tools with Zod schemas for validation
const taskPlannerTool = new DynamicStructuredTool({
  name: 'create_task_plan',
  description: 'Create a detailed task plan with dependencies and time estimates',
  schema: z.object({
    tasks: z.array(z.object({
      name: z.string().describe('Task name'),
      description: z.string().describe('Detailed task description'),
      estimatedHours: z.number().describe('Estimated hours to complete'),
      dependencies: z.array(z.string()).describe('Names of tasks that must complete first'),
      assignee: z.string().optional().describe('Person assigned to the task'),
    })),
  }),
  async func({ tasks }) {
    // Store tasks in database or project management tool
    const plan = await saveTaskPlan(tasks);
    return JSON.stringify(plan, null, 2);
  },
});
 
const calendarTool = new DynamicStructuredTool({
  name: 'check_calendar',
  description: 'Check team member availability and calendar events',
  schema: z.object({
    teamMember: z.string(),
    startDate: z.string(),
    endDate: z.string(),
  }),
  async func({ teamMember, startDate, endDate }) {
    const availability = await getCalendarEvents(teamMember, startDate, endDate);
    return JSON.stringify(availability);
  },
});
 
// Create the agent
const llm = new ChatOpenAI({ model: 'gpt-4-turbo', temperature: 0.2 });
const tools = [taskPlannerTool, calendarTool];
 
const prompt = ChatPromptTemplate.fromMessages([
  ['system', `You are a project planning agent. When given a project description,
   break it into actionable tasks, check team availability, and create a realistic plan.
   Consider dependencies, skill requirements, and deadlines.`],
  new MessagesPlaceholder('chat_history'),
  ['human', '{input}'],
  new MessagesPlaceholder('agent_scratchpad'),
]);
 
const agent = await createOpenAIFunctionsAgent({ llm, tools, prompt });
const executor = new AgentExecutor({ agent, tools, verbose: true, maxIterations: 10 });
 
// Run the planning agent
const result = await executor.invoke({
  input: `Create a project plan for building a new e-commerce checkout flow.
          The team has 3 frontend developers, 2 backend developers, and 1 QA engineer.
          The deadline is 6 weeks from now.`,
});

Real-World Use Cases

Customer Support Agents

AI agents can handle complex customer support scenarios that require accessing multiple systems: checking order status in the database, initiating returns in the logistics system, applying credits in the billing system, and drafting confirmation emails. The agent orchestrates these actions based on natural language conversation with the customer.

Code Generation and Review

Agents like GitHub Copilot Workspace and SWE-Agent can understand codebases, generate implementations, run tests, and fix issues autonomously. They use tools to read files, execute code, search documentation, and make edits — all guided by natural language instructions.

Data Analysis Pipelines

Data analysis agents can ingest raw datasets, perform exploratory analysis, generate visualizations, identify patterns, and produce reports. They use code execution tools to run Python/pandas, visualization libraries for charts, and file system tools to save outputs.

Research and Fact-Checking

Research agents can search the web, read articles, cross-reference sources, and compile findings into structured reports. They maintain memory of what they have already searched to avoid redundant queries and can follow citation chains to find primary sources.

Best Practices

1. Design clear, unambiguous tool descriptions. The LLM selects tools based on their descriptions. Vague descriptions lead to incorrect tool selection. Include what the tool does, when to use it, and what input it expects.

2. Implement guardrails for destructive actions. Actions like sending emails, making payments, or deleting data should require human approval. Use a confirmation step in your tool implementation for any action with irreversible consequences.

3. Use structured output for tool results. Return tool results as structured JSON rather than free text. This makes it easier for the LLM to parse and reason about the results, and enables downstream validation.

4. Set iteration limits and timeouts. Agents can enter loops where they keep trying the same approach. Set maximum iteration counts (typically 10-15) and per-step timeouts to prevent runaway processes.

5. Implement comprehensive logging. Log every LLM call, tool invocation, and decision. This creates an audit trail for debugging and enables you to understand why the agent made specific choices.

6. Start with the simplest architecture that works. A single agent with good tools is more reliable than a complex multi-agent system. Only add complexity when you have evidence that a simpler approach cannot handle your use case.

7. Test with adversarial inputs. Agents will encounter ambiguous, malicious, and out-of-scope inputs in production. Test extensively with edge cases: What happens when a tool fails? When the user asks for something dangerous? When the agent cannot find the information it needs?

Common Pitfalls and How to Avoid Them

Pitfall	Impact	Solution
Agent loops without exit conditions	Runaway costs, hung processes	Set max iterations and implement early stopping
Vague tool descriptions	Incorrect tool selection, wasted iterations	Write precise, example-rich descriptions
No error handling in tools	Agent crashes on tool failure	Wrap all tool calls in try/catch with meaningful error messages
Excessive context window usage	Increased costs, slower responses	Summarize old conversation turns, limit tool result sizes
Trusting LLM reasoning without verification	Incorrect actions based on hallucination	Implement verification steps for high-stakes decisions
No human-in-the-loop for critical actions	Unintended consequences	Require approval for destructive operations

The most critical pitfall is hallucination in the reasoning chain. An agent might "reason" that it has already completed a step when it has not, or misinterpret a tool result to reach a wrong conclusion. Mitigate this by requiring the agent to quote specific evidence for its claims and implementing verification checks at key decision points.

Another common issue is context window overflow. Long-running agents accumulate conversation history that eventually exceeds the model's context window. Implement a summarization strategy: when the context grows too large, summarize older interactions and keep only the most recent and relevant context.

Performance Considerations

Agent performance has two dimensions: latency (time to complete a task) and cost (tokens consumed). Each iteration of the agent loop requires at least one LLM call plus any tool executions. A 10-iteration agent might cost 10-50x more than a single LLM call and take 30-120 seconds to complete.

Optimize latency by using faster models (GPT-4o-mini, Claude Haiku) for simple subtasks and reserving powerful models (GPT-4, Claude Opus) for complex reasoning. Implement parallel tool execution when multiple independent tools need to be called. Cache tool results to avoid redundant API calls.

Optimize costs by implementing token budgets, using structured prompts that minimize unnecessary tokens, and setting appropriate max_tokens limits on LLM responses. Monitor cost per task and set alerts for anomalies.

Comparing Agent Architectures

Aspect	ReAct	Plan-and-Execute	Reflection	Multi-Agent
Complexity	Low	Medium	Low	High
Best for	Simple tasks	Complex, multi-step tasks	Quality-sensitive output	Diverse, parallel tasks
Latency	Low	Medium	Medium-High	High
Cost	Low	Medium	Medium	High
Reliability	Moderate	High	High	Variable
Error recovery	Reactive	Proactive (re-planning)	Self-correcting	Delegated

Advanced Topics

Agent Memory Systems

Long-term memory enables agents to learn from past interactions. Vector stores (like Pinecone, Weaviate, or ChromaDB) store embeddings of past interactions that can be retrieved based on semantic similarity to the current context.

// Long-term memory with vector store
class AgentMemory {
  private vectorStore: VectorStore;
 
  async remember(interaction: { input: string; output: string; feedback: string }) {
    const text = `Input: ${interaction.input}\nOutput: ${interaction.output}\nFeedback: ${interaction.feedback}`;
    const embedding = await this.embed(text);
    await this.vectorStore.upsert({ id: uuid(), vector: embedding, metadata: interaction });
  }
 
  async recall(query: string, topK: number = 5): Promise<MemoryEntry[]> {
    const embedding = await this.embed(query);
    return this.vectorStore.query({ vector: embedding, topK });
  }
 
  private async embed(text: string): Promise<number[]> {
    const response = await openai.embeddings.create({ model: 'text-embedding-3-small', input: text });
    return response.data[0].embedding;
  }
}

Agent Safety and Sandboxing

Agents that execute code or interact with external systems need sandboxing. Use container-based sandboxing (Docker, gVisor) for code execution, API key scoping for external service access, and action logging for audit trails.

Streaming and Real-Time Agents

For user-facing applications, streaming agent responses improves perceived latency. LangChain and the Vercel AI SDK support token-by-token streaming of agent responses, allowing users to see the agent's reasoning as it happens.

Conclusion

AI agents represent a fundamental shift in how we build software. Instead of writing explicit logic for every scenario, we define goals, provide tools, and let the LLM reason about how to achieve the goal. This paradigm enables software that can handle ambiguity, adapt to novel situations, and improve over time through reflection and memory.

The building blocks — tool use, planning, memory, and multi-agent collaboration — are mature enough for production use today. Start with a simple ReAct agent with well-defined tools, add planning for complex tasks, implement memory for persistent interactions, and scale to multi-agent systems when you need specialized expertise working in parallel.

The key to success is incremental complexity. Build the simplest agent that can solve your problem, test it thoroughly, and add sophistication only when you have evidence that it is needed. The most reliable agents are not the most complex — they are the ones with the best tools, the clearest prompts, and the most robust error handling. Start building today, and you will be amazed at what these systems can accomplish.

Minh Vo

Slaying code & making it lit fr fr 🔥 tagline