AI Coding Agents: Devin, SWE-Agent, and OpenHands

Introduction

The next frontier in AI-assisted development isn't better autocomplete — it's autonomous coding agents that can independently read issue descriptions, explore codebases, write implementations, run tests, and submit pull requests. Tools like Devin by Cognition Labs, SWE-Agent from Princeton, and OpenHands (formerly OpenDevin) represent a paradigm shift from AI as a coding assistant to AI as an autonomous developer. These systems don't wait for you to write a comment or press Tab — they receive a task in natural language and execute the full development workflow independently.

The early results were striking. SWE-Agent resolved about 12.5% of the tasks in SWE-bench, a benchmark built from real GitHub issues in popular open-source Python projects. Devin made headlines with a reported 13.86% resolve rate, although Cognition measured that on a random 25% subset of the benchmark rather than the full set, so the two figures are not directly comparable. OpenHands, the open-source alternative, reports comparable performance while being fully transparent and customizable. These numbers might seem modest, but they come from real bugs in real projects like Django, Flask, and scikit-learn, each of which requires navigating a large, unfamiliar codebase.

The implications for software engineering are profound and divisive. Optimists see a future where developers focus on architecture and design while agents handle implementation. Critics worry about code quality, security, and the loss of deep understanding that comes from writing code yourself. The reality, as always, lies somewhere in between. Understanding how these agents work, what they can and cannot do, and how to effectively collaborate with them is essential knowledge for any developer in 2025.

Understanding AI Coding Agents: Core Concepts

The Agent Loop

All coding agents follow a similar core loop: Observe → Think → Act → Repeat. The agent observes the current state of the codebase, thinks about what action to take (using an LLM), executes that action (editing files, running commands, browsing the web), observes the result, and repeats until the task is complete.
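The loop can be sketched in a few lines. This is a minimal illustration rather than how any of the tools above implement it: `LoopAction`, `LoopEnv`, `LoopPolicy`, and `runLoop` are hypothetical names, and the model call is replaced by a plain synchronous function so the example runs without an API key (a real call would be asynchronous).

```typescript
// Hypothetical types for illustration; real agents use richer action schemas.
type LoopAction =
  | { kind: 'run'; command: string }
  | { kind: 'finish'; summary: string };

interface LoopEnv {
  // Executes an action and returns what the agent observes next.
  execute(action: LoopAction): string;
}

// Stand-in for the LLM: maps the observations so far to the next action.
type LoopPolicy = (observations: string[]) => LoopAction;

function runLoop(policy: LoopPolicy, env: LoopEnv, initial: string, maxSteps = 10): string {
  const observations = [initial];                 // Observe
  for (let step = 0; step < maxSteps; step++) {
    const action = policy(observations);          // Think
    if (action.kind === 'finish') return action.summary;
    observations.push(env.execute(action));       // Act, then observe the result
  }
  return 'step budget exhausted';
}

// Toy usage: the stub policy runs the tests once, then finishes when they pass.
const fakeEnv: LoopEnv = { execute: () => 'tests passed' };
const fakePolicy: LoopPolicy = (obs) =>
  obs[obs.length - 1] === 'tests passed'
    ? { kind: 'finish', summary: 'fixed' }
    : { kind: 'run', command: 'npm test' };

const loopResult = runLoop(fakePolicy, fakeEnv, 'task: fix failing test');
```

The step cap matters as much as the loop itself: without it, a policy that never emits `finish` would run indefinitely.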

This loop is fundamentally different from code completion. A code completion model sees your cursor position and predicts the next token. An agent sees the entire task, plans a multi-step strategy, and executes it autonomously — including debugging when things go wrong.

Tool Use and Environment Interaction

What makes agents powerful is their ability to use tools: reading and writing files, executing shell commands, browsing documentation, running tests, and interpreting results. The agent's LLM serves as the reasoning engine, but its capabilities come from the tools it can access.

SWE-Agent's key innovation was its Agent-Computer Interface (ACI) — a carefully designed set of commands that make it easy for the LLM to interact with a development environment. Commands like search_dir, edit, open, and scroll give the agent a human-like interface to explore and modify code.
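To make the idea concrete, here is a rough sketch of one piece of such an interface, assuming a windowed file viewer: the agent sees a fixed number of lines at a time and moves the window instead of loading whole files into context. The `FileViewer` class, its method names, and the default window size are illustrative assumptions, not SWE-Agent's actual code or command set.

```typescript
// Illustrative windowed viewer; not SWE-Agent's real implementation.
class FileViewer {
  private readonly lines: string[];
  private readonly windowSize: number;
  private firstLine = 0; // zero-based index of the first visible line

  constructor(lines: string[], windowSize = 100) {
    this.lines = lines;
    this.windowSize = windowSize;
  }

  // Show the window starting at a 1-based line number, clamped to the file bounds.
  goto(lineNumber: number): string {
    const maxStart = Math.max(0, this.lines.length - this.windowSize);
    this.firstLine = Math.min(Math.max(0, lineNumber - 1), maxStart);
    return this.render();
  }

  // Move the window down by one full window.
  scrollDown(): string {
    return this.goto(this.firstLine + this.windowSize + 1);
  }

  // Render visible lines with 1-based numbers so later edits can reference them.
  private render(): string {
    const end = Math.min(this.firstLine + this.windowSize, this.lines.length);
    const body = this.lines
      .slice(this.firstLine, end)
      .map((text, i) => `${this.firstLine + i + 1}: ${text}`);
    return [`[lines ${this.firstLine + 1}-${end} of ${this.lines.length}]`, ...body].join('\n');
  }
}

const demoViewer = new FileViewer(['a', 'b', 'c', 'd', 'e'], 2);
const firstWindow = demoViewer.goto(1);
const secondWindow = demoViewer.scrollDown();
```

The header line and line numbers are the point of the design: they tell the model where it is in the file and give it stable coordinates for a follow-up edit.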

Planning and Reasoning

Modern agents use chain-of-thought reasoning to plan their approach before acting. When given a bug report, the agent first analyzes the issue, identifies relevant files, hypothesizes the root cause, plans a fix, implements it, and verifies with tests. This planning phase is what distinguishes agents from simple scripts — they can adapt their strategy when initial approaches fail.

Memory and Context Management

Long-running tasks require the agent to maintain context across many steps. Agents use various memory strategies: conversation history (keeping all previous observations in the LLM context), scratchpad notes (writing summaries to files), and retrieval-augmented generation (searching for relevant information when needed).
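As a small illustration of the scratchpad-plus-retrieval idea, the sketch below stores short notes and recalls them by word overlap with a query. `NoteMemory` and `tokenize` are hypothetical names, and word overlap is a stand-in chosen to keep the example dependency-free; production systems more commonly rank by embedding similarity.

```typescript
// Split text into lowercase word tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9_]+/).filter((w) => w.length > 0);
}

// A toy memory store: notes are retrieved by word overlap with the query.
class NoteMemory {
  private notes: string[] = [];

  remember(note: string): void {
    this.notes.push(note);
  }

  // Return up to `limit` notes sharing at least one word with the query, best match first.
  recall(query: string, limit = 3): string[] {
    const queryWords = new Set(tokenize(query));
    return this.notes
      .map((note) => ({ note, score: tokenize(note).filter((w) => queryWords.has(w)).length }))
      .filter((entry) => entry.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit)
      .map((entry) => entry.note);
  }
}

const agentMemory = new NoteMemory();
agentMemory.remember('auth middleware lives in src/auth/session.ts');
agentMemory.remember('tests run with npm test');
agentMemory.remember('session tokens expire after 30 minutes');
const recalled = agentMemory.recall('where is the session middleware?');
```

Only the recalled notes need to go back into the prompt, which is what keeps the context small on long tasks.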

Architecture and Design Patterns

The ReAct Pattern

The dominant architecture for coding agents is ReAct (Reasoning + Acting). At each step, the agent generates a thought (reasoning about what to do next), an action (the specific tool call to make), and receives an observation (the result of that action). This cycle continues until the task is complete.
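One way to picture a single ReAct turn is as a small parsing problem. The sketch below assumes the model emits a `Thought:` line followed by an `Action: tool[input]` line; that text format, and the names `ReActStep`, `parseReActTurn`, and `appendObservation`, are assumptions for illustration. Many real agents use JSON or native tool-call APIs instead.

```typescript
interface ReActStep {
  thought: string;
  tool: string;
  input: string;
}

// Parse one model turn of the assumed form:
//   Thought: <free text>
//   Action: <tool>[<input>]
// Returns null when the turn does not follow the format.
function parseReActTurn(text: string): ReActStep | null {
  const thought = /Thought:\s*(.+)/.exec(text);
  const action = /Action:\s*(\w+)\[([\s\S]*)\]/.exec(text);
  if (!thought || !action) return null;
  return { thought: thought[1].trim(), tool: action[1], input: action[2] };
}

// Append the step and its observation to the transcript that is fed back to the model.
function appendObservation(transcript: string, step: ReActStep, observation: string): string {
  return `${transcript}Thought: ${step.thought}\nAction: ${step.tool}[${step.input}]\nObservation: ${observation}\n`;
}

const parsedTurn = parseReActTurn('Thought: look for tests\nAction: search[def test_]');
```

Returning `null` for a malformed turn lets the caller ask the model to retry rather than crash.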

The Plan-and-Execute Pattern

For complex tasks, agents first generate a high-level plan, then execute each step independently. This two-phase approach separates strategic thinking from tactical execution and produces more coherent solutions for multi-file changes.
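The two phases can be sketched as follows. `Planner`, `Executor`, and `planAndExecute` are hypothetical names, and both phases are stubbed as synchronous functions for the sake of a runnable example; in practice each would wrap a model call.

```typescript
// Phase 1: turn a task into an ordered list of steps.
type Planner = (task: string) => string[];
// Phase 2: carry out one step, given the steps already completed.
type Executor = (step: string, completed: string[]) => { ok: boolean; output: string };

interface PlanRunReport {
  completed: string[];
  failedStep: string | null;
}

function planAndExecute(task: string, plan: Planner, execute: Executor): PlanRunReport {
  const steps = plan(task);               // one planning call up front
  const completed: string[] = [];
  for (const step of steps) {             // then execute steps in order
    const outcome = execute(step, completed);
    if (!outcome.ok) return { completed, failedStep: step }; // caller may re-plan from here
    completed.push(step);
  }
  return { completed, failedStep: null };
}

// Toy usage with stubbed phases.
const toyPlanner: Planner = () => ['locate handler', 'patch handler', 'run tests'];
const toyExecutor: Executor = (step) => ({ ok: step !== 'run tests', output: step });
const planReport = planAndExecute('fix 500 on /login', toyPlanner, toyExecutor);
```

Reporting the failed step, rather than throwing, leaves room for a re-planning call that starts from what has already been completed.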

The Multi-Agent Pattern

Some systems use multiple specialized agents: a planner agent that breaks down tasks, a coder agent that writes code, a tester agent that runs tests, and a reviewer agent that evaluates quality. This mirrors human team structures and allows each agent to specialize.
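A simple way to model this is a pipeline in which each role transforms a shared work item. The sketch below is only a shape: `WorkItem`, `Role`, and the four stub roles are hypothetical, and each stub stands in for what would be a separate model prompt in a real system.

```typescript
// Shared state handed from one role to the next.
interface WorkItem {
  task: string;
  subtasks: string[];
  patch: string;
  testsPassed: boolean;
  approved: boolean;
}

type Role = (item: WorkItem) => WorkItem;

// Stub roles for illustration only; real roles would call a model or run tools.
const plannerRole: Role = (item) => ({ ...item, subtasks: item.task.split(' and ') });
const coderRole: Role = (item) => ({
  ...item,
  patch: item.subtasks.map((s) => `// TODO: ${s}`).join('\n'),
});
const testerRole: Role = (item) => ({ ...item, testsPassed: item.patch.length > 0 });
const reviewerRole: Role = (item) => ({ ...item, approved: item.testsPassed });

// Run the roles in order, threading the work item through each one.
function runPipeline(task: string, roles: Role[]): WorkItem {
  const initial: WorkItem = { task, subtasks: [], patch: '', testsPassed: false, approved: false };
  return roles.reduce((item, role) => role(item), initial);
}

const pipelineResult = runPipeline('add retry logic and update docs', [
  plannerRole,
  coderRole,
  testerRole,
  reviewerRole,
]);
```

Because every role has the same signature, roles can be reordered, replaced, or tested on their own.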

The Human-in-the-Loop Pattern

Production deployments typically include human checkpoints where the agent pauses for approval before making significant changes. This balances autonomy with safety, especially for critical codebases.
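A checkpoint like this can be expressed as a gate in front of the code that applies changes. In the sketch below, the policy (sensitive path prefixes and a line-count limit), the default values, and all names are assumptions chosen for illustration; the `Approver` callback stands in for whatever review mechanism a team actually uses, such as a CLI prompt or a pull request review.

```typescript
interface ProposedChange {
  path: string;
  linesChanged: number;
}

// Returns true if a human approved the change.
type Approver = (change: ProposedChange) => boolean;

// Hypothetical policy: large changes and changes under sensitive paths need approval.
function needsApproval(change: ProposedChange, sensitivePrefixes: string[], maxLines: number): boolean {
  return change.linesChanged > maxLines || sensitivePrefixes.some((p) => change.path.startsWith(p));
}

function applyWithCheckpoint(
  change: ProposedChange,
  approve: Approver,
  apply: (change: ProposedChange) => void,
  sensitivePrefixes: string[] = ['infra/', '.github/'],
  maxLines = 200,
): 'applied' | 'rejected' {
  // Pause for a human only when the policy requires it.
  if (needsApproval(change, sensitivePrefixes, maxLines) && !approve(change)) return 'rejected';
  apply(change);
  return 'applied';
}
```

Small changes in ordinary source directories pass straight through, so the human is only interrupted for the cases the policy marks as significant.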

Step-by-Step Implementation

Setting Up OpenHands Locally

OpenHands is the most accessible coding agent for local development. Here's how to set it up:

# Clone the OpenHands repository
git clone https://github.com/All-Hands-AI/OpenHands.git
cd OpenHands
 
# Build and run with Docker
make build
make run
 
# Or use the simpler Docker command
docker run -it --pull=always \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=ghcr.io/all-hands-ai/runtime:latest \
  -e LOG_ALL_EVENTS=true \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.openhands:/.openhands \
  -p 3000:3000 \
  --add-host host.docker.internal:host-gateway \
  ghcr.io/all-hands-ai/openhands:latest

Building a Simple Coding Agent from Scratch

Understanding agent internals helps you use them effectively. Here's a minimal coding agent implementation:

import OpenAI from 'openai';
import { execSync } from 'child_process';
import fs from 'fs';
 
interface AgentAction {
  type: 'read_file' | 'write_file' | 'run_command' | 'search' | 'done';
  args: Record<string, string>;
}
 
interface AgentStep {
  thought: string;
  action: AgentAction;
  observation: string;
}
 
class CodingAgent {
  private openai: OpenAI;
  private steps: AgentStep[] = [];
  private maxSteps = 20;
 
  constructor(apiKey: string) {
    this.openai = new OpenAI({ apiKey });
  }
 
  async solve(task: string, workspace: string): Promise<string> {
    for (let i = 0; i < this.maxSteps; i++) {
      const response = await this.openai.chat.completions.create({
        model: 'gpt-4o',
        messages: [
          {
            role: 'system',
            content: `You are a coding agent. Solve the given task by reading files, 
writing code, and running commands. Think step by step.
 
Available actions:
- read_file: { "path": "file path" }
- write_file: { "path": "file path", "content": "file content" }
- run_command: { "command": "shell command" }
- search: { "pattern": "regex", "path": "directory" }
- done: { "result": "description of what was done" }
 
Respond with JSON: { "thought": "...", "action": { "type": "...", "args": {...} } }`
          },
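          // Always send the task first so it stays in context on every step.
          { role: 'user' as const, content: `Task: ${task}` },
          ...this.steps.flatMap(s => [
            // Replay earlier turns in the same JSON shape the model is asked to produce.
            { role: 'assistant' as const, content: JSON.stringify({ thought: s.thought, action: s.action }) },
            { role: 'user' as const, content: `Observation: ${s.observation}` },
          ]),
        ],
        temperature: 0.1,
        // Ask the API for a JSON object so the JSON.parse call below is less likely to fail.
        response_format: { type: 'json_object' },
      });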
 
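      // Model output is untrusted: parse defensively instead of assuming valid JSON.
      const raw = response.choices[0].message.content || '{}';
      let parsed: { thought: string; action: AgentAction };
      try {
        parsed = JSON.parse(raw);
        if (!parsed.action?.type) throw new Error('missing action');
      } catch {
        return `Agent returned a malformed response: ${raw.slice(0, 200)}`;
      }

      // 'done' is a stop signal, not a tool call, so check it before executing anything.
      if (parsed.action.type === 'done') {
        return parsed.action.args?.result ?? 'Done';
      }

      const observation = await this.executeAction(parsed.action, workspace);
      this.steps.push({ thought: parsed.thought, action: parsed.action, observation });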
    }
    return 'Max steps reached without completing the task';
  }
 
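  // Reject absolute paths and ".." segments so file actions stay inside the workspace.
  private resolvePath(workspace: string, relPath: string): string {
    if (relPath.startsWith('/') || relPath.split('/').includes('..')) {
      throw new Error(`Path escapes workspace: ${relPath}`);
    }
    return `${workspace}/${relPath}`;
  }

  // Note: run_command executes arbitrary shell commands, so run the agent in a sandbox or container.
  private async executeAction(action: AgentAction, workspace: string): Promise<string> {
    try {
      switch (action.type) {
        case 'read_file':
          return fs.readFileSync(this.resolvePath(workspace, action.args.path), 'utf-8');
        case 'write_file':
          fs.writeFileSync(this.resolvePath(workspace, action.args.path), action.args.content);
          return `File written: ${action.args.path}`;
        case 'run_command':
          return execSync(action.args.command, { cwd: workspace, encoding: 'utf-8', timeout: 30000 });
        case 'search': {
          // Single-quote both arguments so the model's pattern cannot inject shell syntax.
          const quote = (s: string) => `'${s.replace(/'/g, "'\\''")}'`;
          const dir = this.resolvePath(workspace, action.args.path || '.');
          // grep exits non-zero when nothing matches; the catch below reports that as an error string.
          return execSync(`grep -rn -e ${quote(action.args.pattern)} -- ${quote(dir)}`, { encoding: 'utf-8', timeout: 30000 });
        }
        default:
          return 'Unknown action type';
      }
    } catch (err) {
      return `Error: ${err instanceof Error ? err.message : String(err)}`;
    }
  }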
}

Integrating with GitHub Issues

Connect your agent to GitHub Issues for automated bug fixing:

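import { Octokit } from '@octokit/rest';
import fs from 'fs';

class IssueAgent {
  private agent: CodingAgent;
  private octokit: Octokit;

  // Both dependencies are injected; without a constructor the fields were never initialized.
  constructor(agent: CodingAgent, octokit: Octokit) {
    this.agent = agent;
    this.octokit = octokit;
  }

  async handleIssue(owner: string, repo: string, issueNumber: number): Promise<void> {
    const issue = await this.octokit.rest.issues.get({ owner, repo, issue_number: issueNumber });

    // The issue body can be null when the reporter leaves the description empty.
    const task = `
      Issue #${issueNumber}: ${issue.data.title}

      Description: ${issue.data.body ?? '(no description provided)'}

      Repository: ${owner}/${repo}
      Clone the repository, fix the issue, run tests, and create a pull request.
    `;

    // The agent runs commands with this directory as cwd, so it must exist first.
    const workspace = `/tmp/workspace/${repo}`;
    fs.mkdirSync(workspace, { recursive: true });

    const result = await this.agent.solve(task, workspace);

    await this.octokit.rest.issues.createComment({
      owner, repo, issue_number: issueNumber,
      body: `🤖 **AI Agent Attempt**\n\n${result}\n\nPlease review the changes carefully.`,
    });
  }
}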

Real-World Use Cases

Automated Bug Fixing

The most demonstrated use case is automated bug fixing. Given a bug report with reproduction steps, agents can trace the issue through the codebase, identify the root cause, implement a fix, and verify it with existing tests. This works best for well-defined bugs with clear reproduction steps.

Test Generation and Coverage Improvement

Agents can analyze a codebase, identify under-tested areas, and generate comprehensive test suites. They can run coverage tools, find uncovered branches, and write tests that exercise those specific paths.
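The "find under-tested areas" step can be as simple as ranking files by line coverage. The sketch below assumes a coverage report has already been reduced to per-file covered and total line counts; the `FileCoverage` shape, the `leastCovered` name, and the 80% threshold are illustrative assumptions, not the output format of any particular coverage tool.

```typescript
// Assumed, simplified per-file coverage record.
interface FileCoverage {
  path: string;
  coveredLines: number;
  totalLines: number;
}

// Return the paths of the least-covered files below the threshold, worst first.
function leastCovered(files: FileCoverage[], threshold = 0.8, limit = 5): string[] {
  return files
    .filter((f) => f.totalLines > 0) // skip empty files to avoid dividing by zero
    .map((f) => ({ path: f.path, ratio: f.coveredLines / f.totalLines }))
    .filter((f) => f.ratio < threshold)
    .sort((a, b) => a.ratio - b.ratio)
    .slice(0, limit)
    .map((f) => f.path);
}

const coverageTargets = leastCovered([
  { path: 'src/parser.ts', coveredLines: 90, totalLines: 100 },
  { path: 'src/billing.ts', coveredLines: 10, totalLines: 100 },
  { path: 'src/auth.ts', coveredLines: 50, totalLines: 100 },
  { path: 'src/empty.ts', coveredLines: 0, totalLines: 0 },
]);
```

The resulting list is a natural input for the agent's task description: one test-writing task per file, starting with the worst-covered one.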

Documentation Generation

Given a codebase with sparse documentation, agents can explore the code, understand the architecture, and generate comprehensive documentation including README files, API references, and architecture decision records.

Code Migration Projects

Large-scale migration tasks — upgrading framework versions, converting between languages, or migrating between databases — are ideal for agents. The repetitive nature of these tasks means agents can make consistent changes across hundreds of files.

Best Practices for Production

Start with low-risk tasks — Use agents for documentation, test generation, and minor bug fixes before trusting them with critical features. Build confidence gradually.
Provide detailed task descriptions — Agents perform dramatically better with clear, specific task descriptions. Include reproduction steps for bugs, acceptance criteria for features, and constraints for migrations.
Set boundaries on file access — Configure agents to only modify specific directories or file types. Prevent them from touching infrastructure code, deployment configs, or security-sensitive files.
Require test passage before merging — All agent-generated changes must pass the full test suite before human review. This catches hallucinated code and incorrect assumptions.
Monitor agent costs — Agent runs can consume significant API tokens. Set spending limits and monitor costs per task to ensure the ROI justifies the expense.
Version control everything — Commit agent changes frequently and atomically. If an agent goes off track, you need clean rollback points.
Keep humans in the loop for architecture — Agents can implement within existing patterns but struggle with novel architectural decisions. Use them for implementation, not design.
Log and review agent reasoning — Record the agent's chain of thought for debugging and improvement. Understanding why an agent made certain decisions helps you write better task descriptions.
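The file-access boundary in particular is easy to enforce in code before any write reaches disk. The sketch below normalizes a relative path and checks it against allow and deny prefixes; `AccessPolicy`, `normalizeRelative`, and `mayWrite` are hypothetical names, and the check is purely lexical, so it does not account for symlinks.

```typescript
interface AccessPolicy {
  allowedPrefixes: string[];
  deniedPrefixes: string[];
}

// Normalize a relative path; return null if it is absolute or climbs above the workspace root.
function normalizeRelative(path: string): string | null {
  if (path.startsWith('/')) return null;
  const parts: string[] = [];
  for (const segment of path.split('/')) {
    if (segment === '' || segment === '.') continue;
    if (segment === '..') {
      if (parts.length === 0) return null;
      parts.pop();
    } else {
      parts.push(segment);
    }
  }
  return parts.join('/');
}

// Deny rules win over allow rules; anything not explicitly allowed is rejected.
function mayWrite(path: string, policy: AccessPolicy): boolean {
  const normalized = normalizeRelative(path);
  if (normalized === null) return false;
  if (policy.deniedPrefixes.some((p) => normalized.startsWith(p))) return false;
  return policy.allowedPrefixes.some((p) => normalized.startsWith(p));
}

const writePolicy: AccessPolicy = {
  allowedPrefixes: ['src/', 'tests/'],
  deniedPrefixes: ['src/secrets/'],
};
```

Normalizing before matching is what stops a path such as `src/../infra/main.tf` from passing a naive prefix check.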

Common Pitfalls and Solutions

Pitfall	Impact	Solution
Agent enters infinite loops	Wasted API costs, no progress	Set max steps and timeout limits
Hallucinated file paths	Failed implementations	Provide explicit file listings in task context
Breaking existing tests	Regressions in production	Require full test suite passage before accepting changes
Over-confident wrong solutions	Subtle bugs introduced	Always require human review for agent PRs
Excessive API token usage	High costs	Use cheaper models for exploration, expensive ones for implementation
Ignoring project conventions	Inconsistent code style	Include style guide and examples in agent context
Agent modifies unrelated files	Unexpected side effects	Restrict file access to relevant directories
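Beyond a hard step limit, a cheap guard against the first pitfall is to notice when the agent keeps issuing the same action. The sketch below assumes each action has been serialized to a string (for example with `JSON.stringify`); `isStuck` and its threshold of three are illustrative choices.

```typescript
// True when the last `threshold` actions are identical, which suggests the agent is looping.
function isStuck(serializedActions: string[], threshold = 3): boolean {
  if (serializedActions.length < threshold) return false;
  const recent = serializedActions.slice(-threshold);
  return recent.every((a) => a === recent[0]);
}

const exampleActions = [
  '{"type":"read_file","path":"a.ts"}',
  '{"type":"run_command","command":"npm test"}',
  '{"type":"run_command","command":"npm test"}',
  '{"type":"run_command","command":"npm test"}',
];
const loopDetected = isStuck(exampleActions);
```

When the check fires, the caller can stop the run or inject an observation telling the model its last action is not making progress. This only catches exact repeats; cycles that alternate between two actions would need a wider comparison.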

Debugging Agent Failures

When an agent fails to solve a task, the debugging process differs from debugging human-written code. The problem is often in the agent's reasoning rather than its execution: a wrong assumption about the codebase early on tends to propagate through every later step. Review the recorded chain of thought to find where the agent's mental model diverged from reality.

Performance Optimization

Agent performance depends on three factors: latency (time per step), accuracy (percentage of steps that advance the solution), and cost (API tokens consumed). Optimize latency by using faster models for simple actions and reserving powerful models for complex reasoning. Improve accuracy by providing better task descriptions and project context. Reduce cost by caching common operations and setting appropriate context window sizes.
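The latency and cost advice can be combined into a small router plus a budget tracker. Everything specific in the sketch below is a placeholder: the tier names, the per-token prices, and the rule that only exploration steps use the cheap tier are assumptions to be replaced with a provider's real models and rates.

```typescript
interface ModelTier {
  name: string;
  costPer1kTokens: number;
}

// Placeholder tiers and prices, not real model names or rates.
const CHEAP: ModelTier = { name: 'small-model', costPer1kTokens: 0.001 };
const STRONG: ModelTier = { name: 'large-model', costPer1kTokens: 0.01 };

type StepKind = 'explore' | 'edit' | 'debug';

// Route simple exploration to the cheap tier and reasoning-heavy steps to the strong one.
function chooseModel(kind: StepKind): ModelTier {
  return kind === 'explore' ? CHEAP : STRONG;
}

class CostTracker {
  private spent = 0;
  private readonly budget: number;

  constructor(budget: number) {
    this.budget = budget;
  }

  // Record usage; returns false once the task budget is exceeded so the caller can stop.
  record(model: ModelTier, tokens: number): boolean {
    this.spent += (tokens / 1000) * model.costPer1kTokens;
    return this.spent <= this.budget;
  }

  get total(): number {
    return this.spent;
  }
}
```

Calling `record` after every model response gives a per-task spending limit that is enforced inside the loop rather than discovered on the invoice.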

The most impactful optimization is context pruning — keeping only the most relevant information in the agent's context window. Include the task description, relevant file contents, and recent action history, but exclude irrelevant files and long-past actions.
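A minimal version of context pruning keeps the task description unconditionally and then fills the remaining budget with the most recent steps. In the sketch below, `HistoryEntry`, `pruneContext`, and the four-characters-per-token estimate are assumptions; a real implementation would use the model's tokenizer and might summarize dropped steps instead of discarding them.

```typescript
interface HistoryEntry {
  role: 'task' | 'step';
  text: string;
}

// Rough token estimate (about 4 characters per token); an assumption, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Always keep the task; then keep the most recent steps that fit in the budget.
function pruneContext(entries: HistoryEntry[], maxTokens: number): HistoryEntry[] {
  const task = entries.filter((e) => e.role === 'task');
  const steps = entries.filter((e) => e.role === 'step');
  let used = task.reduce((sum, e) => sum + estimateTokens(e.text), 0);
  const kept: HistoryEntry[] = [];
  // Walk backwards from the newest step and stop at the first one that does not fit.
  for (let i = steps.length - 1; i >= 0; i--) {
    const cost = estimateTokens(steps[i].text);
    if (used + cost > maxTokens) break;
    used += cost;
    kept.unshift(steps[i]);
  }
  return [...task, ...kept];
}

const prunedContext = pruneContext(
  [
    { role: 'task', text: 'fix bug' },
    { role: 'step', text: 'x'.repeat(40) },
    { role: 'step', text: 'y'.repeat(40) },
    { role: 'step', text: 'z'.repeat(40) },
  ],
  25,
);
```

With a budget of 25 estimated tokens, the task and the two newest steps survive and the oldest step is dropped.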

Comparison with Alternatives

Tool	Open Source	Cost	Performance (SWE-bench)	Customizability	Best For
Devin	No	$500/mo	13.86% (25% subset)	Low	Teams wanting managed service
SWE-Agent	Yes	API costs	12.5%	High	Research and custom pipelines
OpenHands	Yes	API costs	12%+	High	Self-hosted, customizable
GitHub Copilot Workspace	No	Included	N/A (preview)	Low	GitHub-native workflows
Cursor (Agent Mode)	No	$20/mo	N/A	Medium	Individual developers

Advanced Patterns

Multi-Agent Collaboration

For complex tasks, use multiple agents with different specializations. A planner agent decomposes the task, a code agent implements changes, a test agent verifies correctness, and a review agent evaluates quality. This mirrors how human teams work and produces better results than a single agent handling everything.

Self-Improving Agents

Implement feedback loops where agents learn from their failures. When an agent's solution fails tests, it should analyze the failure, update its understanding, and retry with a modified approach. This iterative refinement dramatically improves success rates.
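The feedback loop can be sketched as a bounded retry in which each new attempt sees the failures of the previous ones. `Attempt`, `ProposeFix`, `RunTests`, and `solveWithRetries` are hypothetical names, and both callbacks are synchronous stubs here; in practice `propose` would be a model call and `runTests` would shell out to the project's test runner.

```typescript
interface Attempt {
  patch: string;
  failure: string | null; // null means the tests passed
}

// Proposes a patch, given the task and every earlier attempt with its failure output.
type ProposeFix = (task: string, previous: Attempt[]) => string;
// Runs the tests against a patch; returns the failure output, or null on success.
type RunTests = (patch: string) => string | null;

function solveWithRetries(
  task: string,
  propose: ProposeFix,
  runTests: RunTests,
  maxAttempts = 3,
): Attempt[] {
  const attempts: Attempt[] = [];
  for (let i = 0; i < maxAttempts; i++) {
    const patch = propose(task, attempts); // earlier failures are passed back as feedback
    const failure = runTests(patch);
    attempts.push({ patch, failure });
    if (failure === null) break;           // stop at the first passing patch
  }
  return attempts;
}

// Toy usage: the second proposal passes.
const retryAttempts = solveWithRetries(
  'fix off-by-one',
  (_task, previous) => `v${previous.length + 1}`,
  (patch) => (patch === 'v2' ? null : 'AssertionError: expected 3, got 2'),
);
```

The attempt cap keeps a persistently failing task from consuming unlimited tokens, and the returned history doubles as a log for later review.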

Agent-Augmented Code Review

Use agents not just for writing code but for reviewing it. An agent can check out a PR, run the test suite, analyze the changes for issues, and provide detailed feedback — essentially acting as an automated senior developer reviewer.

Future Outlook

The coding agent space is evolving quickly. By the end of 2025, agents may plausibly handle a large share of routine development tasks autonomously, perhaps 30-50%, though that range is a projection rather than a measured result. The key areas of improvement will be long-horizon planning (handling tasks that require dozens of coordinated steps), reliability (reducing hallucinations and failures), and integration (fitting into existing development workflows).

The most transformative development will be the emergence of specialized agents — agents that deeply understand specific frameworks, domains, or codebases. A specialized agent trained on your company's codebase, coding standards, and architecture will dramatically outperform general-purpose agents.

For developers, the skill shift is clear: the ability to effectively specify tasks for agents, evaluate their output, and integrate their work into production systems will become as valuable as traditional coding skills. The developers who learn to work with agents now will have a significant advantage as the technology matures.

Conclusion

AI coding agents represent a fundamental shift from AI as a tool to AI as a teammate. While they're not yet capable of replacing senior developers, they're already effective at handling well-defined, bounded tasks that would otherwise consume developer time.

Key takeaways:

Coding agents follow an Observe-Think-Act loop, using tools to interact with development environments autonomously
They excel at bounded tasks: bug fixes with clear reproduction steps, test generation, documentation, and code migrations
OpenHands provides the most accessible open-source starting point for experimenting with coding agents
Always require human review for agent-generated changes, especially in production codebases
Provide detailed, specific task descriptions — agent performance correlates directly with task clarity
Start with low-risk tasks and build confidence before expanding agent responsibilities
The ability to effectively direct and evaluate agents is becoming a core developer skill

Start by setting up OpenHands locally and giving it a simple, well-defined task from your project's issue tracker. Observe how it approaches the problem, evaluate the quality of its solution, and iterate on your task descriptions. This hands-on experience will teach you more about agent capabilities and limitations than any guide.

Minh Vo