How to Build Your First AI Agent with Claude API: Step-by-Step Guide

Most developers build their first AI agent the hard way: they wire up a chat loop, bolt on some tools, and watch it spiral into an uncontrollable mess of nested callbacks and race conditions. Building an agentic AI that actually works in production requires a different mental model from day one. This guide walks you through exactly that, using the Claude API, plain Python, and (optionally) LangChain to build something you can deploy and extend.

By the end, you will have a working AI agent that can use tools, reason over results, and loop until it reaches a goal. No fluff. Just code.


What Makes an AI Agent Different from a Chatbot

A chatbot responds to one message at a time. An AI agent takes a goal, breaks it into steps, and executes those steps autonomously, calling tools, evaluating results, and deciding what to do next.

The core loop looks like this:

  1. User provides a goal
  2. The model decides what action to take (or whether to respond directly)
  3. If an action is needed, a tool is called
  4. The tool result is returned to the model
  5. The model evaluates the result and decides the next step
  6. Repeat until the goal is achieved

This is what “agentic AI” means in practice. The model is not just generating text — it is reasoning about state, making decisions, and driving a process forward.

💡 Key Insight
The difference between a chatbot and an agent is the loop. Agents operate in a reasoning-action-observation cycle. Claude's tool_use API was designed specifically to power this pattern natively.
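
To make the cycle concrete before any API calls are involved, here is a minimal offline sketch of the same loop. The model is replaced by a scripted stand-in, and every name in it (scripted_model, TOOLS, run_loop, the dict shapes) is hypothetical and chosen only for illustration. It is not the Claude API, which comes later in this guide.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python function
TOOLS: dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
}

def scripted_model(history: list[dict]) -> dict:
    """Stand-in for an LLM: requests one tool call, then answers.

    A real model would decide this from the conversation; here the
    decision is hard-coded so the loop can run offline.
    """
    observations = [m for m in history if m["role"] == "tool"]
    if not observations:
        return {"action": "tool", "name": "add", "args": {"a": 2, "b": 3}}
    return {"action": "final", "text": f"The sum is {observations[-1]['content']}."}

def run_loop(goal: str, model: Callable[[list[dict]], dict], max_steps: int = 5) -> str:
    history = [{"role": "user", "content": goal}]            # 1. user provides a goal
    for _ in range(max_steps):
        decision = model(history)                            # 2. model picks an action
        if decision["action"] == "final":                    #    ...or answers directly
            return decision["text"]
        result = TOOLS[decision["name"]](**decision["args"])  # 3. tool is called
        history.append({"role": "tool", "content": result})   # 4. result goes back
        # 5-6. the next iteration lets the model evaluate and decide again
    return "Stopped: step limit reached."

print(run_loop("What is 2 + 3?", scripted_model))
```

The step limit matters even in a toy version: a model that never produces a final answer would otherwise loop forever.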

Why Claude API for Building AI Agents

Claude is not just another model with a chat interface. Anthropic built the Claude API with agentic workflows as a first-class use case. A few reasons it stands out for this kind of work:

Native tool use: Claude’s tool_use content blocks are clean and structured. The model returns a tool name and a JSON arguments object. You execute the tool, return the result, and the model continues. No hacks, no prompt engineering workarounds.

Long context: Current Claude Sonnet models, including the one used in this guide, support a 200K-token context window. For agents that need to reason over long document trails, API responses, or multi-step histories, this matters enormously.

Instruction following: Claude tends to respect system prompt constraints reliably. When you tell it to only respond in JSON, or to always call a specific tool before answering, it usually follows through, which is critical for predictable agent behavior.

If you want a detailed cost and performance breakdown before you commit, our Claude API vs OpenAI API comparison covers token pricing and benchmark differences side by side.

Pros

  • Native tool_use blocks are clean and structured
  • 200K context window handles long agent histories
  • Strong instruction following reduces prompt engineering overhead
  • Anthropic SDK is well-documented and actively maintained
  • Models available at multiple cost tiers (Haiku, Sonnet, Opus)

Cons

  • No free tier — you pay from the first token
  • Rate limits can bite you during rapid prototyping
  • Slightly higher cost than some OpenAI equivalents at high volume

Prerequisites and Environment Setup

Before writing a single line of agent code, make sure you have a recent Python 3 installation and the Anthropic SDK:

pip install anthropic

Set your API key as an environment variable so it never ends up in your source code:

export ANTHROPIC_API_KEY="sk-ant-..."
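
A missing key otherwise tends to surface as an authentication error on your first request. A small pre-flight check, sketched below, fails earlier with a clearer message. The helper name require_api_key is hypothetical, and the env parameter exists only so the function can be tested without touching your real environment.

```python
import os
from typing import Mapping, Optional

def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment or raise a readable error."""
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set. Export it in your shell before running the agent."
        )
    return key
```

Call require_api_key() once at startup, before constructing the client.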

That is all you need. You do not need LangChain, LlamaIndex, or any other framework to build a working agent. We will cover LangChain later in this guide for cases where it genuinely adds value.


Building Your First AI Agent: The Core Loop

Here is a minimal but complete agentic AI loop using the Claude API. It includes tool definition, tool execution, and the reasoning cycle.

Step 1: Define Your Tools

Tools are functions the model can choose to call. You define them as JSON schemas that describe the name, purpose, and parameters of each function.

import anthropic
import json

client = anthropic.Anthropic()

# Define the tools available to the agent
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a given city. Returns temperature in Celsius and conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name, e.g. 'San Francisco'"
                }
            },
            "required": ["city"]
        }
    },
    {
        "name": "calculate",
        "description": "Perform a mathematical calculation. Input a valid Python expression as a string.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "A valid Python math expression, e.g. '42 * 1.15'"
                }
            },
            "required": ["expression"]
        }
    }
]

⚠️ Tool Description Quality Matters More Than You Think
Claude decides which tool to call based almost entirely on the tool description. A vague description like "gets data" will result in the model guessing wrong. Be specific: describe what the tool returns, not just what it does.
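
One way to catch weak definitions early is a small lint pass over your tool list. The sketch below is an assumption-heavy illustration, not part of the Anthropic SDK: validate_tool is a hypothetical helper, and the 40-character description threshold is an arbitrary number picked for the example.

```python
def validate_tool(tool: dict, min_description_chars: int = 40) -> list[str]:
    """Return a list of problems found in one tool definition (empty if none)."""
    problems = []
    name = tool.get("name", "") or "<unnamed>"
    if name == "<unnamed>":
        problems.append("missing name")
    if len(tool.get("description", "")) < min_description_chars:
        problems.append(f"{name}: description is very short")
    schema = tool.get("input_schema", {})
    if schema.get("type") != "object":
        problems.append(f"{name}: input_schema.type should be 'object'")
    properties = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in properties:
            problems.append(f"{name}: required field '{field}' is not in properties")
    for field, spec in properties.items():
        if not spec.get("description"):
            problems.append(f"{name}: parameter '{field}' has no description")
    return problems

# A well-described tool and a vague one, for comparison
good_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a given city. "
                   "Returns temperature in Celsius and conditions.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "The city name"}},
        "required": ["city"],
    },
}
vague_tool = {
    "name": "fetch",
    "description": "gets data",
    "input_schema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
        "required": ["q", "limit"],
    },
}

for problem in validate_tool(vague_tool):
    print(problem)
```

A length check cannot tell you whether a description is good, only that it is suspiciously thin. Treat it as a prompt to reread the definition.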

Step 2: Implement the Tool Functions

These are the actual Python functions that execute when Claude calls a tool:

def get_weather(city: str) -> str:
    # In a real agent, you would call a weather API here
    # For this tutorial, we return mock data
    mock_data = {
        "San Francisco": {"temp": 16, "conditions": "Partly cloudy"},
        "New York": {"temp": 22, "conditions": "Sunny"},
        "London": {"temp": 12, "conditions": "Overcast"},
    }
    result = mock_data.get(city, {"temp": 20, "conditions": "Unknown"})
    return f"Temperature: {result['temp']}°C, Conditions: {result['conditions']}"

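def calculate(expression: str) -> str:
    # WARNING: eval() with empty builtins is not a real sandbox. Crafted
    # expressions can still escape it or exhaust CPU and memory. Fine for a
    # local demo; do not expose it to untrusted input.
    try:
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Error: {str(e)}"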

def execute_tool(tool_name: str, tool_input: dict) -> str:
    if tool_name == "get_weather":
        return get_weather(**tool_input)
    elif tool_name == "calculate":
        return calculate(**tool_input)
    else:
        return f"Unknown tool: {tool_name}"
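
Because the model writes the expression that calculate receives, a stricter evaluator is worth considering before this goes anywhere near production. The sketch below swaps eval for a walk over Python's ast module that only allows numbers and basic arithmetic. The name safe_calculate and the exponent cap of 100 are assumptions made for this example. It is one possible hardening, not the only one.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def _eval_node(node: ast.AST):
    # Plain numbers only (bool is excluded even though it subclasses int)
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left, right = _eval_node(node.left), _eval_node(node.right)
        # Cap exponents so something like 9 ** 9 ** 9 cannot hang the process
        if isinstance(node.op, ast.Pow) and abs(right) > 100:
            raise ValueError("exponent too large")
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"unsupported expression element: {type(node).__name__}")

def safe_calculate(expression: str) -> str:
    """Evaluate basic arithmetic; return an error string for anything else."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval_node(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError,
            OverflowError, RecursionError) as e:
        return f"Error: {e}"

print(safe_calculate("25 - 16"))
print(safe_calculate("__import__('os').system('ls')"))
```

Function calls, attribute access, and names are all rejected because they are simply not in the whitelist, which is easier to reason about than trying to blacklist dangerous constructs.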

Step 3: Write the Agent Loop

This is the heart of the system. The loop runs until Claude returns a final text response with no tool calls:

def run_agent(user_message: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]
    
    for iteration in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )
        
        # Check if the model wants to use a tool
        if response.stop_reason == "tool_use":
            # Add the assistant's response (with tool calls) to history
            messages.append({"role": "assistant", "content": response.content})
            
            # Process each tool call and collect results
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })
            
            # Return tool results to the model
            messages.append({"role": "user", "content": tool_results})
        
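        else:
            # Any other stop reason ("end_turn", "max_tokens", ...) ends the run.
            # Without this branch the loop would resend the same request.
            # Join all text blocks so a multi-block answer is not truncated.
            return "".join(
                block.text for block in response.content if block.type == "text"
            )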
    
    return "Agent reached maximum iterations without a final answer."

# Run it
if __name__ == "__main__":
    result = run_agent(
        "What's the weather in San Francisco, and if it's under 20°C, "
        "how many degrees warmer would 25°C be compared to current temp?"
    )
    print(result)

Run this and Claude will typically call get_weather, receive the result (16°C), then call calculate with an expression such as 25 - 16, and return a natural language answer. The exact sequence can vary between runs, but that is a complete agentic AI loop.
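
It helps to see what the messages list looks like after one round trip. The sketch below writes it as plain dicts with a made-up id. The API expects each tool_use block to be answered by a tool_result carrying the same id in the next user message, so the hypothetical helper unanswered_tool_calls checks that pairing. It is a debugging aid for your own history handling, not an SDK function.

```python
def unanswered_tool_calls(messages: list[dict]) -> list[str]:
    """Return ids of tool_use blocks with no tool_result in the next message."""
    missing = []
    for i, msg in enumerate(messages):
        if msg["role"] != "assistant" or isinstance(msg["content"], str):
            continue
        call_ids = [b["id"] for b in msg["content"] if b["type"] == "tool_use"]
        nxt = messages[i + 1] if i + 1 < len(messages) else {"content": []}
        nxt_content = nxt["content"] if isinstance(nxt["content"], list) else []
        answered = {
            b["tool_use_id"] for b in nxt_content if b.get("type") == "tool_result"
        }
        missing.extend(cid for cid in call_ids if cid not in answered)
    return missing

# Illustrative history after one tool round trip (the id is made up)
history = [
    {"role": "user", "content": "What's the weather in San Francisco?"},
    {"role": "assistant", "content": [
        {"type": "text", "text": "Let me check."},
        {"type": "tool_use", "id": "toolu_example_1", "name": "get_weather",
         "input": {"city": "San Francisco"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_example_1",
         "content": "Temperature: 16°C, Conditions: Partly cloudy"},
    ]},
]

print(unanswered_tool_calls(history))      # complete round trip
print(unanswered_tool_calls(history[:2]))  # result message missing
```

If you ever trim or summarize history to save tokens, a check like this can catch the case where a tool call was kept but its result was dropped.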


Where LangChain Fits In

LangChain is a popular framework for building agents and chains. It abstracts away the message management, tool routing, and loop logic you wrote above. For simple agents, this abstraction adds more overhead than value. For complex multi-step pipelines, it becomes genuinely useful.

Here is when to reach for LangChain:

  • You are building agents with more than 5 tools and need structured routing logic
  • You need RAG (retrieval-augmented generation) integrated into the agent loop
  • You are chaining multiple models or agents together in a pipeline
  • You want built-in observability (LangSmith) without wiring it yourself

For multi-agent systems specifically, our guide to building multi-agent systems with LangGraph goes deep on how to wire multiple Claude agents together with shared state and conditional routing.

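Here is the same agent logic using LangChain with Claude. It needs the langchain and langchain-anthropic packages, and the imports follow the LangChain 0.x layout, which may have moved in newer releases, so pin your versions: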

from langchain_anthropic import ChatAnthropic
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a given city. Returns temperature and conditions."""
    mock_data = {
        "San Francisco": "16°C, Partly cloudy",
        "New York": "22°C, Sunny",
    }
    return mock_data.get(city, "20°C, Unknown")

@tool  
def calculate(expression: str) -> str:
    """Perform a mathematical calculation. Input a valid Python math expression."""
    # WARNING: eval() with empty builtins is not a real sandbox; demo use only
    try:
        return str(eval(expression, {"__builtins__": {}}, {}))
    except Exception as e:
        return f"Error: {e}"

llm = ChatAnthropic(model="claude-sonnet-4-5")
lc_tools = [get_weather, calculate]

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use tools when needed."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, lc_tools, prompt)
executor = AgentExecutor(agent=agent, tools=lc_tools, verbose=True)

result = executor.invoke({"input": "Weather in San Francisco and how far from 25°C?"})
print(result["output"])

The LangChain version is more concise for setup but less transparent. When something breaks, the native version is far easier to debug. Start native, then graduate to LangChain when complexity justifies it.

Approach               Best For                  Debug Ease   Overhead
Native Anthropic SDK   Simple to medium agents   High         Low
LangChain + Claude     Complex pipelines, RAG    Medium       Medium
LangGraph              Multi-agent systems       Low          High

Common Pitfalls That Will Kill Your Agent in Production

Building a demo agent is easy. Building one that survives real users is harder. Watch out for these:

1. No exit condition on the loop. If the model gets confused and keeps calling tools without converging, you need max_iterations to prevent runaway API costs. We set this to 10 above. Tune it for your use case.

2. Tools that fail silently. If a tool raises an exception and you catch it with a generic “Error occurred” message, the model will keep retrying with different inputs. Return specific, actionable error messages.

3. Missing context in tool results. When you return a tool result, include the unit, format, and any caveats. “16” means nothing. “16°C as of 09:00 PST” is actionable.

4. Overly broad system prompts. Agentic AI models need clear boundaries. Tell Claude exactly when to use tools and when to respond directly. Vague instructions lead to unpredictable behavior.
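
For pitfall 2, one option is to wrap every tool call so failures come back as specific, flagged results instead of crashing the loop or collapsing into a generic message. The sketch below assumes the tool_result block's optional is_error field. run_tool_safely and lookup_temperature are hypothetical names used only for this example.

```python
from typing import Callable

def run_tool_safely(tool_use_id: str, func: Callable[..., str], tool_input: dict) -> dict:
    """Run one tool and build a tool_result block, flagging failures explicitly."""
    try:
        content = func(**tool_input)
        is_error = False
    except TypeError as e:
        # Usually wrong or missing arguments: tell the model what to fix
        content = f"Invalid arguments for {func.__name__}: {e}. Check the tool's input schema."
        is_error = True
    except Exception as e:
        # Name the failure so the model can decide whether a retry makes sense
        content = f"{func.__name__} failed with {type(e).__name__}: {e}"
        is_error = True
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": content,
        "is_error": is_error,
    }

def lookup_temperature(city: str) -> str:
    """Hypothetical tool that only knows one city."""
    if city != "London":
        raise KeyError(f"no data for {city}")
    return "12°C as of 09:00 local time"

print(run_tool_safely("toolu_example_1", lookup_temperature, {"city": "London"}))
print(run_tool_safely("toolu_example_2", lookup_temperature, {"town": "London"}))
```

The second call passes the wrong parameter name, so the model gets an error it can act on rather than a silent failure.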

For deeper guidance on how to write system prompts that actually constrain model behavior, the prompt engineering guide covers techniques that hold up in production.


What to Build Next

Once your first agent loop is working, here is a natural progression:

  • Add memory: Store conversation history in a database (SQLite or Redis) so the agent remembers past interactions across sessions.
  • Add a real tool: Replace the mock weather function with a real API call (OpenWeatherMap, for example).
  • Add streaming: Use client.messages.stream() for real-time output in production UIs.
  • Add observability: Wire in LangSmith or a simple logging wrapper so you can see exactly what the model is calling and why.
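
As a starting point for the memory item, here is a minimal SQLite-backed sketch. ConversationStore, its table layout, and the default file name are all hypothetical. Content is stored as JSON because a message's content can be either a string or a list of content blocks. SDK response objects would need converting to plain dicts first; they appear to be Pydantic models, so model_dump() should work, but verify that against your SDK version.

```python
import json
import sqlite3

class ConversationStore:
    """Persist message lists per session in SQLite (hypothetical helper)."""

    def __init__(self, path: str = "agent_memory.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
        )

    def append(self, session_id: str, role: str, content) -> None:
        # content may be a string or a list of content-block dicts
        self.conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, json.dumps(content)),
        )
        self.conn.commit()

    def load(self, session_id: str) -> list[dict]:
        # Rows come back in insertion order, ready to pass as messages
        rows = self.conn.execute(
            "SELECT role, content FROM messages WHERE session_id = ? ORDER BY id",
            (session_id,),
        )
        return [{"role": role, "content": json.loads(content)} for role, content in rows]

store = ConversationStore(":memory:")  # in-memory database for the demo
store.append("session-1", "user", "What's the weather in London?")
store.append("session-1", "assistant", [{"type": "text", "text": "12°C, overcast."}])
print(store.load("session-1"))
```

Loading a full history on every run will eventually hit context and cost limits, so a real deployment would likely trim or summarize older turns.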

For your coding environment, Cursor is a strong AI-assisted IDE for this kind of development work. Its codebase-aware completions, paired with a long-context model, make wiring up agent loops significantly faster. We compared it against the alternatives in our best AI coding assistants roundup.

If you want to go further with automation pipelines that extend beyond agents, our breakdown of n8n vs Zapier vs Make vs Python is worth reading before you decide how to orchestrate production workflows around your agent.


How to Test Your Agent Before Shipping

Testing agents is different from testing regular software. The model’s behavior is non-deterministic, so unit tests on exact outputs are a trap. Instead:

  • Test the tool layer independently. Every tool function should have unit tests that verify correct output for known inputs. The model is not part of this test.
  • Log every API call. Store the full messages array for each agent run. When something goes wrong, you need to see exactly what the model saw.
  • Create fixed eval scenarios. Define 10-20 representative user goals and check whether the agent achieves them consistently across model versions. Track pass rates, not individual outputs.
  • Set a budget limit during testing. Use max_tokens and max_iterations conservatively while developing. It is easy to burn $20 of API credits debugging a loop that never terminates.

💡 Eval Over Unit Tests
For agentic AI, behavioral evaluations (did it achieve the goal?) matter more than output-level unit tests. Build a small eval harness early — it will save you hours of manual spot-checking as you iterate.
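
A harness for this can be very small. The sketch below runs each scenario several times and reports a pass rate per scenario. run_evals, the scenario dict shape, and the substring checks are assumptions for illustration. The stub agent stands in for run_agent so the example works offline.

```python
from typing import Callable

def run_evals(agent: Callable[[str], str], scenarios: list[dict], runs: int = 3) -> dict[str, float]:
    """Run each scenario several times and return its pass rate (0.0 to 1.0)."""
    rates = {}
    for scenario in scenarios:
        passes = 0
        for _ in range(runs):
            try:
                output = agent(scenario["goal"])
                passes += bool(scenario["check"](output))
            except Exception:
                pass  # a crash counts as a failed run
        rates[scenario["name"]] = passes / runs
    return rates

# Each scenario pairs a goal with a check on the final answer
scenarios = [
    {"name": "sf_weather", "goal": "What's the weather in San Francisco?",
     "check": lambda out: "16" in out},
    {"name": "simple_math", "goal": "What is 25 - 16?",
     "check": lambda out: "9" in out},
]

def stub_agent(goal: str) -> str:
    """Offline stand-in for run_agent: only knows about the weather."""
    return "It is 16°C and partly cloudy." if "weather" in goal else "I don't know."

print(run_evals(stub_agent, scenarios))
```

To evaluate the real agent, pass run_agent in place of stub_agent and keep runs low while developing, since every run costs API calls.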

Conclusion: Ship Something Small, Then Scale It

The biggest mistake developers make when they first build AI agents is over-engineering before they have validated anything. Start with the native Claude API loop. Add two or three real tools that solve a specific problem. Ship it. Then look at where it breaks and add complexity only where the data tells you to.

The agentic AI pattern is powerful precisely because it is composable. Each tool you add expands what the agent can do without touching the core loop. Master the loop first, and everything else follows.

Ready to go deeper? Check out our guide on building multi-agent systems with LangGraph for the next step: coordinating multiple Claude agents with shared memory and conditional routing.

Bottom Line

The Claude API's native tool_use is the cleanest primitive for building agentic AI in 2026: start there, keep the loop simple, and only add LangChain when your pipeline complexity earns it.

Disclosure: This article contains affiliate links. I earn a commission when you sign up for tools through my links, at no extra cost to you.