How to Build Your First AI Agent with Claude API: A Step-by-Step Guide

Most developers who “build AI agents” are really just calling a chat completion endpoint in a loop. That is not an agent. A real agentic AI system perceives its environment, decides which tools to use, acts on those decisions, and observes the results before deciding what to do next. The Claude API makes this loop surprisingly straightforward to implement. This guide walks you through building one from scratch, in Python, with working code you can run today.

What “Agentic AI” Actually Means (and Why It Matters)

Before writing a single line of code, it helps to be precise about what agentic behavior actually is. An agent is not a chatbot. The difference comes down to autonomy over tools.

A chatbot takes your message and returns a response. An agent takes a goal, reasons about which tools it needs to achieve that goal, calls those tools, observes the results, and decides whether to keep going or stop. The agent controls its own execution path. You hand it a task; it figures out the steps.

The Claude API implements this through a feature called tool use (also called function calling in other APIs). You define a set of tools in your API request. Claude decides whether it needs to call one, which one, and with what arguments. Your code executes the tool and sends the result back. Claude processes the result and either calls another tool or returns a final answer. That loop is the core of every production agentic AI system.

💡 Key Concept
An AI agent is defined by its control flow: perceive, reason, act, observe, repeat. The Claude API's tool_use content block is what makes this loop possible.
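To make the loop concrete, here is a sketch of what the conversation history looks like after one tool round trip, written as plain Python dictionaries. The tool_use and tool_result block shapes follow the Messages API. The id value, the weather payload, and the pending_tool_calls helper are made up for illustration.

```python
import json

# Illustrative transcript of one tool round trip. The id and the weather
# payload are invented placeholders, not real API output.
conversation = [
    # 1. The user states a goal.
    {"role": "user", "content": "What's the weather in Tokyo?"},
    # 2. Claude replies with a tool_use block instead of a final answer.
    {
        "role": "assistant",
        "content": [
            {
                "type": "tool_use",
                "id": "toolu_example_123",
                "name": "get_weather",
                "input": {"city": "Tokyo"},
            }
        ],
    },
    # 3. Your code runs the tool and returns a tool_result that echoes the id.
    {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": "toolu_example_123",
                "content": json.dumps({"temperature_celsius": 18}),
            }
        ],
    },
]


def pending_tool_calls(messages):
    """Return ids of tool_use blocks that have no matching tool_result yet."""
    requested, answered = set(), set()
    for message in messages:
        if isinstance(message["content"], str):
            continue  # plain-text messages carry no tool blocks
        for block in message["content"]:
            if block["type"] == "tool_use":
                requested.add(block["id"])
            elif block["type"] == "tool_result":
                answered.add(block["tool_use_id"])
    return requested - answered


# Every tool call in the transcript has been answered.
print(pending_tool_calls(conversation))
```

Every tool_use id must be answered by a tool_result with the same id before Claude can continue, which is why the loop in Step 3 collects results for every block before sending the next request.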

If you want to understand how Claude’s API compares to OpenAI’s for this kind of agentic work, the Claude API vs OpenAI API cost and performance breakdown covers the tradeoffs in detail, including context window limits that become critical when agents accumulate long conversation histories.

Step 1: Setting Up the Claude API

You will need an Anthropic account and an API key. Sign up at anthropic.com and grab your key from the console.

Install the SDK:

pip install anthropic

Then verify your setup with a basic call:

import anthropic

client = anthropic.Anthropic(api_key="your-api-key-here")

message = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude."}]
)

print(message.content[0].text)

If that prints a response, you are ready to build agents. Store your API key in an environment variable (ANTHROPIC_API_KEY) rather than hardcoding it. The SDK picks it up automatically.
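One way to fail early when the variable is missing is a small guard like the sketch below. The helper name require_api_key is hypothetical and not part of the SDK.

```python
import os


def require_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the key from the environment or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the agent."
        )
    return key
```

Call it once at startup. With the variable set, anthropic.Anthropic() can be constructed with no arguments.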

Step 2: Define Your Tools

Tools are the hands of your agent. Without them, Claude can only reason and respond with text. With them, it can search the web, run code, read files, call APIs, or do anything your Python code can do.

Here is a minimal example: an agent that can look up current weather data. We will define a fake weather tool for illustration, then swap in a real API later.

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a given city. Returns temperature in Celsius and conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The name of the city, e.g. 'London' or 'Tokyo'"
                }
            },
            "required": ["city"]
        }
    }
]

The input_schema follows JSON Schema. Claude uses this to understand exactly what arguments the tool accepts, so write clear descriptions. Vague descriptions produce hallucinated arguments.
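Even with good descriptions, it can help to check arguments before dispatching them. The sketch below is a deliberately small validator that covers only required keys and basic types, not full JSON Schema. The function name and error strings are assumptions for illustration; a dedicated library such as jsonschema would be the more complete option.

```python
# Map JSON Schema type names to Python types (a small subset only).
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}

WEATHER_SCHEMA = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}


def validate_tool_input(schema: dict, tool_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    problems = []
    properties = schema.get("properties", {})

    # Every required argument must be present.
    for name in schema.get("required", []):
        if name not in tool_input:
            problems.append(f"missing required argument: {name}")

    for name, value in tool_input.items():
        # Rejecting unknown keys is stricter than JSON Schema's default,
        # but it catches arguments the model invented.
        if name not in properties:
            problems.append(f"unexpected argument: {name}")
            continue
        declared = properties[name].get("type")
        expected = JSON_TYPES.get(declared)
        if expected is not None and not isinstance(value, expected):
            problems.append(f"argument {name} should be of type {declared}")

    return problems
```

If the list is non-empty, you can send the problems back to Claude as the tool result instead of calling the handler, which gives it a chance to correct the call.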

Step 3: Implement the Agent Loop

This is the core of your agentic AI system. The loop runs until Claude either returns a final text response or you hit a maximum iteration limit.

import anthropic
import json
import os

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def get_weather(city: str) -> dict:
    # Replace with a real weather API call in production
    return {"city": city, "temperature_celsius": 18, "conditions": "Partly cloudy"}

def run_agent(user_query: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": user_query}]
    
    for iteration in range(max_iterations):
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )
        
        # Check if Claude wants to use a tool
        if response.stop_reason == "tool_use":
            # Add Claude's response to the conversation
            messages.append({"role": "assistant", "content": response.content})
            
            # Process each tool call
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    tool_name = block.name
                    tool_input = block.input
                    
                    # Dispatch to the right function
                    if tool_name == "get_weather":
                        result = get_weather(**tool_input)
                    else:
                        result = {"error": f"Unknown tool: {tool_name}"}
                    
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result)
                    })
            
            # Send tool results back to Claude
            messages.append({"role": "user", "content": tool_results})
        
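        elif response.stop_reason == "end_turn":
            # Claude is done: join every text block into the final answer.
            # Returning here even when there is no text avoids re-sending
            # the same messages on the next iteration.
            return "".join(
                block.text for block in response.content if block.type == "text"
            )
        
        else:
            # Unexpected stop reason (for example "max_tokens")
            return f"Agent stopped unexpectedly: {response.stop_reason}"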
    
    return "Agent reached maximum iterations without completing the task."

# Run it
result = run_agent("What's the weather like in Tokyo right now?")
print(result)

This is a complete, working agent loop in about 60 lines. Claude decides when to call get_weather, calls it with the right arguments, receives the result, and uses that information to answer your question. You never explicitly told it to use the tool. It figured that out from your query and the tool descriptions.

⚠️ Always Set max_iterations
Without an iteration cap, a confused agent can loop indefinitely and rack up API costs. Ten iterations is generous for most tasks. Complex research agents might need 20-30, but start conservative.
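The iteration cap handles runaway loops, but a tool that raises an exception will still crash the loop in Step 3. A hedged sketch of one way to handle that: catch the failure and report it back to Claude in the tool_result block, using the API's is_error field, so the model can retry or explain the problem. The helper name execute_tool is hypothetical.

```python
import json


def execute_tool(tool_use_id: str, handler, tool_input: dict) -> dict:
    """Run one tool handler and always return a well-formed tool_result block."""
    try:
        result = handler(**tool_input)
    except Exception as exc:  # broad on purpose: any failure goes back to the model
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps(result),
    }
```

Inside the loop, tool_results.append(execute_tool(block.id, get_weather, block.input)) would replace the direct call and the manual dictionary.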

Step 4: Add More Tools and Real Capabilities

A single-tool agent is a proof of concept. Real agentic systems need multiple tools working together. Here is how to extend the setup with a web search tool and a code execution tool:

tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information on any topic. Use when you need recent data not in your training.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "run_python",
        "description": "Execute a Python code snippet and return the output. Use for calculations, data processing, or logic that requires computation.",
        "input_schema": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Valid Python code to execute"}
            },
            "required": ["code"]
        }
    },
    {
        "name": "get_weather",
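        "description": "Get the current weather for a given city. Returns temperature in Celsius and conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The name of the city, e.g. 'London' or 'Tokyo'"}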
            },
            "required": ["city"]
        }
    }
]

For run_python, use a sandboxed execution environment in production. Running arbitrary LLM-generated code in your main process is a security risk. Tools like Replit offer isolated execution environments that pair well with this pattern.
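As a starting point for local experiments only, here is a sketch of a handle_run_python handler that runs the snippet in a separate interpreter process with a time limit. This is an assumption about how you might wire it up, and it is not a sandbox: the child process can still read files and reach the network, so it does not replace the isolated environment recommended above.

```python
import subprocess
import sys


def handle_run_python(code: str, timeout_seconds: float = 5.0) -> dict:
    """Run a code snippet in a child interpreter and capture its output."""
    try:
        completed = subprocess.run(
            # -I is Python's isolated mode: it ignores PYTHON* environment
            # variables and the user site-packages directory.
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout.
        return {"error": f"Timed out after {timeout_seconds} seconds"}
    return {
        "stdout": completed.stdout,
        "stderr": completed.stderr,
        "exit_code": completed.returncode,
    }
```

Keeping the code out of your main process means a crash or an infinite loop in generated code cannot take the agent down with it.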

Your dispatch function scales cleanly with a dictionary:

TOOL_HANDLERS = {
    "web_search": handle_web_search,
    "run_python": handle_run_python,
    "get_weather": get_weather,
}

# In the loop:
if tool_name in TOOL_HANDLERS:
    result = TOOL_HANDLERS[tool_name](**tool_input)
else:
    result = {"error": f"Tool '{tool_name}' not registered"}

Step 5: Where LangChain Fits In

LangChain is a popular framework for building agentic AI applications, and it is worth understanding what it actually adds before reaching for it.

LangChain provides:

Memory abstractions: Automatically summarize or compress conversation history as it grows
Chain composition: Connect multiple LLM calls and tool invocations in a declared pipeline
Retrieval-augmented generation (RAG): Built-in connectors for vector databases like Pinecone and Chroma
Agent executors: Pre-built agent loops for common patterns (ReAct, plan-and-execute, etc.)

Here is the same weather agent in LangChain using the Anthropic integration:

# Written against the LangChain 0.2/0.3 agent APIs; later releases may move these imports
from langchain_anthropic import ChatAnthropic
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a given city."""
    return f"Weather in {city}: 18°C, partly cloudy"

llm = ChatAnthropic(model="claude-opus-4-5")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant with access to tools."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, [get_weather], prompt)
executor = AgentExecutor(agent=agent, tools=[get_weather], verbose=True)

result = executor.invoke({"input": "What is the weather in Tokyo?"})
print(result["output"])

The LangChain version is more declarative. The tradeoff is real: you gain abstraction and pre-built patterns, but you lose visibility into what is actually happening in the loop. When something breaks, LangChain’s abstraction layers make debugging harder.

LangChain Pros

Pre-built memory, RAG, and chain patterns save days of implementation work
Large ecosystem of integrations (vector DBs, data loaders, tools)
Active community and extensive documentation
Agent executors handle edge cases you might miss building from scratch

LangChain Cons

Abstractions obscure what API calls are actually being made
Debugging failures requires digging through multiple abstraction layers
Frequent breaking changes between versions
For simple agents, it adds significant complexity with little benefit

The practical rule: start without LangChain. If you find yourself rebuilding memory management or RAG from scratch for the third time, reach for it. For most first agents, the raw SDK loop is cleaner and easier to debug.

For a deeper look at RAG specifically, the RAG vs Fine-Tuning comparison breaks down exactly when retrieval-augmented generation is worth the added complexity versus just fine-tuning your model.

Step 6: Production Patterns You Actually Need

Getting an agent working in a notebook is one thing. Running it reliably in production is another. These patterns separate hobby projects from shipped products.

Structured system prompts: Tell Claude its role, its constraints, and how to behave when it is uncertain. Vague system prompts produce unpredictable behavior in edge cases.

system_prompt = """You are a research assistant with access to web search and Python execution tools.

Rules:
- Always verify claims with web search before stating them as facts
- When calculations are needed, use the run_python tool rather than doing mental math
- If a task requires more than 5 tool calls, stop and ask the user to clarify the goal
- Never execute code that modifies files or makes network requests unless explicitly asked"""

Conversation history management: Context windows are finite. For claude-opus-4-5, you have 200K tokens, which sounds enormous until you have an agent that has made 50 tool calls with large JSON results. Implement a sliding window or summarization strategy before you need it.
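A minimal sliding-window sketch, assuming the message layout from Step 3: it keeps the original user request plus the most recent messages. The one subtlety is that the kept tail must begin with an assistant message, otherwise a tool_result could be left without the tool_use that precedes it. The function name and default limit are illustrative.

```python
def trim_history(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep the first user message plus the most recent messages."""
    if max_messages < 2:
        raise ValueError("max_messages must be at least 2")
    if len(messages) <= max_messages:
        return list(messages)

    # Reserve one slot for the original request, the rest for the tail.
    start = len(messages) - (max_messages - 1)

    # Advance to the next assistant message so roles still alternate and
    # every kept tool_result follows its matching tool_use.
    while start < len(messages) and messages[start]["role"] != "assistant":
        start += 1

    return [messages[0]] + messages[start:]
```

Dropped messages are gone from Claude's view, so anything it learned from early tool results is lost. If those results matter later, summarizing them into a short note is the safer strategy.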

Logging every tool call: You need to know what your agent did and why. Log the tool name, inputs, outputs, and timestamp for every call. This is your debugging lifeline when an agent behaves unexpectedly in production.

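import logging

# Configure output once at startup; asctime adds the timestamp to every record
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s %(message)s")
logger = logging.getLogger("agent")

def logged_tool_call(tool_name: str, tool_input: dict, handler_fn) -> dict:
    """Run a tool handler and log its inputs and outputs."""
    logger.info("Tool call: %s | Input: %s", tool_name, tool_input)
    result = handler_fn(**tool_input)
    logger.info("Tool result: %s | Output: %s", tool_name, result)
    return result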

Retry with backoff: API calls fail. Network timeouts happen. Wrap your client.messages.create() call in exponential backoff for transient errors. The tenacity library handles this cleanly in Python.
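To show what the backoff does, here is a hand-rolled, dependency-free sketch rather than the tenacity version. The function name, defaults, and the exception types it retries are assumptions: with the Anthropic SDK you would pass its own exception classes, such as anthropic.APIConnectionError or anthropic.RateLimitError, as retry_on. The SDK client also has its own retry setting, so check that before layering more retries on top.

```python
import random
import time


def call_with_backoff(
    fn,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    retry_on: tuple = (ConnectionError, TimeoutError),
    sleep=time.sleep,
):
    """Call fn(), retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt; jitter spreads out simultaneous retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

In the agent loop, this would wrap the request as call_with_backoff(lambda: client.messages.create(...)). The sleep parameter is injectable so tests do not have to wait.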

For teams building agents that review and modify code, the multi-agent PR review guide shows how these patterns scale to real engineering workflows with multiple specialized agents working together.

Step 7: Testing Your Agent

Testing agents is harder than testing regular functions because the output is probabilistic. These three testing strategies cover the most important ground:

Deterministic tool testing: Test each tool function in isolation with fixed inputs and expected outputs. Your tools should be pure functions where possible. This is standard unit testing.
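For the stub tool from Step 3, such tests could look like the sketch below. The test names are illustrative, and the plain assert style is chosen so a runner such as pytest can collect them.

```python
def get_weather(city: str) -> dict:
    # Same stub as in Step 3, repeated so this example is self-contained.
    return {"city": city, "temperature_celsius": 18, "conditions": "Partly cloudy"}


def test_get_weather_returns_expected_shape():
    result = get_weather("Tokyo")
    assert result["city"] == "Tokyo"
    assert isinstance(result["temperature_celsius"], (int, float))
    assert result["conditions"]  # non-empty description


def test_get_weather_is_deterministic():
    # Same input, same output: the property that makes tools easy to test.
    assert get_weather("London") == get_weather("London")
```

Once the stub is replaced with a real API call, the same tests would need the network call mocked to stay deterministic.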

Golden-path integration tests: Run your full agent loop with a fixed prompt and assert that it calls the right tools in roughly the right order. Use a Haiku-tier model such as claude-3-5-haiku-latest for tests to keep costs low.
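"Roughly the right order" can be expressed as a subsequence check: the expected tools must appear in order, but extra calls in between are tolerated. The sketch assumes your test harness records the name of each tool the agent called into a list, for example from the logging wrapper in Step 6. The helper name is hypothetical.

```python
def called_in_order(expected: list[str], actual: list[str]) -> bool:
    """True if expected appears within actual as a subsequence (gaps allowed)."""
    remaining = iter(actual)
    # "name in remaining" consumes the iterator up to the first match,
    # so each expected tool must appear after the previous one.
    return all(name in remaining for name in expected)


# Hypothetical recorded run: the agent searched twice, then computed.
recorded = ["web_search", "web_search", "run_python"]
assert called_in_order(["web_search", "run_python"], recorded)
assert not called_in_order(["run_python", "web_search"], recorded)
```

An exact-match assertion would be brittle here, because the model may legitimately repeat a search or add a verification step between runs.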

Adversarial prompts: Feed your agent edge cases. What happens when it cannot find information? When a tool returns an error? When the user asks it to do something outside its defined scope? The answers reveal gaps in your system prompt and error handling.

For more advanced Claude Code workflows, 8 Advanced Claude Code Tips covers context management and cost control techniques that apply equally to agentic systems.

Choosing the Right Claude Model for Agents

Not every agent task requires Claude’s most powerful model. Choosing the right model dramatically affects your cost and latency.

Model	Best For	Context Window	Relative Cost
claude-opus-4-5	Complex multi-step reasoning, ambiguous tasks	200K tokens	Highest
claude-sonnet-4-5	Most production agents, good balance	200K tokens	Moderate
claude-3-5-haiku-latest	Simple tool dispatch, high-volume, testing	200K tokens	Lowest

For most agentic workloads, Sonnet is the right default. Use Opus when the task requires genuine reasoning under uncertainty. Use Haiku for classification, simple extraction, and routing steps within a larger pipeline.

💡 Cost Tip
Cache your system prompt using Anthropic's prompt caching feature. System prompts are sent with every API call in a long agent run. Caching them cuts input token costs by up to 90% on repeated calls.
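A minimal sketch of that tip: the Messages API accepts the system prompt as a list of text blocks, and a cache_control marker on a block asks the API to cache the prompt up to that point. The helper name is hypothetical. Caching only takes effect above a model-specific minimum prompt length, so a very short system prompt will not benefit.

```python
def cached_system_prompt(text: str) -> list[dict]:
    """Wrap a system prompt in a text block marked for prompt caching."""
    return [
        {
            "type": "text",
            "text": text,
            # Marks the end of the cacheable prefix.
            "cache_control": {"type": "ephemeral"},
        }
    ]
```

Pass the result as the system argument of client.messages.create() on every iteration of the loop. The prompt text must be identical between calls for the cache to be reused.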

What to Build Next

Once your first agent is working, the interesting patterns emerge from combining agents. A research agent that spawns specialized sub-agents for different domains. A coding agent that calls a testing agent to verify its output. An orchestrator that routes tasks to domain-specific workers.
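The orchestrator idea can be sketched without any API calls. Everything here is hypothetical scaffolding: in practice each worker would wrap its own agent loop with its own tools, and classify might be a cheap model call rather than a keyword check.

```python
from typing import Callable

# A worker takes a task description and returns a text result.
Worker = Callable[[str], str]


def make_orchestrator(
    workers: dict[str, Worker], classify: Callable[[str], str]
) -> Worker:
    """Build a function that routes each task to a domain-specific worker."""

    def orchestrate(task: str) -> str:
        domain = classify(task)
        worker = workers.get(domain)
        if worker is None:
            return f"No worker registered for domain '{domain}'"
        return worker(task)

    return orchestrate


# Stub workers and a keyword classifier, for illustration only.
workers = {
    "code": lambda task: f"[coding agent] {task}",
    "research": lambda task: f"[research agent] {task}",
}
classify = lambda task: "code" if "bug" in task.lower() else "research"

orchestrate = make_orchestrator(workers, classify)
print(orchestrate("Fix the bug in the parser"))
```

Because the orchestrator has the same signature as a worker, orchestrators can be nested, which is one way the sub-agent patterns above compose.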

The architecture principles stay the same: define clear tools, write precise system prompts, log everything, cap your iterations, and handle errors gracefully. The complexity is in the problem domain, not the agentic pattern itself.

Bottom Line

The Claude API's tool use feature gives you everything you need to build production-grade agentic AI in under 100 lines of Python. Start without LangChain, log every tool call, and add complexity only when the problem demands it.

Start Building

The best way to understand agentic AI is to run an agent. Copy the loop from Step 3, swap in a tool that does something you actually care about (a GitHub API call, a database query, a file reader), and watch Claude decide when and how to use it.

From there, the Claude API vs OpenAI API 2026 developer guide is a useful next read for understanding the full feature set available to you as your agents grow more complex.

Get started at anthropic.com. Your first agent is closer than you think.
