Most LLM applications start simple: one prompt, one response, ship it. Then requirements grow. The task needs to search the web, then read the results, then decide whether to search again, then synthesize everything. You add more logic. Then you need one agent to write a plan and another to execute it. Suddenly you are managing state, routing decisions, and failure modes across multiple LLM calls, and a simple chain is not the right abstraction anymore.

LangGraph is the right abstraction for this class of problem. It gives you an explicit graph structure for your agent workflows, persistent state management, and clean patterns for handling the complexity that emerges when multiple agents collaborate. This tutorial shows you how to actually build with it, not just understand it in theory.


What LangGraph Is

LangGraph is a library built on top of LangChain that lets you model agent workflows as directed graphs, including graphs with cycles where the workflow needs them. Each node in the graph is a function that reads from state and writes to state. Edges define how control flows between nodes, and conditional edges let you route based on the state at runtime.

The key insight is that stateful, multi-step agent workflows are naturally graphs. The “think, act, observe, repeat” loop of a ReAct agent is a cycle in a graph. The “planner delegates to specialist” pattern of a supervisor-worker system is a graph with routing edges. By making the graph structure explicit, LangGraph makes these workflows easier to debug, test, and modify.

What LangGraph is not: it is not magic. The LLMs still make the decisions. The graph gives you control over the flow and state, but if your prompts are bad or your model is making wrong decisions, the graph structure does not fix that.


When to Use LangGraph vs. Simple Chains

Not every LLM application needs LangGraph. Here is a practical decision framework.

Use a simple chain (or direct API calls) when:

  • The workflow is linear: input, one or two transformations, output
  • There is no branching or looping
  • State does not need to persist between steps
  • The task is stateless (classification, extraction, summarization)

Use LangGraph when:

  • The workflow has branching logic based on LLM decisions
  • You need loops (“keep searching until you find what you need”)
  • Multiple agents need to read and update shared state
  • You need human-in-the-loop checkpoints
  • You want to pause, resume, or replay a workflow
  • The workflow is long enough that you need explicit error recovery

A good heuristic: if you find yourself passing dictionaries between functions and adding if-else blocks for routing logic, you are building LangGraph manually. At that point, use LangGraph.
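To make the heuristic concrete, here is a minimal sketch of the hand-rolled version (function and field names are illustrative, and the search is simulated): dictionaries threaded through functions, an ad-hoc routing condition, and a manually tracked iteration cap. This is exactly the structure LangGraph makes explicit.

```python
# A hand-rolled "graph": dicts passed between functions plus if/else routing.
# If your code starts to look like this, you are rebuilding LangGraph by hand.

def generate(state: dict) -> dict:
    return {**state, "queries": [f"search: {state['question']}"]}

def search(state: dict) -> dict:
    return {
        **state,
        "results": [f"result for {q}" for q in state["queries"]],
        "iterations": state.get("iterations", 0) + 1,
    }

def run(question: str) -> dict:
    state = {"question": question}
    state = generate(state)
    state = search(state)
    # Ad-hoc routing: loop until "enough" results or an iteration cap.
    # In LangGraph this while-loop becomes a conditional edge.
    while len(state["results"]) < 2 and state["iterations"] < 3:
        state = generate(state)
        state = search(state)
    return state
```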


Setting Up

pip install langgraph langchain-anthropic langchain-community

You will also need an API key for whichever LLM you are using. LangGraph is model-agnostic, but Claude is an excellent choice for agentic workflows.

import os
os.environ["ANTHROPIC_API_KEY"] = "your-key-here"

Building a Simple Research Agent

Let us build a research agent that takes a question, searches the web, reads relevant results, and produces a synthesized answer. This is a realistic use case that demonstrates LangGraph’s core concepts.

Define the State

State is the shared data structure that flows through your graph. Every node reads from it and can write to it.

from typing import TypedDict, List

class ResearchState(TypedDict):
    question: str
    search_queries: List[str]
    search_results: List[dict]
    synthesis: str
    iteration_count: int
    should_search_more: bool

TypedDict gives your state a checked schema. LangGraph also ships an add_messages reducer (in langgraph.graph.message) that accumulates chat messages automatically when you annotate a messages field with it; this research state does not need one, but most conversational graphs do.
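If your workflow does carry a conversation, the add_messages reducer is declared like this (a minimal sketch; ChatState is an illustrative name):

```python
from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages

class ChatState(TypedDict):
    # add_messages appends new messages to the existing list instead of
    # overwriting the field on each node update
    messages: Annotated[list, add_messages]
```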

Define the Nodes

Each node is a Python function that takes state and returns a partial state update.

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatAnthropic(model="claude-3-7-sonnet-20250219")

def generate_search_queries(state: ResearchState) -> dict:
    """Generate search queries based on the question."""
    prompt = f"""Generate 2-3 specific search queries to answer this question:

Question: {state['question']}

Return only the search queries, one per line."""

    response = llm.invoke([HumanMessage(content=prompt)])
    queries = [q.strip() for q in response.content.split('\n') if q.strip()]

    return {
        "search_queries": queries,
        "iteration_count": state.get("iteration_count", 0) + 1
    }


def execute_searches(state: ResearchState) -> dict:
    """Execute the search queries and collect results."""
    # In a real implementation, use a search tool like Tavily or SerpAPI
    # Here we simulate with placeholder results
    results = []
    for query in state["search_queries"]:
        # search_tool.invoke({"query": query}) in production
        results.append({
            "query": query,
            "content": f"[Search results for: {query}]",
            "source": "https://example.com"
        })

    return {"search_results": results}


def evaluate_results(state: ResearchState) -> dict:
    """Decide whether we have enough information to synthesize."""
    results_text = "\n".join([r["content"] for r in state["search_results"]])

    prompt = f"""Given this question and search results, do we have enough information for a comprehensive answer?

Question: {state['question']}
Search Results:
{results_text}

Answer with only YES or NO."""

    response = llm.invoke([HumanMessage(content=prompt)])
    # startswith is sturdier than substring matching: an answer like
    # "NO, unless..." should not accidentally register as yes
    have_enough = response.content.strip().upper().startswith("YES")

    return {"should_search_more": not have_enough}


def synthesize_answer(state: ResearchState) -> dict:
    """Synthesize a final answer from all search results."""
    results_text = "\n\n".join([
        f"Source: {r['source']}\n{r['content']}"
        for r in state["search_results"]
    ])

    prompt = f"""Based on the following research, provide a comprehensive answer to the question.

Question: {state['question']}

Research:
{results_text}

Provide a clear, well-organized answer with citations where relevant."""

    response = llm.invoke([HumanMessage(content=prompt)])
    return {"synthesis": response.content}

Build the Graph

from langgraph.graph import StateGraph, END

def route_after_evaluation(state: ResearchState) -> str:
    """Conditional edge: route to more searching or to synthesis."""
    if state["should_search_more"] and state["iteration_count"] < 3:
        return "search_more"
    return "synthesize"


# Build the graph
workflow = StateGraph(ResearchState)

# Add nodes
workflow.add_node("generate_queries", generate_search_queries)
workflow.add_node("execute_searches", execute_searches)
workflow.add_node("evaluate_results", evaluate_results)
workflow.add_node("synthesize", synthesize_answer)

# Add edges
workflow.set_entry_point("generate_queries")
workflow.add_edge("generate_queries", "execute_searches")
workflow.add_edge("execute_searches", "evaluate_results")

# Conditional routing after evaluation
workflow.add_conditional_edges(
    "evaluate_results",
    route_after_evaluation,
    {
        "search_more": "generate_queries",  # loop back
        "synthesize": "synthesize"
    }
)

workflow.add_edge("synthesize", END)

# Compile
app = workflow.compile()

Run It

initial_state = {
    "question": "What are the key differences between LangGraph and AutoGen for building multi-agent systems?",
    "search_queries": [],
    "search_results": [],
    "synthesis": "",
    "iteration_count": 0,
    "should_search_more": False
}

result = app.invoke(initial_state)
print(result["synthesis"])
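To watch the loop route between searching and synthesis, you can stream the run instead of invoking it; stream() yields each node's state update as it completes (a sketch using the app compiled above):

```python
for update in app.stream(initial_state):
    # Each yielded item maps the node that just ran to the fields it updated
    for node_name, state_update in update.items():
        print(f"--- {node_name} updated: {sorted(state_update)} ---")
```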

Managing State Properly

State management is where most LangGraph applications go wrong. Common mistakes:

Mutating state in place. Nodes should return new dictionaries representing state updates, not mutate the incoming state object. LangGraph merges your returned dict into the existing state.

Missing fields in TypedDict. If your TypedDict requires a field but you do not initialize it, you get confusing errors. Always initialize all fields in your starting state, even if with empty values.

State becoming too large. If you accumulate large amounts of data (like search results across many iterations), state can bloat. Build cleanup steps that summarize or prune state as the workflow progresses.

Shared state in parallel branches. If your graph has parallel branches that both write to the same state field, you need to define reducers for those fields. LangGraph supports this through annotated types.
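For example, annotating a field with a reducer (here the standard library's operator.add, which concatenates lists) tells LangGraph how to merge concurrent writes instead of raising a conflict. ParallelState is an illustrative name:

```python
import operator
from typing import Annotated, List, TypedDict

class ParallelState(TypedDict):
    question: str
    # Two parallel branches can each return a search_results update;
    # operator.add concatenates the lists rather than overwriting.
    search_results: Annotated[List[dict], operator.add]

# The reducer itself is just list concatenation:
merged = operator.add([{"q": "a"}], [{"q": "b"}])
```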


Handling Failures

Failures in multi-agent systems are inevitable. LLMs time out. API calls fail. The model returns unexpected output that breaks your parsing logic. Build for failure from the start.

Retry Logic at the Node Level

Wrap LLM calls in retry logic using tenacity or a simple loop:

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
def call_llm_with_retry(messages):
    return llm.invoke(messages)

Graceful Degradation

Add an error field to your state and handle it explicitly:

class ResearchState(TypedDict):
    # ... existing fields ...
    error: str | None

def safe_synthesize(state: ResearchState) -> dict:
    try:
        # Reuse the synthesis node defined earlier; any failure inside it
        # (timeout, API error, parsing) is captured in the error field
        update = synthesize_answer(state)
        return {**update, "error": None}
    except Exception as e:
        return {"synthesis": "Unable to complete research due to an error.", "error": str(e)}

Iteration Limits

Always add iteration limits to loops. The iteration_count < 3 check in the routing function above is not optional. Without it, a misbehaving routing function can cause infinite loops and rack up significant API costs.


Checkpointing and Human-in-the-Loop

One of LangGraph’s most valuable features is checkpointing: the ability to persist graph state to a database and resume from any point.

from langgraph.checkpoint.sqlite import SqliteSaver

# Persist state to SQLite
checkpointer = SqliteSaver.from_conn_string("research_agent.db")
app = workflow.compile(checkpointer=checkpointer)

# Run with a thread ID for resumability
config = {"configurable": {"thread_id": "research-session-001"}}
result = app.invoke(initial_state, config=config)

# Later, resume from the same thread
result = app.invoke(None, config=config)  # None resumes from checkpoint

Human-in-the-loop is a natural extension: add an interrupt_before or interrupt_after to pause the graph at a specific node and wait for human input before continuing. This is essential for workflows where automated actions have real-world consequences.
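A sketch of that pattern, reusing the checkpointer and graph built above (the node name "synthesize" comes from the earlier build step):

```python
# Pause before the synthesize node: the run stops there, its state is
# checkpointed, and a human can inspect or edit it before resuming.
app = workflow.compile(
    checkpointer=checkpointer,
    interrupt_before=["synthesize"],
)

config = {"configurable": {"thread_id": "research-session-001"}}
app.invoke(initial_state, config=config)  # runs until the interrupt, then stops

# ... human reviews the checkpointed state, optionally edits it
#     via app.update_state(config, {...}) ...

app.invoke(None, config=config)  # resume from the checkpoint, past the interrupt
```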


Supervisor-Worker Pattern

For more complex multi-agent systems, the supervisor-worker pattern is useful. A supervisor agent sees the full task and delegates subtasks to specialized worker agents.

def supervisor_node(state: SupervisorState) -> dict:
    """Decide which worker to call next."""
    # The supervisor decides routing based on the current state
    workers = ["researcher", "writer", "fact_checker"]

    prompt = f"""You are coordinating a team of workers.
Task: {state['task']}
Completed steps: {state['completed_steps']}

Which worker should act next? Choose from: {workers}
Or say DONE if the task is complete."""

    response = llm.invoke([HumanMessage(content=prompt)])
    next_worker = parse_worker_choice(response.content)

    return {"next_worker": next_worker}

The supervisor node’s output drives conditional routing to the appropriate worker node. Each worker reports back its results, updating shared state, and the supervisor decides what to do next.
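The wiring for that routing might look like this: a sketch that assumes the supervisor and worker nodes are already added to a StateGraph, and that the supervisor sets next_worker to "done" when the task is complete.

```python
from langgraph.graph import END

def route_to_worker(state: SupervisorState) -> str:
    """Conditional edge: hand control to the chosen worker, or finish."""
    return state["next_worker"]

workflow.add_conditional_edges(
    "supervisor",
    route_to_worker,
    {
        "researcher": "researcher",
        "writer": "writer",
        "fact_checker": "fact_checker",
        "done": END,
    },
)

# Every worker reports back to the supervisor, which decides the next step
for worker in ["researcher", "writer", "fact_checker"]:
    workflow.add_edge(worker, "supervisor")
```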


What to Monitor in Production

When you deploy a LangGraph application:

  • Track iteration counts per workflow run. Sustained high iteration counts suggest routing logic failures or tasks that are poorly defined.
  • Log every state transition with timestamps. When something goes wrong, you need to reconstruct exactly what happened.
  • Monitor LLM call latency and cost per workflow run. Multi-agent systems can be expensive, and unexpected cost spikes usually indicate routing loops.
  • Add timeout limits at the graph level. A workflow that runs for more than 5 minutes is probably stuck.
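One graph-level guard can be set per run via recursion_limit, which caps the number of node executions before LangGraph raises (a sketch using the app compiled earlier):

```python
from langgraph.errors import GraphRecursionError

try:
    # Cap the run at 25 node executions; a stuck routing loop then
    # raises instead of silently accumulating API cost.
    result = app.invoke(initial_state, config={"recursion_limit": 25})
except GraphRecursionError:
    result = {"synthesis": "Workflow exceeded its step budget; check routing logic."}
```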

LangGraph Studio (the visual debugger) is genuinely useful for developing and debugging graphs. It shows you the graph structure visually and lets you step through executions. Use it during development.


Conclusion

LangGraph provides the scaffolding that makes complex agent workflows manageable. The graph abstraction is the right mental model, the state management is explicit rather than hidden, and the checkpointing support makes production deployment practical.

Start with a small, well-defined workflow. Get that working and debugged before adding complexity. Add nodes and edges incrementally, testing at each step. The hardest part of multi-agent systems is not the LangGraph code: it is defining what each agent should do and how they should interact. Get the agent design right first, then implement it in LangGraph.

The full code from this tutorial is available as a starting point. Modify the execute_searches function to use a real search API (Tavily is a good choice), refine the prompts for your specific use case, and you have a working research agent.
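As a sketch, swapping the simulated search for Tavily might look like this. It assumes a TAVILY_API_KEY environment variable is set; the url and content keys are what the LangChain community tool returns for each hit.

```python
from langchain_community.tools.tavily_search import TavilySearchResults

search_tool = TavilySearchResults(max_results=3)

def execute_searches(state: ResearchState) -> dict:
    """Run each query against Tavily and normalize the hits."""
    results = []
    for query in state["search_queries"]:
        for hit in search_tool.invoke({"query": query}):
            results.append({
                "query": query,
                "content": hit["content"],
                "source": hit["url"],
            })
    return {"search_results": results}
```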