Introduction
AI agents represent the next evolution of LLM applications. Rather than simply generating text in response to a prompt, an agent can reason, plan, and take actions by invoking external tools — search the web, query a database, call an API, or execute code — all in an autonomous loop until a task is completed.
LangChain has become the de facto framework for building these agents in Python. With the release of LangChain v0.2+ and the maturation of its tool-calling abstractions, building production-grade AI agents is more accessible than ever. This guide walks you through the core patterns, real code examples, and deployment considerations for building agents with LangChain in 2025–2026.
Whether you are creating a customer support bot that can look up orders, a research assistant that can browse and summarize documents, or an autonomous coding agent, the patterns in this guide will serve as your foundation.
What Is an AI Agent?
An AI agent is an LLM-powered system that follows a reasoning-action loop. Unlike a simple chain (prompt → LLM → output), an agent:
- Observes the current context (user query, conversation history, previous tool results)
- Reasons about what to do next
- Acts by selecting and calling a tool with specific inputs
- Observes the tool result and decides whether to act again or return a final answer
This loop — often called the ReAct pattern (Reason + Act) — is the core of most agent architectures. The LLM serves as the reasoning engine, while tools provide the capabilities to interact with the outside world.
An agent is not just an LLM. It is an LLM + tools + a loop. The quality of your agent depends on all three: the model’s reasoning ability, the design of your tools, and the orchestration logic that ties them together.
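Stripped of any framework, the loop itself is only a handful of lines. The sketch below is illustrative only: the `fake_llm` stub and the plain-dict tool registry stand in for a real model and for LangChain's abstractions.

```python
# Schematic ReAct loop with a stubbed "LLM" -- illustrative only, not LangChain code.
def lookup_weather(city: str) -> str:
    """A toy tool."""
    return f"Sunny in {city}"

TOOLS = {"lookup_weather": lookup_weather}

def fake_llm(messages: list) -> dict:
    """Stand-in for the model: request a tool once, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "lookup_weather", "args": {"city": "Paris"}}
    return {"answer": f"The forecast: {messages[-1]['content']}"}

def run_agent(user_input: str, max_iterations: int = 5) -> str:
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_iterations):          # the loop
        decision = fake_llm(messages)        # reason
        if "answer" in decision:             # done -> return final answer
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])  # act
        messages.append({"role": "tool", "content": result})  # observe
    return "Stopped: iteration limit reached."

print(run_agent("What's the weather in Paris?"))
```

A real agent delegates both decisions (which tool, when to stop) to the model; everything else in the loop is plumbing.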
LangChain Agent Architecture Overview
LangChain provides several approaches to building agents. The ecosystem has evolved significantly, so understanding which API to use is critical:
| Approach | Status | Best For | Key Feature |
|---|---|---|---|
| AgentExecutor | Legacy (still works) | Simple, single-agent use cases | Quick setup, minimal boilerplate |
| create_react_agent | Current (LangChain) | Standard ReAct agents | Works with tool-calling LLMs |
| create_tool_calling_agent | Current (LangChain) | Models with native tool calling | Uses model’s built-in function calling |
| LangGraph | Recommended for production | Complex, stateful, multi-agent systems | Full control over agent loop, persistence, human-in-the-loop |
For new projects, LangChain Inc. recommends LangGraph for production agents: it gives you explicit control over the agent loop and supports streaming, persistence, and multi-agent patterns. However, understanding the LangChain-level abstractions (create_react_agent, create_tool_calling_agent) is still essential, as they form the building blocks used inside LangGraph nodes.
Setting Up Your Environment
Before building agents, install the required packages:
```bash
# Install core packages
pip install langchain langchain-openai langchain-community langgraph

# For additional tool integrations
pip install langchain-experimental duckduckgo-search wikipedia
```
Set your API keys as environment variables:
```python
import os

os.environ["OPENAI_API_KEY"] = "sk-..."

# Or use dotenv
from dotenv import load_dotenv
load_dotenv()
```
Building Your First Agent with Tool Calling
Modern LLMs like GPT-4, Claude, and Gemini support native tool calling (also called function calling). Instead of relying on text-based prompts to coerce the LLM into outputting tool invocations in a specific format, the model has been fine-tuned to output structured tool calls natively. LangChain’s create_tool_calling_agent leverages this capability.
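To make that concrete, here is roughly the shape of a parsed tool call as LangChain surfaces it on an AI message's `tool_calls` list. The specific values below are invented for illustration.

```python
# Approximate shape of one entry in AIMessage.tool_calls -- the field
# values here are invented for illustration.
tool_call = {
    "name": "get_order_status",          # which tool the model chose
    "args": {"order_id": "ORD-12345"},   # structured arguments, not free text
    "id": "call_abc123",                 # provider-assigned call id
}

# Because args is already a dict, no fragile text parsing is needed:
print(tool_call["args"]["order_id"])
```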
Step 1: Define Your Tools
Tools are the capabilities your agent can use. LangChain provides the @tool decorator to turn any Python function into a tool:
```python
from langchain_core.tools import tool
from typing import Optional


@tool
def search_knowledge_base(query: str) -> str:
    """Search the internal knowledge base for information about products and policies.

    Args:
        query: The search query to look up in the knowledge base.
    """
    knowledge = {
        "return policy": "Items can be returned within 30 days with receipt.",
        "shipping": "Free shipping on orders over $50. Standard delivery 3-5 days.",
        "warranty": "All electronics come with a 1-year manufacturer warranty.",
    }
    for key, value in knowledge.items():
        if key in query.lower():
            return value
    return "No relevant information found for that query."


@tool
def get_order_status(order_id: str) -> str:
    """Look up the current status of a customer order.

    Args:
        order_id: The unique order identifier (e.g., ORD-12345).
    """
    orders = {
        "ORD-12345": "Shipped - Expected delivery March 12, 2026",
        "ORD-67890": "Processing - Will ship within 24 hours",
    }
    return orders.get(order_id, f"Order {order_id} not found.")


@tool
def calculate_discount(price: float, discount_percent: float) -> str:
    """Calculate the discounted price for a product.

    Args:
        price: The original price of the product.
        discount_percent: The discount percentage to apply (e.g., 20 for 20%).
    """
    discounted = price * (1 - discount_percent / 100)
    return f"Original: ${price:.2f} | Discount: {discount_percent}% | Final: ${discounted:.2f}"


tools = [search_knowledge_base, get_order_status, calculate_discount]
```
- Docstrings matter enormously. The LLM reads the docstring to decide when and how to call the tool. Be descriptive and include argument explanations.
- Return strings. Tools should return human-readable strings that the LLM can interpret and relay to the user.
- Handle errors gracefully. Return informative error messages rather than raising exceptions, so the agent can recover.
- Keep tools focused. One tool = one capability. Prefer many small tools over one giant multi-purpose tool.
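As an example of the error-handling guideline, here is a tool body that reports failures as messages the model can act on. It is shown as a plain function for brevity; wrapping it with `@tool` as above would register it with the agent, and the inline dict stands in for a real database call.

```python
def get_order_status_safe(order_id: str) -> str:
    """Look up an order, returning readable errors instead of raising."""
    orders = {"ORD-12345": "Shipped"}  # stand-in for a real database call
    if not order_id.startswith("ORD-"):
        # Tell the model *how* to fix its input so it can retry sensibly.
        return "Invalid order id: expected a format like ORD-12345."
    try:
        return orders[order_id]
    except KeyError:
        return f"Order {order_id} not found. Ask the customer to double-check the id."

print(get_order_status_safe("12345"))      # malformed input -> guidance, not a crash
print(get_order_status_safe("ORD-99999"))  # missing order -> recoverable message
```

Because both failure modes come back as plain strings, the agent can apologize, re-ask the user, or retry with corrected input instead of crashing the whole loop.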
Step 2: Create the Agent
Now wire the tools to an LLM using create_tool_calling_agent and AgentExecutor:
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.agents import create_tool_calling_agent, AgentExecutor

# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Define the prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful customer support agent for an e-commerce store. "
               "Use the available tools to look up information and help customers. "
               "Always be polite and concise."),
    MessagesPlaceholder(variable_name="chat_history", optional=True),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

# Create the agent
agent = create_tool_calling_agent(llm, tools, prompt)

# Wrap in AgentExecutor to run the loop
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=10,
    handle_parsing_errors=True,
)

# Run the agent
response = agent_executor.invoke({
    "input": "What is the status of order ORD-12345 and what is your return policy?"
})
print(response["output"])
```
When you run this, the agent will reason that it needs to make two tool calls — one to get_order_status and one to search_knowledge_base — then synthesize both results into a coherent response for the user.
Adding Conversation Memory
Agents become much more useful when they can remember previous messages in a conversation. LangChain provides several memory patterns:
Simple Message History
```python
from langchain_core.messages import HumanMessage, AIMessage

chat_history = []

def chat(user_input: str) -> str:
    response = agent_executor.invoke({
        "input": user_input,
        "chat_history": chat_history,
    })
    chat_history.append(HumanMessage(content=user_input))
    chat_history.append(AIMessage(content=response["output"]))
    return response["output"]

print(chat("What is order ORD-12345's status?"))
print(chat("And when will it arrive?"))
```
Persistent Memory with RunnableWithMessageHistory
For production applications, you want memory that persists across sessions:
```python
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

session_store = {}

def get_session_history(session_id: str):
    if session_id not in session_store:
        session_store[session_id] = ChatMessageHistory()
    return session_store[session_id]

agent_with_history = RunnableWithMessageHistory(
    agent_executor,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

response = agent_with_history.invoke(
    {"input": "Check order ORD-67890"},
    config={"configurable": {"session_id": "user-abc-123"}},
)
```
Advanced Tool Patterns
Structured Tool Input with Pydantic
For tools that require complex inputs, use Pydantic models for validation:
```python
from typing import Optional

from langchain_core.tools import StructuredTool
from pydantic import BaseModel, Field

class SearchInput(BaseModel):
    query: str = Field(description="The search query")
    max_results: int = Field(default=5, description="Maximum number of results")
    category: Optional[str] = Field(default=None, description="Filter by category")

def search_products(query: str, max_results: int = 5, category: Optional[str] = None) -> str:
    """Search the product catalog."""
    return f"Found {max_results} results for '{query}' in {category or 'all categories'}"

search_tool = StructuredTool.from_function(
    func=search_products,
    name="search_products",
    description="Search the product catalog with optional filters",
    args_schema=SearchInput,
)
```
Retrieval Tools (RAG Agent)
One of the most powerful agent patterns combines tool calling with Retrieval-Augmented Generation (RAG). The agent decides when to retrieve information rather than always retrieving:
```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.tools.retriever import create_retriever_tool

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(
    ["LangChain is a framework for LLM apps...",
     "Agents use tools to interact with the world...",
     "RAG combines retrieval with generation..."],
    embeddings,
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

retriever_tool = create_retriever_tool(
    retriever,
    name="search_documentation",
    description="Search the technical documentation. Use this when the user asks about "
                "technical concepts, APIs, or implementation details.",
)

tools = [retriever_tool, get_order_status, calculate_discount]
```
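The decision-making aspect is worth emphasizing: retrieval becomes just another tool the model may or may not invoke. A toy keyword retriever (pure Python, standing in for the FAISS retriever above) shows the tool's contract:

```python
# Toy retriever standing in for the vector store above: scores documents
# by keyword overlap instead of embeddings. Illustrative only.
DOCS = [
    "LangChain is a framework for LLM apps",
    "Agents use tools to interact with the world",
    "RAG combines retrieval with generation",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q = set(query.lower().split())
    scored = sorted(DOCS, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

# An agentic RAG system only calls this for questions that need documents;
# small talk is answered directly, skipping the retrieval round-trip.
print(retrieve("how do agents use tools"))
```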
Building Production Agents with LangGraph
While AgentExecutor works for prototyping, LangGraph is the recommended approach for production agents. It gives you explicit control over the execution loop and supports streaming, persistence, and human-in-the-loop patterns.
The Prebuilt ReAct Agent
```python
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)

agent = create_react_agent(
    model=llm,
    tools=tools,
    prompt="You are a helpful customer support agent. Be concise and friendly.",
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "What is my order status for ORD-12345?"}]
})

for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "Check order ORD-67890"}]},
    stream_mode="values",
):
    if chunk["messages"]:
        chunk["messages"][-1].pretty_print()
```
Custom Agent Loop with LangGraph
For full control, build the agent loop manually as a graph:
```python
from operator import add
from typing import TypedDict, Annotated

from langchain_core.messages import BaseMessage
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode
from langgraph.checkpoint.memory import MemorySaver

class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add]

def call_model(state: AgentState):
    messages = state["messages"]
    response = llm.bind_tools(tools).invoke(messages)
    return {"messages": [response]}

def should_continue(state: AgentState):
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tools"
    return END

workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.add_node("tools", ToolNode(tools))
workflow.add_edge(START, "agent")
workflow.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
workflow.add_edge("tools", "agent")

# Compile with a checkpointer so conversation state persists per thread
memory = MemorySaver()
agent = workflow.compile(checkpointer=memory)

config = {"configurable": {"thread_id": "session-001"}}
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Hello! Check order ORD-12345"}]},
    config=config,
)
```
Building the graph yourself unlocks LangGraph's production features:
- Streaming: Stream tokens, tool calls, and intermediate steps in real time
- Persistence: Checkpoint state to PostgreSQL, SQLite, or Redis so conversations survive restarts
- Human-in-the-loop: Pause execution before sensitive tool calls and wait for human approval
- Debugging: Full visibility into each node execution via LangSmith integration
- Subgraphs: Compose complex agents from smaller, testable sub-agents
Multi-Agent Architectures
As agent complexity grows, a single monolithic agent becomes hard to manage. Multi-agent architectures split responsibilities across specialized agents. LangGraph supports two primary patterns:
Supervisor Pattern
A supervisor agent orchestrates multiple worker agents, deciding which one to delegate to:
```python
from langgraph.prebuilt import create_react_agent

# Worker agents with focused toolsets. The tools named here
# (search_documentation, web_search, write_document, format_text)
# are placeholders for your own implementations.
research_agent = create_react_agent(
    model=llm,
    tools=[search_documentation, web_search],
    prompt="You are a research specialist. Find accurate information.",
)

writer_agent = create_react_agent(
    model=llm,
    tools=[write_document, format_text],
    prompt="You are a technical writer. Create clear, well-structured content.",
)
```
Handoff Pattern
In the handoff pattern, agents transfer control directly to each other for linear pipelines:
```python
workflow = StateGraph(AgentState)
workflow.add_node("researcher", research_agent)
workflow.add_node("writer", writer_agent)
workflow.add_edge(START, "researcher")
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", END)
```
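Conceptually, the handoff pipeline is sequential state transformation: each node reads the shared state and writes its contribution. A framework-free sketch, with plain functions standing in for the LangGraph nodes above:

```python
# Schematic of the researcher -> writer handoff: each stage is a function
# over a shared state dict, mirroring how LangGraph nodes update state.
def researcher(state: dict) -> dict:
    state["notes"] = f"Findings about: {state['topic']}"
    return state

def writer(state: dict) -> dict:
    state["draft"] = f"Article based on '{state['notes']}'"
    return state

pipeline = [researcher, writer]   # START -> researcher -> writer -> END
state = {"topic": "LangChain agents"}
for stage in pipeline:
    state = stage(state)

print(state["draft"])
```

What LangGraph adds on top of this skeleton is checkpointing, streaming, and the ability to branch conditionally rather than always running every stage.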
Production Deployment Patterns
Error Handling and Fallbacks
```python
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=15,           # hard cap on reasoning/tool-call cycles
    max_execution_time=60,       # wall-clock budget in seconds
    handle_parsing_errors=True,  # feed malformed LLM output back as an error message
    early_stopping_method="generate",
)
```
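The settings above bound a single run, but production systems also need a fallback when the primary model is down or rate-limited. LangChain runnables support this via `with_fallbacks()`; the underlying idea, sketched here with stand-in callables rather than real model clients:

```python
# Fallback sketch: try a primary model, fall back to a secondary one on
# failure. Both "models" are stand-in callables so the logic is visible.
def primary_model(prompt: str) -> str:
    raise TimeoutError("provider unavailable")  # simulate an outage

def backup_model(prompt: str) -> str:
    return f"(backup) answer to: {prompt}"

def invoke_with_fallbacks(prompt: str, models: list) -> str:
    last_error = None
    for model in models:
        try:
            return model(prompt)
        except Exception as exc:  # in practice, catch provider-specific errors
            last_error = exc
    raise RuntimeError("All models failed") from last_error

print(invoke_with_fallbacks("Check order ORD-12345", [primary_model, backup_model]))
```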
Serving Agents as APIs
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AgentRequest(BaseModel):
    message: str
    session_id: str

@app.post("/chat")
async def chat(request: AgentRequest):
    # Reuse the session id as the LangGraph thread id so each user
    # gets an isolated, persistent conversation.
    config = {"configurable": {"thread_id": request.session_id}}
    result = agent.invoke(
        {"messages": [{"role": "user", "content": request.message}]},
        config=config,
    )
    return {"response": result["messages"][-1].content}
```
Cost Control
- Use smaller models for simple routing — GPT-4o-mini or Claude Haiku for tool selection, GPT-4o or Claude Sonnet for complex reasoning
- Cache tool results — If the same tool call with the same inputs is made frequently, cache the result
- Set strict iteration limits — Prevent runaway agent loops with `max_iterations` and `max_execution_time`
- Monitor token usage — Use LangSmith to track token consumption per conversation and set alerts
- Limit chat history length — Use a sliding window or summarization to keep the message history from growing unbounded
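The caching advice is cheap to implement when tool inputs are hashable, for example with the standard library's `functools.lru_cache` (the expensive lookup below is simulated):

```python
from functools import lru_cache

CALL_COUNT = {"n": 0}

@lru_cache(maxsize=256)
def get_shipping_policy(region: str) -> str:
    """Simulated expensive lookup; repeated calls are served from cache."""
    CALL_COUNT["n"] += 1  # counts real (uncached) executions
    return f"Free shipping over $50 in {region}"

get_shipping_policy("US")
get_shipping_policy("US")   # cache hit -- no second execution
get_shipping_policy("EU")

print(CALL_COUNT["n"])      # 2: only two distinct inputs were computed
```

For tools whose results go stale (order status, inventory), swap the unbounded cache for one with a short TTL so the agent never relays outdated data.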
Production Checklist
- Enable LangSmith tracing for all environments
- Set `max_iterations` and `max_execution_time` on every agent
- Implement graceful error handling in all tools
- Use persistent checkpointers (PostgreSQL, Redis) instead of in-memory
- Add rate limiting and authentication to your API endpoints
- Test agent behavior with evaluation datasets, not just manual testing
- Implement guardrails to prevent prompt injection and harmful tool use
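As a minimal illustration of the last item, a tool allow-list can gate sensitive actions behind explicit approval before anything executes. The tool names and policy below are invented for the example:

```python
# Minimal guardrail sketch: sensitive tools require human approval before
# the agent may execute them. Tool names and policy are illustrative.
SENSITIVE_TOOLS = {"issue_refund", "delete_account"}

def execute_tool(name: str, args: dict, tools: dict, approved: bool = False) -> str:
    if name not in tools:
        return f"Unknown tool: {name}"  # never execute arbitrary names
    if name in SENSITIVE_TOOLS and not approved:
        return f"Blocked: '{name}' requires human approval."
    return tools[name](**args)

tools = {
    "get_order_status": lambda order_id: f"{order_id}: shipped",
    "issue_refund": lambda order_id: f"Refunded {order_id}",
}

print(execute_tool("get_order_status", {"order_id": "ORD-1"}, tools))
print(execute_tool("issue_refund", {"order_id": "ORD-1"}, tools))            # blocked
print(execute_tool("issue_refund", {"order_id": "ORD-1"}, tools, approved=True))
```

In a LangGraph deployment the same policy would live at the interrupt point before the tool node, pausing the graph until a human approves the call.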
Conclusion
Building AI agents with LangChain has become remarkably accessible. The framework provides a clear progression: start with create_tool_calling_agent and AgentExecutor for prototyping, then graduate to LangGraph for production-grade agents with persistence, streaming, and full loop control.
The key to successful agents lies not just in the framework, but in tool design (clear descriptions, robust error handling), memory management (persistent, bounded history), and observability (LangSmith tracing, cost monitoring). Multi-agent architectures with supervisor or handoff patterns allow you to scale complexity without sacrificing maintainability.
As the ecosystem continues to evolve rapidly, LangGraph is emerging as the standard for production agent systems, while LangChain’s core abstractions remain the lingua franca for LLM tool integration. Start simple, test thoroughly, and iterate — the best agent architectures are built incrementally, not designed in a single sprint.