Introduction
AI agents represent the next evolution of LLM applications. Rather than simply generating text in response to a prompt, an agent can reason, plan, and take actions by invoking external tools — search the web, query a database, call an API, or execute code — all in an autonomous loop until a task is completed.
LangChain has become the de facto framework for building these agents in Python. With the release of LangChain v0.2+ and the maturation of its tool-calling abstractions, building production-grade AI agents is more accessible than ever. This guide walks you through the core patterns, real code examples, and deployment considerations for building agents with LangChain in 2025–2026.
Whether you are creating a customer support bot that can look up orders, a research assistant that can browse and summarize documents, or an autonomous coding agent, the patterns in this guide will serve as your foundation.
What Is an AI Agent?
An AI agent is an LLM-powered system that follows a reasoning-action loop. Unlike a simple chain (prompt → LLM → output), an agent:
- Observes the current context (user query, conversation history, previous tool results)
- Reasons about what to do next
- Acts by selecting and calling a tool with specific inputs
- Observes the tool result and decides whether to act again or return a final answer
This loop — often called the ReAct pattern (Reason + Act) — is the core of most agent architectures. The LLM serves as the reasoning engine, while tools provide the capabilities to interact with the outside world.
An agent is not just an LLM. It is an LLM + tools + a loop. The quality of your agent depends on all three: the model’s reasoning ability, the design of your tools, and the orchestration logic that ties them together.
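Stripped of any framework, the loop itself is only a handful of lines. The sketch below is illustrative only: the `fake_llm` stub and the plain-dict tool registry stand in for a real model and for LangChain's abstractions.

```python
# Schematic ReAct loop with a stubbed "LLM" -- illustrative only, not LangChain code.
def lookup_weather(city: str) -> str:
    """A toy tool."""
    return f"Sunny in {city}"

TOOLS = {"lookup_weather": lookup_weather}

def fake_llm(messages: list) -> dict:
    """Stand-in for the model: request a tool once, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "lookup_weather", "args": {"city": "Paris"}}
    return {"answer": f"The forecast: {messages[-1]['content']}"}

def run_agent(user_input: str, max_iterations: int = 5) -> str:
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_iterations):          # the loop
        decision = fake_llm(messages)        # reason
        if "answer" in decision:             # done -> return final answer
            return decision["answer"]
        result = TOOLS[decision["tool"]](**decision["args"])  # act
        messages.append({"role": "tool", "content": result})  # observe
    return "Stopped: iteration limit reached."

print(run_agent("What's the weather in Paris?"))
```

A real agent delegates both decisions (which tool, when to stop) to the model; everything else in the loop is plumbing.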
LangChain Agent Architecture Overview
LangChain provides several approaches to building agents. The ecosystem has evolved significantly, so understanding which API to use is critical:
| Approach | Status | Best For | Key Feature |
|---|---|---|---|
| AgentExecutor | Legacy (still works) | Simple, single-agent use cases | Quick setup, minimal boilerplate |
| create_react_agent | Current (LangChain) | Standard ReAct agents | Works with tool-calling LLMs |
| create_tool_calling_agent | Current (LangChain) | Models with native tool calling | Uses model’s built-in function calling |
| LangGraph | Recommended for production | Complex, stateful, multi-agent systems | Full control over agent loop, persistence, human-in-the-loop |
For new projects, LangChain Inc. recommends LangGraph for production agents: it gives you explicit control over the agent loop and supports streaming, persistence, and multi-agent patterns. However, understanding the LangChain-level abstractions (create_react_agent, create_tool_calling_agent) is still essential, as they form the building blocks used inside LangGraph nodes.
Setting Up Your Environment
Before building agents, install the required packages:
```bash
# Install core packages
pip install langchain langchain-openai langchain-community langgraph

# For additional tool integrations
pip install langchain-experimental duckduckgo-search wikipedia
```
Set your API keys as environment variables:
```python
import os

os.environ["OPENAI_API_KEY"] = "sk-..."

# Or use dotenv
from dotenv import load_dotenv
load_dotenv()
```
Building Your First Agent with Tool Calling
Modern LLMs like GPT-4, Claude, and Gemini support native tool calling (also called function calling). Instead of relying on text-based prompts to coerce the LLM into outputting tool invocations in a specific format, the model has been fine-tuned to output structured tool calls natively. LangChain’s create_tool_calling_agent leverages this capability.
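To make that concrete, here is roughly the shape of a parsed tool call as LangChain surfaces it on an AI message's `tool_calls` list. The specific values below are invented for illustration.

```python
# Approximate shape of one entry in AIMessage.tool_calls -- the field
# values here are invented for illustration.
tool_call = {
    "name": "get_order_status",          # which tool the model chose
    "args": {"order_id": "ORD-12345"},   # structured arguments, not free text
    "id": "call_abc123",                 # provider-assigned call id
}

# Because args is already a dict, no fragile text parsing is needed:
print(tool_call["args"]["order_id"])
```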
Step 1: Define Your Tools
Tools are the capabilities your agent can use. LangChain provides the @tool decorator to turn any Python function into a tool:
```python
from langchain_core.tools import tool
from typing import Optional


@tool
def search_knowledge_base(query: str) -> str:
    """Search the internal knowledge base for information about products and policies.

    Args:
        query: The search query to look up in the knowledge base.
    """
    knowledge = {
        "return policy": "Items can be returned within 30 days with receipt.",
        "shipping": "Free shipping on orders over $50. Standard delivery 3-5 days.",
        "warranty": "All electronics come with a 1-year manufacturer warranty.",
    }
    for key, value in knowledge.items():
        if key in query.lower():
            return value
    return "No relevant information found for that query."


@tool
def get_order_status(order_id: str) -> str:
    """Look up the current status of a customer order.

    Args:
        order_id: The unique order identifier (e.g., ORD-12345).
    """
    orders = {
        "ORD-12345": "Shipped - Expected delivery March 12, 2026",
        "ORD-67890": "Processing - Will ship within 24 hours",
    }
    return orders.get(order_id, f"Order {order_id} not found.")


@tool
def calculate_discount(price: float, discount_percent: float) -> str:
    """Calculate the discounted price for a product.

    Args:
        price: The original price of the product.
        discount_percent: The discount percentage to apply (e.g., 20 for 20%).
    """
    discounted = price * (1 - discount_percent / 100)
    return f"Original: ${price:.2f} | Discount: {discount_percent}% | Final: ${discounted:.2f}"


tools = [search_knowledge_base, get_order_status, calculate_discount]
```
- Docstrings matter enormously. The LLM reads the docstring to decide when and how to call the tool. Be descriptive and include argument explanations.
- Return strings. Tools should return human-readable strings that the LLM can interpret and relay to the user.
- Handle errors gracefully. Return informative error messages rather than raising exceptions, so the agent can recover.
- Keep tools focused. One tool = one capability. Prefer many small tools over one giant multi-purpose tool.
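As an example of the error-handling guideline, here is a tool body that reports failures as messages the model can act on. It is shown as a plain function for brevity; wrapping it with `@tool` as above would register it with the agent, and the inline dict stands in for a real database call.

```python
def get_order_status_safe(order_id: str) -> str:
    """Look up an order, returning readable errors instead of raising."""
    orders = {"ORD-12345": "Shipped"}  # stand-in for a real database call
    if not order_id.startswith("ORD-"):
        # Tell the model *how* to fix its input so it can retry sensibly.
        return "Invalid order id: expected a format like ORD-12345."
    try:
        return orders[order_id]
    except KeyError:
        return f"Order {order_id} not found. Ask the customer to double-check the id."

print(get_order_status_safe("12345"))      # malformed input -> guidance, not a crash
print(get_order_status_safe("ORD-99999"))  # missing order -> recoverable message
```

Because both failure modes come back as plain strings, the agent can apologize, re-ask the user, or retry with corrected input instead of crashing the whole loop.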
Step 2: Create the Agent
Now wire the tools to an LLM using create_tool_calling_agent and AgentExecutor:
```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.agents import create_tool_calling_agent, AgentExecutor

# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Define the prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful customer support agent for an e-commerce store. "
               "Use the available tools to look up information and help customers. "
               "Always be polite and concise."),
    MessagesPlaceholder(variable_name="chat_history", optional=True),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

# Create the agent
agent = create_tool_calling_agent(llm, tools, prompt)

# Wrap in AgentExecutor to run the loop
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=10,
    handle_parsing_errors=True,
)

# Run the agent
response = agent_executor.invoke({
    "input": "What is the status of order ORD-12345 and what is your return policy?"
})
print(response["output"])
```
When you run this, the agent will reason that it needs to make two tool calls — one to get_order_status and one to search_knowledge_base — then synthesize both results into a coherent response for the user.
Adding Conversation Memory
Agents become much more useful when they can remember previous messages in a conversation. LangChain provides several memory patterns:
Simple Message History
```python
from langchain_core.messages import HumanMessage, AIMessage

chat_history = []

def chat(user_input: str) -> str:
    response = agent_executor.invoke({
        "input": user_input,
        "chat_history": chat_history,
    })
    chat_history.append(HumanMessage(content=user_input))
    chat_history.append(AIMessage(content=response["output"]))
    return response["output"]

print(chat("What is order ORD-12345's status?"))
print(chat("And when will it arrive?"))
```
Persistent Memory with RunnableWithMessageHistory
For production applications, you want memory that persists across sessions:
```python
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

session_store = {}

def get_session_history(session_id: str):
    if session_id not in session_store:
        session_store[session_id] = ChatMessageHistory()
    return session_store[session_id]

agent_with_history = RunnableWithMessageHistory(
    agent_executor,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

response = agent_with_history.invoke(
    {"input": "Check order ORD-67890"},
    config={"configurable": {"session_id": "user-abc-123"}},
)
```
Advanced Tool Patterns
Structured Tool Input with Pydantic
For tools that require complex inputs, use Pydantic models for validation:
```python
from typing import Optional

from langchain_core.tools import StructuredTool
from pydantic import BaseModel, Field

class SearchInput(BaseModel):
    query: str = Field(description="The search query")
    max_results: int = Field(default=5, description="Maximum number of results")
    category: Optional[str] = Field(default=None, description="Filter by category")

def search_products(query: str, max_results: int = 5, category: Optional[str] = None) -> str:
    """Search the product catalog."""
    return f"Found {max_results} results for '{query}' in {category or 'all categories'}"

search_tool = StructuredTool.from_function(
    func=search_products,
    name="search_products",
    description="Search the product catalog with optional filters",
    args_schema=SearchInput,
)
```
Retrieval Tools (RAG Agent)
One of the most powerful agent patterns combines tool calling with Retrieval-Augmented Generation (RAG). The agent decides when to retrieve information rather than always retrieving:
```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.tools.retriever import create_retriever_tool

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(
    ["LangChain is a framework for LLM apps...",
     "Agents use tools to interact with the world...",
     "RAG combines retrieval with generation..."],
    embeddings,
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

retriever_tool = create_retriever_tool(
    retriever,
    name="search_documentation",
    description="Search the technical documentation. Use this when the user asks about "
                "technical concepts, APIs, or implementation details.",
)

tools = [retriever_tool, get_order_status, calculate_discount]
```
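The decision-making aspect is worth emphasizing: retrieval becomes just another tool the model may or may not invoke. A toy keyword retriever (pure Python, standing in for the FAISS retriever above) shows the tool's contract:

```python
# Toy retriever standing in for the vector store above: scores documents
# by keyword overlap instead of embeddings. Illustrative only.
DOCS = [
    "LangChain is a framework for LLM apps",
    "Agents use tools to interact with the world",
    "RAG combines retrieval with generation",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    q = set(query.lower().split())
    scored = sorted(DOCS, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

# An agentic RAG system only calls this for questions that need documents;
# small talk is answered directly, skipping the retrieval round-trip.
print(retrieve("how do agents use tools"))
```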
Building Production Agents with LangGraph
While AgentExecutor works for prototyping, LangGraph is the recommended approach for production agents. It gives you explicit control over the execution loop and supports streaming, persistence, and human-in-the-loop patterns.
The Prebuilt ReAct Agent
```python
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)

agent = create_react_agent(
    model=llm,
    tools=tools,
    prompt="You are a helpful customer support agent. Be concise and friendly.",
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "What is my order status for ORD-12345?"}]
})

for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "Check order ORD-67890"}]},
    stream_mode="values",
):
    if chunk["messages"]:
        chunk["messages"][-1].pretty_print()
```
Custom Agent Loop with LangGraph
For full control, build the agent loop manually as a graph:
```python
from operator import add
from typing import TypedDict, Annotated

from langchain_core.messages import BaseMessage
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode
from langgraph.checkpoint.memory import MemorySaver

class AgentState(TypedDict):
    messages: Annotated[list[BaseMessage], add]

def call_model(state: AgentState):
    messages = state["messages"]
    response = llm.bind_tools(tools).invoke(messages)
    return {"messages": [response]}

def should_continue(state: AgentState):
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tools"
    return END

workflow = StateGraph(AgentState)
workflow.add_node("agent", call_model)
workflow.add_node("tools", ToolNode(tools))
workflow.add_edge(START, "agent")
workflow.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
workflow.add_edge("tools", "agent")

# Compile with a checkpointer so conversation state persists per thread
memory = MemorySaver()
agent = workflow.compile(checkpointer=memory)

config = {"configurable": {"thread_id": "session-001"}}
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Hello! Check order ORD-12345"}]},
    config=config,
)
```
Building the graph yourself unlocks LangGraph's production features:
- Streaming: Stream tokens, tool calls, and intermediate steps in real time
- Persistence: Checkpoint state to PostgreSQL, SQLite, or Redis so conversations survive restarts
- Human-in-the-loop: Pause execution before sensitive tool calls and wait for human approval
- Debugging: Full visibility into each node execution via LangSmith integration
- Subgraphs: Compose complex agents from smaller, testable sub-agents
Multi-Agent Architectures
As agent complexity grows, a single monolithic agent becomes hard to manage. Multi-agent architectures split responsibilities across specialized agents. LangGraph supports two primary patterns:
Supervisor Pattern
A supervisor agent orchestrates multiple worker agents, deciding which one to delegate to:
```python
from langgraph.prebuilt import create_react_agent

# Worker agents with focused toolsets. The tools named here
# (search_documentation, web_search, write_document, format_text)
# are placeholders for your own implementations.
research_agent = create_react_agent(
    model=llm,
    tools=[search_documentation, web_search],
    prompt="You are a research specialist. Find accurate information.",
)

writer_agent = create_react_agent(
    model=llm,
    tools=[write_document, format_text],
    prompt="You are a technical writer. Create clear, well-structured content.",
)
```
Handoff Pattern
In the handoff pattern, agents transfer control directly to each other for linear pipelines:
```python
workflow = StateGraph(AgentState)
workflow.add_node("researcher", research_agent)
workflow.add_node("writer", writer_agent)
workflow.add_edge(START, "researcher")
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", END)
```
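Conceptually, the handoff pipeline is sequential state transformation: each node reads the shared state and writes its contribution. A framework-free sketch, with plain functions standing in for the LangGraph nodes above:

```python
# Schematic of the researcher -> writer handoff: each stage is a function
# over a shared state dict, mirroring how LangGraph nodes update state.
def researcher(state: dict) -> dict:
    state["notes"] = f"Findings about: {state['topic']}"
    return state

def writer(state: dict) -> dict:
    state["draft"] = f"Article based on '{state['notes']}'"
    return state

pipeline = [researcher, writer]   # START -> researcher -> writer -> END
state = {"topic": "LangChain agents"}
for stage in pipeline:
    state = stage(state)

print(state["draft"])
```

What LangGraph adds on top of this skeleton is checkpointing, streaming, and the ability to branch conditionally rather than always running every stage.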
Production Deployment Patterns
Error Handling and Fallbacks
```python
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=15,           # hard cap on reasoning/tool-call cycles
    max_execution_time=60,       # wall-clock budget in seconds
    handle_parsing_errors=True,  # feed malformed LLM output back as an error message
    early_stopping_method="generate",
)
```
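The settings above bound a single run, but production systems also need a fallback when the primary model is down or rate-limited. LangChain runnables support this via `with_fallbacks()`; the underlying idea, sketched here with stand-in callables rather than real model clients:

```python
# Fallback sketch: try a primary model, fall back to a secondary one on
# failure. Both "models" are stand-in callables so the logic is visible.
def primary_model(prompt: str) -> str:
    raise TimeoutError("provider unavailable")  # simulate an outage

def backup_model(prompt: str) -> str:
    return f"(backup) answer to: {prompt}"

def invoke_with_fallbacks(prompt: str, models: list) -> str:
    last_error = None
    for model in models:
        try:
            return model(prompt)
        except Exception as exc:  # in practice, catch provider-specific errors
            last_error = exc
    raise RuntimeError("All models failed") from last_error

print(invoke_with_fallbacks("Check order ORD-12345", [primary_model, backup_model]))
```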
Serving Agents as APIs
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AgentRequest(BaseModel):
    message: str
    session_id: str

@app.post("/chat")
async def chat(request: AgentRequest):
    # Reuse the session id as the LangGraph thread id so each user
    # gets an isolated, persistent conversation.
    config = {"configurable": {"thread_id": request.session_id}}
    result = agent.invoke(
        {"messages": [{"role": "user", "content": request.message}]},
        config=config,
    )
    return {"response": result["messages"][-1].content}
```
Cost Control
- Use smaller models for simple routing — GPT-4o-mini or Claude Haiku for tool selection, GPT-4o or Claude Sonnet for complex reasoning
- Cache tool results — If the same tool call with the same inputs is made frequently, cache the result
- Set strict iteration limits — Prevent runaway agent loops with `max_iterations` and `max_execution_time`
- Monitor token usage — Use LangSmith to track token consumption per conversation and set alerts
- Limit chat history length — Use a sliding window or summarization to keep the message history from growing unbounded
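The caching advice is cheap to implement when tool inputs are hashable, for example with the standard library's `functools.lru_cache` (the expensive lookup below is simulated):

```python
from functools import lru_cache

CALL_COUNT = {"n": 0}

@lru_cache(maxsize=256)
def get_shipping_policy(region: str) -> str:
    """Simulated expensive lookup; repeated calls are served from cache."""
    CALL_COUNT["n"] += 1  # counts real (uncached) executions
    return f"Free shipping over $50 in {region}"

get_shipping_policy("US")
get_shipping_policy("US")   # cache hit -- no second execution
get_shipping_policy("EU")

print(CALL_COUNT["n"])      # 2: only two distinct inputs were computed
```

For tools whose results go stale (order status, inventory), swap the unbounded cache for one with a short TTL so the agent never relays outdated data.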
Production Checklist
- Enable LangSmith tracing for all environments
- Set `max_iterations` and `max_execution_time` on every agent
- Implement graceful error handling in all tools
- Use persistent checkpointers (PostgreSQL, Redis) instead of in-memory
- Add rate limiting and authentication to your API endpoints
- Test agent behavior with evaluation datasets, not just manual testing
- Implement guardrails to prevent prompt injection and harmful tool use
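As a minimal illustration of the last item, a tool allow-list can gate sensitive actions behind explicit approval before anything executes. The tool names and policy below are invented for the example:

```python
# Minimal guardrail sketch: sensitive tools require human approval before
# the agent may execute them. Tool names and policy are illustrative.
SENSITIVE_TOOLS = {"issue_refund", "delete_account"}

def execute_tool(name: str, args: dict, tools: dict, approved: bool = False) -> str:
    if name not in tools:
        return f"Unknown tool: {name}"  # never execute arbitrary names
    if name in SENSITIVE_TOOLS and not approved:
        return f"Blocked: '{name}' requires human approval."
    return tools[name](**args)

tools = {
    "get_order_status": lambda order_id: f"{order_id}: shipped",
    "issue_refund": lambda order_id: f"Refunded {order_id}",
}

print(execute_tool("get_order_status", {"order_id": "ORD-1"}, tools))
print(execute_tool("issue_refund", {"order_id": "ORD-1"}, tools))            # blocked
print(execute_tool("issue_refund", {"order_id": "ORD-1"}, tools, approved=True))
```

In a LangGraph deployment the same policy would live at the interrupt point before the tool node, pausing the graph until a human approves the call.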
Conclusion
Building AI agents with LangChain has become remarkably accessible. The framework provides a clear progression: start with create_tool_calling_agent and AgentExecutor for prototyping, then graduate to LangGraph for production-grade agents with persistence, streaming, and full loop control.
The key to successful agents lies not just in the framework, but in tool design (clear descriptions, robust error handling), memory management (persistent, bounded history), and observability (LangSmith tracing, cost monitoring). Multi-agent architectures with supervisor or handoff patterns allow you to scale complexity without sacrificing maintainability.
As the ecosystem continues to evolve rapidly, LangGraph is emerging as the standard for production agent systems, while LangChain’s core abstractions remain the lingua franca for LLM tool integration. Start simple, test thoroughly, and iterate — the best agent architectures are built incrementally, not designed in a single sprint.