Without memory, agents make redundant tool calls and forget context between turns. This page covers working memory (MemorySaver) and long-term memory (cross-session persistence).

The Memory Problem in Production Agents

Consider this real-world scenario from a customer support agent:
User: "I ordered a laptop last week"
Agent: [searches orders] "Found order #12345 for MacBook Pro"

User: "When will it arrive?"
Agent: [searches orders AGAIN] "Order #12345 ships tomorrow"

User: "Can I change the address?"
Agent: [searches orders AGAIN] "For order #12345, yes I can help"
Problem: The agent makes three redundant tool calls, wasting tokens, time, and money — costing roughly 3x what it should.

Solution: Memory that persists conversation state across turns, so the agent doesn’t re-discover what it already knows.
Agent memory management is an advancing area with multiple approaches. We cover the foundational concepts here using LangGraph’s built-in memory. For deeper coverage, see the O’Reilly report “Managing Memory for AI Agents” in the Assets folder.

Memory Architecture: Two Tiers

Memory Type      | Duration       | Purpose                   | Example
Working Memory   | Single session | Active conversation state | "User asked about order #12345"
Long-Term Memory | Cross-session  | Persistent knowledge      | "User prefers email contact"
Think of working memory as your L1 cache (fast, temporary) and long-term memory as your database (persistent, searchable).

Working Memory with LangGraph

LangGraph provides built-in working memory via MemorySaver and thread_id. Each thread maintains its own conversation history automatically — no manual message tracking needed.

The agent below looks up an order in turn 1. In turns 2 and 3, it answers follow-up questions using the cached tool results from the thread history — no redundant API calls.

Notice: [API call] appears only once (turn 1). Turns 2 and 3 answer from the thread history. This is MemorySaver in action — it persists the full message chain, including tool calls and results, per thread_id.

Key points:
  • checkpointer: new MemorySaver() enables automatic persistence
  • thread_id in configurable scopes memory to a conversation
  • The agent is stateless — all state lives in the checkpointer
  • One agent instance serves multiple users (different thread_id = different conversations)
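The mechanics behind these points can be sketched without the library. A checkpointer is, at its core, a map from thread_id to message history — the toy version below is illustrative only (MemorySaver additionally handles checkpoints, serialization, and tool messages for you):

```typescript
// Toy checkpointer: thread_id -> message history.
// Illustrative stand-in for LangGraph's MemorySaver, not its real API.
type Message = { role: "user" | "assistant" | "tool"; content: string };

class ToyCheckpointer {
  private threads = new Map<string, Message[]>();

  load(threadId: string): Message[] {
    return this.threads.get(threadId) ?? [];
  }

  save(threadId: string, messages: Message[]): void {
    this.threads.set(threadId, messages);
  }
}

// One "agent" serves many users; all state lives in the checkpointer.
const checkpointer = new ToyCheckpointer();

function invoke(threadId: string, userMessage: string): Message[] {
  const history = checkpointer.load(threadId);
  const updated = [...history, { role: "user" as const, content: userMessage }];
  checkpointer.save(threadId, updated);
  return updated;
}

invoke("alice", "I ordered a laptop last week");
invoke("bob", "Cancel my subscription");
console.log(invoke("alice", "When will it arrive?").length); // 2 — bob's turn never touched alice's thread
```

Different thread_id, different history — which is exactly why one stateless agent instance can serve every user.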

Long-Term Memory: Cross-Session Knowledge

Working memory resets between sessions. But what about preferences, facts, and history that should persist across all conversations? Long-term memory requires a separate store — in production, a vector DB or managed service. Here we use a simple in-memory store to demonstrate the pattern.

The agent gets save_preference and recall_preferences tools. In session 1, the user shares preferences. In session 2 (new thread), the agent recalls them from long-term memory.

What’s happening:
  • Session 1: User says “I prefer email” → agent calls save_preference → stored in LongTermMemory
  • Session 2: New thread_id (working memory is empty) → agent calls recall_preferences → retrieves preferences from long-term store
The working memory (MemorySaver) forgets between sessions. The long-term memory persists.
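The two-tier split can be sketched in a few lines. The stores below are plain Maps for illustration — production would back the long-term tier with Redis or a vector DB:

```typescript
// Working memory: per-thread, discarded when the session ends.
// Long-term memory: per-user, survives across sessions.
const workingMemory = new Map<string, string[]>();  // thread_id -> messages
const longTermMemory = new Map<string, string[]>(); // user_id -> facts

function savePreference(userId: string, fact: string): void {
  longTermMemory.set(userId, [...(longTermMemory.get(userId) ?? []), fact]);
}

function recallPreferences(userId: string): string[] {
  return longTermMemory.get(userId) ?? [];
}

// Session 1 (thread "t1"): user shares a preference; agent stores it.
workingMemory.set("t1", ["I prefer email contact"]);
savePreference("user-42", "prefers email contact");

// Session 2 (thread "t2"): working memory starts blank, long-term does not.
console.log(workingMemory.get("t2"));      // undefined — new thread, empty history
console.log(recallPreferences("user-42")); // ["prefers email contact"]
```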

Long-Term Memory in Production

For production, replace the in-memory store with a real backend:
Tool                      | Approach
Redis Agent Memory Server | Working + long-term memory with semantic search
Mem0                      | Managed memory layer for agents
Zep                       | Long-term memory with automatic extraction
LangChain Memory          | Built-in LangChain/LangSmith integration
Claude Memory Tool        | Anthropic’s native memory
The integration pattern is the same regardless of backend:
  1. Store — save facts/preferences after conversations
  2. Search — retrieve relevant memories before generating a response (semantic search in production)
  3. Inject — add retrieved memories to the prompt or as tool results
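The three steps can be sketched end to end. Naive keyword matching stands in for the semantic search a production backend would provide:

```typescript
// Step 1: Store — save facts keyed by user.
const memoryStore: { userId: string; text: string }[] = [];
function store(userId: string, text: string): void {
  memoryStore.push({ userId, text });
}

// Step 2: Search — keyword overlap here; production uses embeddings.
function search(userId: string, query: string): string[] {
  const words = query.toLowerCase().split(/\s+/);
  return memoryStore
    .filter(m => m.userId === userId)
    .filter(m => words.some(w => m.text.toLowerCase().includes(w)))
    .map(m => m.text);
}

// Step 3: Inject — prepend retrieved memories to the prompt.
function buildPrompt(userId: string, userMessage: string): string {
  const memories = search(userId, userMessage);
  const context = memories.length
    ? `Known about this user:\n- ${memories.join("\n- ")}\n\n`
    : "";
  return `${context}User: ${userMessage}`;
}

store("u1", "prefers email notifications");
console.log(buildPrompt("u1", "How should you contact me about email updates?"));
```

Swapping the backend changes only the internals of store and search; the store → search → inject flow stays the same.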

Integration Patterns

Pattern 1: Code-Driven (Programmatic)

Your code decides when to store and retrieve. Predictable and efficient — you control exactly what gets remembered.
Pseudocode
// Before response: search for relevant context
const memories = await memoryStore.search(userId, userMessage);

// After response: store if it contains preferences
if (userMessage.includes("prefer")) {
    await memoryStore.save(userId, `User prefers: ${userMessage}`);
}

Pattern 2: LLM-Driven (Tool-Based)

Give the LLM memory tools — it decides what’s worth remembering. More natural but less predictable.
Pseudocode
const tools = [
    savePreference,      // LLM calls when user shares a preference
    recallPreferences,   // LLM calls at start of conversation
];
// The LLM autonomously decides when to store and retrieve
This is what our long-term memory demo uses — the LLM decides to call save_preference when the user says “I prefer email.”
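In miniature, a tool-based memory interface is a set of named, described functions the model can choose to call. The structures below are self-contained stand-ins — a real agent would register these through the framework’s tool API, and the model (not your code) would decide when to invoke them:

```typescript
// Tool specs: name + description guide the LLM; func does the work.
type MemoryTool = {
  name: string;
  description: string;
  func: (userId: string, input: string) => string;
};

const longTerm = new Map<string, string[]>();

const memoryTools: MemoryTool[] = [
  {
    name: "save_preference",
    description: "Store a user preference for future sessions",
    func: (userId, pref) => {
      longTerm.set(userId, [...(longTerm.get(userId) ?? []), pref]);
      return `Saved: ${pref}`;
    },
  },
  {
    name: "recall_preferences",
    description: "Retrieve the user's stored preferences",
    func: (userId, _input) => (longTerm.get(userId) ?? []).join("; "),
  },
];

// Calls a model that has read the descriptions might emit:
const save = memoryTools.find(t => t.name === "save_preference")!;
const recall = memoryTools.find(t => t.name === "recall_preferences")!;
save.func("user-7", "email contact");     // session 1
console.log(recall.func("user-7", ""));   // session 2: "email contact"
```

The descriptions matter as much as the code: they are what the LLM reads when deciding whether a turn is worth remembering.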

Pattern 3: Background Extraction (Automatic)

Store every conversation, then a background process extracts important facts — preferences, events, decisions. Zero overhead during the conversation.
Pseudocode
// After each turn, store the full conversation
await memoryStore.saveConversation(sessionId, messages);

// Background process extracts:
// - Preferences ("prefers email notifications")
// - Facts ("subscription expires June 15")
// - Events ("reported billing issue on March 1")
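A toy extractor makes the pattern concrete. A regex rule stands in for the LLM or NLP pipeline a real background job would run over the transcript:

```typescript
// Conversation log written during the session (zero extra latency per turn).
const conversationLog = new Map<string, string[]>();
function saveConversation(sessionId: string, messages: string[]): void {
  conversationLog.set(sessionId, messages);
}

// Background job: scan stored turns for memorable facts.
// Real systems run an LLM over the transcript; a regex stands in here.
function extractFacts(sessionId: string): string[] {
  const messages = conversationLog.get(sessionId) ?? [];
  const extracted: string[] = [];
  for (const msg of messages) {
    const m = msg.match(/\bI prefer (.+)/i);
    if (m) extracted.push(`prefers ${m[1]}`);
  }
  return extracted;
}

saveConversation("s1", [
  "I prefer email notifications",
  "When will my order arrive?",
]);
console.log(extractFacts("s1")); // ["prefers email notifications"]
```

Because extraction runs after the fact, the conversation itself pays no latency cost — the trade-off is that new facts only become available once the background job has run.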
Production recommendation: Start with code-driven for predictable behavior. Add background extraction for continuous learning. Use LLM-driven tools when conversational control matters.

Key Takeaways

  1. Working memory (within session) — use LangGraph’s MemorySaver + thread_id
  2. Long-term memory (across sessions) — requires external storage (Redis, vector DB, etc.)
  3. The agent is stateless — all state lives in the checkpointer, not the agent instance
  4. Thread isolation — different thread_id = different conversations, same agent
  5. Start simple — MemorySaver covers most use cases; add long-term memory when you need cross-session persistence