Without memory, agents make redundant tool calls and forget context between turns. This page covers working memory (MemorySaver) and long-term memory (cross-session persistence).

The Memory Problem in Production Agents

Consider this real-world scenario from a customer support agent:
User: "I ordered a laptop last week"
Agent: [searches orders] "Found order #12345 for MacBook Pro"

User: "When will it arrive?"
Agent: [searches orders AGAIN] "Order #12345 ships tomorrow"

User: "Can I change the address?"
Agent: [searches orders AGAIN] "For order #12345, yes I can help"
Problem: The agent repeats the same order lookup on every turn, making three tool calls where one would do. The two redundant calls waste tokens, time, and money, roughly tripling the cost of tool use for this exchange.
Solution: Memory that persists conversation state across turns, so the agent doesn’t re-discover what it already knows.
Agent memory management is an advancing area with multiple approaches. We cover the foundational concepts here using LangGraph’s built-in memory. For deeper coverage, see the O’Reilly report “Managing Memory for AI Agents” in the Assets folder.

Memory Architecture: Two Tiers

| Memory Type | Duration | Purpose | Example |
|---|---|---|---|
| Working Memory | Single session | Active conversation state | “User asked about order #12345” |
| Long-Term Memory | Cross-session | Persistent knowledge | “User prefers email contact” |
Think of working memory as your L1 cache (fast, temporary) and long-term memory as your database (persistent, searchable).

Working Memory with LangGraph

LangGraph provides built-in working memory via MemorySaver and thread_id. Each thread maintains its own conversation history automatically, so no manual message tracking is needed.
Consider an agent that looks up an order in turn 1. In turns 2 and 3, it answers follow-up questions using the tool results already stored in the thread history, so the order API is called only once, in turn 1. This is MemorySaver in action: it persists the full message chain, including tool calls and their results, per thread_id.
Key points:
  • checkpointer: new MemorySaver() enables automatic persistence
  • thread_id in configurable scopes memory to a conversation
  • The agent is stateless — all state lives in the checkpointer
  • One agent instance serves multiple users (different thread_id = different conversations)
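The thread-scoped reuse described above can be illustrated without an LLM. This is a toy model, not LangGraph’s API. `ToyCheckpointer`, `lookupOrder`, and `runTurn` are hypothetical names, and each thread is just a message list. The reuse of a cached tool result is also hard-coded, whereas a real agent’s LLM would decide it from the history. In LangGraph itself you would pass `checkpointer: new MemorySaver()` and a `thread_id` in `configurable` instead.

```typescript
// Toy model of thread-scoped working memory (NOT LangGraph's API).
// A real MemorySaver stores full graph checkpoints; here a thread is a message list.
type Message =
  | { role: "user"; content: string }
  | { role: "tool"; name: string; content: string };

class ToyCheckpointer {
  private threads = new Map<string, Message[]>();

  // Return the history for a thread, creating an empty one on first use.
  load(threadId: string): Message[] {
    let history = this.threads.get(threadId);
    if (!history) {
      history = [];
      this.threads.set(threadId, history);
    }
    return history;
  }
}

let apiCalls = 0;

// Hypothetical order API; the counter makes redundant calls visible.
function lookupOrder(): string {
  apiCalls += 1;
  return "order #12345 (MacBook Pro), ships tomorrow";
}

// Stateless turn handler: all state lives in the checkpointer.
function runTurn(cp: ToyCheckpointer, threadId: string, userMessage: string): string {
  const history = cp.load(threadId);
  history.push({ role: "user", content: userMessage });
  // Reuse a tool result already in this thread's history (hard-coded here;
  // a real agent's LLM would make this choice from the message chain).
  let order = history.find((m) => m.role === "tool" && m.name === "lookup_order");
  if (!order) {
    order = { role: "tool", name: "lookup_order", content: lookupOrder() };
    history.push(order);
  }
  return `Re: "${userMessage}", using ${order.content}`;
}

const checkpointer = new ToyCheckpointer();
runTurn(checkpointer, "thread-1", "I ordered a laptop last week");
runTurn(checkpointer, "thread-1", "When will it arrive?");
runTurn(checkpointer, "thread-1", "Can I change the address?");
console.log(apiCalls); // 1: turns 2 and 3 answer from the thread history

runTurn(checkpointer, "thread-2", "Where is my order?");
console.log(apiCalls); // 2: a new thread_id starts with empty working memory
```

`runTurn` keeps nothing between calls, so the same function serves every thread. Isolation comes entirely from the `threadId` key.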

Long-Term Memory: Cross-Session Knowledge

Working memory resets between sessions. But what about preferences, facts, and history that should persist across all conversations? Long-term memory requires a separate store, which in production is a vector DB or managed service. The demo uses a simple in-memory store to illustrate the pattern.
The agent gets save_preference and recall_preferences tools. In session 1, the user shares preferences. In session 2 (a new thread), the agent recalls them from long-term memory.
What’s happening:
  • Session 1: User says “I prefer email” → agent calls save_preference → stored in LongTermMemory
  • Session 2: New thread_id (working memory is empty) → agent calls recall_preferences → retrieves preferences from long-term store
The working memory (MemorySaver) forgets between sessions. The long-term memory persists.
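Here is a minimal sketch of the two-tier split. It assumes a Map-backed store keyed by a hypothetical user ID (`user-42`). The tool names mirror the demo’s `save_preference` and `recall_preferences`, but their implementations are illustrative. The LLM that would decide when to call them is omitted.

```typescript
// Long-term store keyed by user ID (a Map here; a database or memory
// service in production). It outlives every individual thread.
class LongTermMemory {
  private prefs = new Map<string, string[]>();

  save(userId: string, preference: string): void {
    const list = this.prefs.get(userId) ?? [];
    list.push(preference);
    this.prefs.set(userId, list);
  }

  recall(userId: string): string[] {
    return [...(this.prefs.get(userId) ?? [])];
  }
}

const longTerm = new LongTermMemory();

// Illustrative implementations of the demo's two memory tools.
const tools = {
  save_preference(userId: string, preference: string): string {
    longTerm.save(userId, preference);
    return `Saved: ${preference}`;
  },
  recall_preferences(userId: string): string {
    const prefs = longTerm.recall(userId);
    return prefs.length > 0 ? prefs.join("; ") : "No saved preferences";
  },
};

// Working memory: thread_id -> messages, scoped to one session.
const workingMemory = new Map<string, string[]>();

// Session 1: the user states a preference and the agent saves it.
workingMemory.set("session-1", ["I prefer email"]);
tools.save_preference("user-42", "prefers email contact");

// Session 2: a new thread_id has no working memory...
console.log(workingMemory.has("session-2")); // false
// ...but the preference is still in the long-term store.
console.log(tools.recall_preferences("user-42")); // prefers email contact
```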

Long-Term Memory in Production

For production, replace the in-memory store with a real backend:
| Tool | Approach |
|---|---|
| Redis Agent Memory Server | Working + long-term memory with semantic search |
| Mem0 | Managed memory layer for agents |
| Zep | Long-term memory with automatic extraction |
| LangChain Memory | Built-in LangChain/LangSmith integration |
| Claude Memory Tool | Anthropic’s native memory |
The integration pattern is the same regardless of backend:
  1. Store — save facts/preferences after conversations
  2. Search — retrieve relevant memories before generating a response (semantic search in production)
  3. Inject — add retrieved memories to the prompt or as tool results
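The three steps can be sketched as follows. This toy makes several assumptions. The store is an in-memory array, and keyword overlap stands in for the embedding-based semantic search a production backend would perform. `storeMemory`, `searchMemories`, and `buildPrompt` are hypothetical names, not the API of any tool in the table.

```typescript
// Store -> Search -> Inject, with keyword overlap standing in for semantic search.
interface MemoryRecord {
  userId: string;
  text: string;
}

const memoryStore: MemoryRecord[] = [];

// 1. Store: save a fact or preference after a conversation.
function storeMemory(userId: string, text: string): void {
  memoryStore.push({ userId, text });
}

// Lowercased word set with punctuation dropped.
function words(s: string): Set<string> {
  return new Set<string>(s.toLowerCase().match(/[a-z0-9#]+/g) ?? []);
}

// 2. Search: rank this user's memories by the number of words shared with the query.
function searchMemories(userId: string, query: string, k = 3): string[] {
  const queryWords = words(query);
  return memoryStore
    .filter((m) => m.userId === userId)
    .map((m) => ({
      text: m.text,
      score: Array.from(words(m.text)).filter((w) => queryWords.has(w)).length,
    }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((r) => r.text);
}

// 3. Inject: put the retrieved memories in front of the user's message.
function buildPrompt(userId: string, userMessage: string): string {
  const memories = searchMemories(userId, userMessage);
  if (memories.length === 0) return `User: ${userMessage}`;
  const context = memories.map((m) => `- ${m}`).join("\n");
  return `Known about this user:\n${context}\n\nUser: ${userMessage}`;
}

storeMemory("user-42", "User prefers email contact");
storeMemory("user-42", "Subscription expires June 15");
storeMemory("user-42", "Owns a MacBook Pro");
console.log(buildPrompt("user-42", "How will you contact me about my subscription?"));
```

Replacing the word-overlap score with embedding similarity changes only the Search step. Store and Inject stay the same, which is why the pattern carries across backends.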

Integration Patterns

Pattern 1: Code-Driven (Programmatic)

Your code decides when to store and retrieve. Predictable and efficient — you control exactly what gets remembered.
Pseudocode
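// Before response: search for relevant context and pass it to the model
// (generateResponse is a placeholder for your LLM call)
const memories = await memoryStore.search(userId, userMessage);
const reply = await generateResponse(userMessage, memories);

// After response: store the message if it states a preference
// (case-insensitive, so "Prefer ..." at the start of a sentence also matches)
if (userMessage.toLowerCase().includes("prefer")) {
    await memoryStore.save(userId, `User preference: ${userMessage}`);
}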

Pattern 2: LLM-Driven (Tool-Based)

Give the LLM memory tools — it decides what’s worth remembering. More natural but less predictable.
Pseudocode
const tools = [
    savePreference,      // LLM calls when user shares a preference
    recallPreferences,   // LLM calls at start of conversation
];
// The LLM autonomously decides when to store and retrieve
This is what our long-term memory demo uses — the LLM decides to call save_preference when the user says “I prefer email.”
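In this sketch the agent loop only executes whatever tool calls the model emits. `stubModel` is a hypothetical stand-in that fakes the model’s decisions with a regex. With a real LLM, those decisions come from the conversation and the tool descriptions, and that is what makes this pattern less predictable.

```typescript
// LLM-driven memory: the model picks the tool calls; the loop just runs them.
type ToolName = "save_preference" | "recall_preferences";
type ToolArgs = { preference?: string };
interface ToolCall {
  name: ToolName;
  args: ToolArgs;
}
interface Tool {
  description: string; // what the LLM reads when deciding to call the tool
  run: (args: ToolArgs) => string;
}

const saved: string[] = [];

const memoryTools: Record<ToolName, Tool> = {
  save_preference: {
    description: "Save a preference the user has shared, such as a contact channel.",
    run: (args) => {
      if (args.preference) saved.push(args.preference);
      return `saved ${args.preference ?? "nothing"}`;
    },
  },
  recall_preferences: {
    description: "Recall the user's saved preferences at the start of a conversation.",
    run: () => (saved.length > 0 ? saved.join("; ") : "none"),
  },
};

// Hypothetical stub: a real LLM would emit these calls based on the conversation
// and the tool descriptions. A regex fakes that decision here.
function stubModel(userMessage: string, isNewConversation: boolean): ToolCall[] {
  const calls: ToolCall[] = [];
  if (isNewConversation) calls.push({ name: "recall_preferences", args: {} });
  const match = userMessage.match(/\bI prefer (.+?)\.?$/i);
  if (match) calls.push({ name: "save_preference", args: { preference: match[1] } });
  return calls;
}

// The agent loop executes whatever tool calls the model returns.
function runAgentTurn(userMessage: string, isNewConversation: boolean): string[] {
  return stubModel(userMessage, isNewConversation).map((call) =>
    memoryTools[call.name].run(call.args),
  );
}

console.log(runAgentTurn("I prefer email.", true).join(" | ")); // none | saved email
console.log(runAgentTurn("Where is my order?", true).join(" | ")); // email
```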

Pattern 3: Background Extraction (Automatic)

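Store every conversation, then let a background process extract important facts: preferences, events, decisions. Extraction runs outside the conversation, so it adds no latency to the user’s turns. The only in-conversation cost is saving the transcript.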
Pseudocode
// After each turn, store the full conversation
await memoryStore.saveConversation(sessionId, messages);

// Background process extracts:
// - Preferences ("prefers email notifications")
// - Facts ("subscription expires June 15")
// - Events ("reported billing issue on March 1")
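The split between the request path and the background job can be sketched as follows. This version simplifies heavily. Production systems usually run an LLM for the extraction step, while the regex rules below are a hypothetical stand-in that only recognizes the three example phrasings. `saveConversation` appends without analysis. `extractMemories` would run later, for instance on a schedule.

```typescript
// Background extraction: the request path only appends; a later job extracts.
type MemoryKind = "preference" | "fact" | "event";

interface ExtractedMemory {
  kind: MemoryKind;
  text: string;
  sessionId: string;
}

const conversationLog = new Map<string, string[]>(); // sessionId -> user messages

// In the request path: store the turn without analyzing it.
function saveConversation(sessionId: string, messages: string[]): void {
  const existing = conversationLog.get(sessionId) ?? [];
  conversationLog.set(sessionId, existing.concat(messages));
}

// Hypothetical rules standing in for an LLM extraction prompt.
const rules: Array<[MemoryKind, RegExp]> = [
  ["preference", /\bI (prefer .+?)[.!]?$/i],
  ["fact", /\bmy (subscription expires .+?)[.!]?$/i],
  ["event", /\bI (reported .+?)[.!]?$/i],
];

// Outside the request path (e.g., a scheduled job): scan stored conversations.
function extractMemories(): ExtractedMemory[] {
  const results: ExtractedMemory[] = [];
  conversationLog.forEach((messages, sessionId) => {
    for (const message of messages) {
      for (const [kind, pattern] of rules) {
        const match = message.match(pattern);
        if (match) results.push({ kind, text: match[1], sessionId });
      }
    }
  });
  return results;
}

saveConversation("s1", ["Hi there", "I prefer email notifications."]);
saveConversation("s2", ["My subscription expires June 15", "I reported a billing issue on March 1."]);
for (const m of extractMemories()) console.log(`${m.kind}: ${m.text}`);
// preference: prefer email notifications
// fact: subscription expires June 15
// event: reported a billing issue on March 1
```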
Production recommendation: Start with code-driven for predictable behavior. Add background extraction for continuous learning. Use LLM-driven tools when conversational control matters.

Key Takeaways

  1. Working memory (within session) — use LangGraph’s MemorySaver + thread_id
  2. Long-term memory (across sessions) — requires external storage (Redis, vector DB, etc.)
  3. The agent is stateless — all state lives in the checkpointer, not the agent instance
  4. Thread isolation — different thread_id = different conversations, same agent
  5. Start simple: MemorySaver covers most use cases; add long-term memory when you need cross-session persistence