Agent Memory - AI Tutorial

Without memory, agents make redundant tool calls and forget context between turns. This page covers working memory (MemorySaver) and long-term memory (cross-session persistence).

The Memory Problem in Production Agents

Consider this real-world scenario from a customer support agent:

User: "I ordered a laptop last week"
Agent: [searches orders] "Found order #12345 for MacBook Pro"

User: "When will it arrive?"
Agent: [searches orders AGAIN] "Order #12345 ships tomorrow"

User: "Can I change the address?"
Agent: [searches orders AGAIN] "For order #12345, yes I can help"

Problem: The agent searches orders three times when one lookup would do. The two repeat calls waste tokens, time, and money, so the tool-call cost of this exchange is roughly 3x what it should be.

Solution: Memory that persists conversation state across turns so the agent doesn’t re-discover what it already knows.

Agent memory management is a fast-moving area with multiple approaches. We cover the foundational concepts here using LangGraph’s built-in memory. For deeper coverage, see the O’Reilly report “Managing Memory for AI Agents” in the Assets folder.

Memory Architecture: Two Tiers

Memory Type	Duration	Purpose	Example
Working Memory	Single session	Active conversation state	“User asked about order #12345”
Long-Term Memory	Cross-session	Persistent knowledge	“User prefers email contact”

Think of working memory as your L1 cache (fast, temporary) and long-term memory as your database (persistent, searchable).

Working Memory with LangGraph

LangGraph provides built-in working memory via MemorySaver and thread_id. Each thread maintains its own conversation history automatically, with no manual message tracking needed.

The example agent looks up an order in turn 1. In turns 2 and 3, it answers follow-up questions using the cached tool results from the thread history, with no redundant API calls.

Notice: [API call] appears only once (turn 1). Turns 2 and 3 answer from the thread history. This is MemorySaver in action: it persists the full message chain, including tool calls and results, per thread_id.

Key points:

checkpointer: new MemorySaver() enables automatic persistence
thread_id in configurable scopes memory to a conversation
The agent is stateless — all state lives in the checkpointer
One agent instance serves multiple users (different thread_id = different conversations)
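
To make these points concrete, here is a minimal, library-free sketch of the idea. It deliberately does not use LangGraph’s API: ThreadCheckpointer, lookupOrder, and handleTurn are hypothetical names, and the “agent” is a hard-coded function rather than an LLM. It only illustrates how keying saved state by thread_id lets later turns reuse an earlier tool result.

```typescript
// Simplified stand-in for a checkpointer: saved state keyed by thread_id.
// This is NOT LangGraph's MemorySaver, only an illustration of the idea.
type Message = { role: "user" | "agent" | "tool"; content: string };

class ThreadCheckpointer {
  private threads = new Map<string, Message[]>();

  load(threadId: string): Message[] {
    return this.threads.get(threadId) ?? [];
  }

  save(threadId: string, messages: Message[]): void {
    this.threads.set(threadId, messages);
  }
}

let apiCalls = 0;

// Hypothetical tool: pretend this hits an order API.
function lookupOrder(): string {
  apiCalls += 1;
  return "order #12345: MacBook Pro, ships tomorrow";
}

// Stateless turn handler: everything it knows comes from the checkpointer.
function handleTurn(cp: ThreadCheckpointer, threadId: string, userText: string): string {
  const messages: Message[] = [...cp.load(threadId), { role: "user", content: userText }];

  // Reuse a tool result already in this thread's history if there is one.
  let toolResult = messages.find((m) => m.role === "tool")?.content;
  if (toolResult === undefined) {
    toolResult = lookupOrder();
    messages.push({ role: "tool", content: toolResult });
  }

  const reply = `Based on ${toolResult}`;
  messages.push({ role: "agent", content: reply });
  cp.save(threadId, messages);
  return reply;
}

const checkpointer = new ThreadCheckpointer();
handleTurn(checkpointer, "user-a", "I ordered a laptop last week");
handleTurn(checkpointer, "user-a", "When will it arrive?");
handleTurn(checkpointer, "user-a", "Can I change the address?");
const callsAfterThreadA = apiCalls; // one lookup served all three turns

handleTurn(checkpointer, "user-b", "Where is my order?");
const callsAfterThreadB = apiCalls; // a new thread starts with empty history
```

The handler itself holds no state between calls, which is why one instance can serve both threads.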

Long-Term Memory: Cross-Session Knowledge

Working memory resets between sessions. But what about preferences, facts, and history that should persist across all conversations?

Long-term memory requires a separate store: in production, a vector DB or managed service. Here we use a simple in-memory store to demonstrate the pattern.

The agent gets save_preference and recall_preferences tools. In session 1, the user shares preferences. In session 2 (new thread), the agent recalls them from long-term memory.

What’s happening:

Session 1: User says “I prefer email” → agent calls save_preference → stored in LongTermMemory
Session 2: New thread_id (working memory is empty) → agent calls recall_preferences → retrieves preferences from long-term store

Working memory (MemorySaver) is scoped to a thread_id, so a new session on a new thread starts empty. Long-term memory is not tied to a thread, so it persists across sessions.
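
A minimal sketch of this two-session flow, assuming a plain in-memory store. The class name LongTermMemory follows the text; its methods, the per-thread Map, and the user ID are hypothetical, and a real agent would expose the save and recall operations as LLM tools rather than calling them directly.

```typescript
// In-memory long-term store keyed by user ID (assumed shape, for illustration).
class LongTermMemory {
  private prefs = new Map<string, string[]>();

  savePreference(userId: string, preference: string): void {
    const existing = this.prefs.get(userId) ?? [];
    if (!existing.includes(preference)) existing.push(preference);
    this.prefs.set(userId, existing);
  }

  recallPreferences(userId: string): string[] {
    // Return a copy so callers cannot mutate the stored list.
    return [...(this.prefs.get(userId) ?? [])];
  }
}

// Working memory: message history per thread_id, empty for every new thread.
const workingMemory = new Map<string, string[]>();
const longTerm = new LongTermMemory();

// Session 1: the user states a preference; it lands in both tiers.
workingMemory.set("thread-1", ["User: I prefer email"]);
longTerm.savePreference("user-42", "prefers email contact");

// Session 2: a new thread_id, so working memory has nothing for it...
const session2History = workingMemory.get("thread-2") ?? [];
// ...but the long-term store is keyed by user, not thread, so it still answers.
const recalled = longTerm.recallPreferences("user-42");
```

Because this store lives in process memory, it is lost on restart; that limitation is what a production backend addresses.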

Long-Term Memory in Production

For production, replace the in-memory store with a real backend:

Tool	Approach
Redis Agent Memory Server	Working + long-term memory with semantic search
Mem0	Managed memory layer for agents
Zep	Long-term memory with automatic extraction
LangChain Memory	Built-in LangChain/LangSmith integration
Claude Memory Tool	Anthropic’s native memory

The integration pattern is the same regardless of backend:

Store — save facts/preferences after conversations
Search — retrieve relevant memories before generating a response (semantic search in production)
Inject — add retrieved memories to the prompt or as tool results
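
The three steps can be sketched against a small backend interface. Everything here is an assumption for illustration: the MemoryBackend interface, the KeywordBackend class, the buildPrompt helper, and the word-overlap scoring, which stands in for the semantic search a production backend would provide.

```typescript
// Hypothetical backend interface: any store that can save and search fits.
interface MemoryBackend {
  store(userId: string, fact: string): void;
  search(userId: string, query: string, limit: number): string[];
}

// Toy backend: ranks facts by shared words. A production backend would use
// embeddings and semantic search instead of this word-overlap score.
class KeywordBackend implements MemoryBackend {
  private facts = new Map<string, string[]>();

  store(userId: string, fact: string): void {
    this.facts.set(userId, [...(this.facts.get(userId) ?? []), fact]);
  }

  search(userId: string, query: string, limit: number): string[] {
    const queryWords = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
    return (this.facts.get(userId) ?? [])
      .map((fact) => ({
        fact,
        score: fact.toLowerCase().split(/\W+/).filter((w) => queryWords.has(w)).length,
      }))
      .filter((entry) => entry.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit)
      .map((entry) => entry.fact);
  }
}

// Inject: prepend retrieved memories to the prompt sent to the model.
function buildPrompt(memories: string[], userMessage: string): string {
  const context = memories.length > 0
    ? `Known about this user:\n${memories.map((m) => `- ${m}`).join("\n")}\n\n`
    : "";
  return `${context}User: ${userMessage}`;
}

const backend = new KeywordBackend();
backend.store("user-42", "User prefers email contact"); // Store
backend.store("user-42", "Subscription expires June 15");
const question = "How should we contact the user?";
const hits = backend.search("user-42", question, 3); // Search
const finalPrompt = buildPrompt(hits, question); // Inject
```

Keeping the backend behind an interface like this is one way to swap the toy store for a real service without touching the store, search, and inject call sites.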

Integration Patterns

Pattern 1: Code-Driven (Programmatic)

Your code decides when to store and retrieve. Predictable and efficient — you control exactly what gets remembered.

Pseudocode

// Before response: search for relevant context
const memories = await memoryStore.search(userId, userMessage);

// After response: store if it contains preferences
if (userMessage.includes("prefer")) {
    await memoryStore.save(userId, `User prefers: ${userMessage}`);
}

Pattern 2: LLM-Driven (Tool-Based)

Give the LLM memory tools — it decides what’s worth remembering. More natural but less predictable.

Pseudocode

const tools = [
    savePreference,      // LLM calls when user shares a preference
    recallPreferences,   // LLM calls at start of conversation
];
// The LLM autonomously decides when to store and retrieve

This is what our long-term memory demo uses — the LLM decides to call save_preference when the user says “I prefer email.”
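
A rough sketch of how such tools might be wired up without any framework. The MemoryTool shape, the dispatch function, and the hard-coded tool calls are assumptions; in a real agent the model emits the tool call and the framework routes it.

```typescript
// Hypothetical tool shape: a name and description the model sees, plus a handler.
type ToolArgs = { userId: string; preference?: string };
type MemoryTool = {
  name: string;
  description: string;
  run: (args: ToolArgs) => string;
};

const preferenceStore = new Map<string, string[]>();

const memoryTools: MemoryTool[] = [
  {
    name: "save_preference",
    description: "Save a preference the user has just stated.",
    run: ({ userId, preference }) => {
      if (!preference) return "error: preference is required";
      preferenceStore.set(userId, [...(preferenceStore.get(userId) ?? []), preference]);
      return "saved";
    },
  },
  {
    name: "recall_preferences",
    description: "List preferences saved for this user.",
    run: ({ userId }) => (preferenceStore.get(userId) ?? []).join("; ") || "none",
  },
];

// Route a tool call (as a model might emit it) to the matching handler.
function dispatch(call: { name: string; args: ToolArgs }): string {
  const tool = memoryTools.find((t) => t.name === call.name);
  return tool ? tool.run(call.args) : `error: unknown tool ${call.name}`;
}

// Hard-coded calls standing in for decisions the model would make.
const saveResult = dispatch({
  name: "save_preference",
  args: { userId: "user-42", preference: "email" },
});
const recallResult = dispatch({ name: "recall_preferences", args: { userId: "user-42" } });
const unknownResult = dispatch({ name: "delete_everything", args: { userId: "user-42" } });
```

Returning error strings instead of throwing gives the model a result it can read and react to, which is one common choice for tool handlers.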

Pattern 3: Background Extraction (Automatic)

Store every conversation, then have a background process extract the important facts: preferences, events, decisions. Because extraction runs outside the request path, it adds no latency to the conversation itself.

Pseudocode

// After each turn, store the full conversation
await memoryStore.saveConversation(sessionId, messages);

// Background process extracts:
// - Preferences ("prefers email notifications")
// - Facts ("subscription expires June 15")
// - Events ("reported billing issue on March 1")
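
The extraction step could look roughly like the sketch below. The regular-expression rules are a stand-in assumption for the LLM-based or managed extraction a production system would typically use, and the names ExtractedMemory and extractMemories are hypothetical.

```typescript
type ExtractedMemory = { kind: "preference" | "fact" | "event"; text: string };

// Toy rules standing in for a real extractor (typically an LLM call).
const rules: { kind: ExtractedMemory["kind"]; pattern: RegExp }[] = [
  { kind: "preference", pattern: /\bI prefer\b/i },
  { kind: "fact", pattern: /\bexpires\b/i },
  { kind: "event", pattern: /\breported\b/i },
];

// Intended to run after the conversation has been stored, off the request path.
function extractMemories(messages: string[]): ExtractedMemory[] {
  const found: ExtractedMemory[] = [];
  for (const text of messages) {
    // The first matching rule decides the kind; unmatched messages are skipped.
    const rule = rules.find((r) => r.pattern.test(text));
    if (rule) found.push({ kind: rule.kind, text });
  }
  return found;
}

const storedConversation = [
  "Hi, I need help with my account.",
  "I prefer email notifications.",
  "My subscription expires June 15.",
  "I reported a billing issue on March 1.",
];
const extracted = extractMemories(storedConversation);
```

In practice this function would be triggered by a queue or scheduled job rather than called inline.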

Production recommendation: Start with code-driven for predictable behavior. Add background extraction for continuous learning. Use LLM-driven tools when conversational control matters.

Key Takeaways

Working memory (within session) — use LangGraph’s MemorySaver + thread_id
Long-term memory (across sessions) — requires external storage (Redis, vector DB, etc.)
The agent is stateless — all state lives in the checkpointer, not the agent instance
Thread isolation — different thread_id = different conversations, same agent
Start simple — MemorySaver covers most use cases; add long-term memory when you need cross-session persistence
