The Memory Problem in Production Agents
Consider this real-world scenario from a customer support agent:

Agent memory management is an advancing area with multiple approaches. We cover the foundational concepts here using LangGraph’s built-in memory. For deeper coverage, see the O’Reilly report “Managing Memory for AI Agents” in the Assets folder.
Memory Architecture: Two Tiers
| Memory Type | Duration | Purpose | Example |
|---|---|---|---|
| Working Memory | Single session | Active conversation state | "User asked about order #12345" |
| Long-Term Memory | Cross-session | Persistent knowledge | "User prefers email contact" |
Working Memory with LangGraph
LangGraph provides built-in working memory via `MemorySaver` and `thread_id`. Each thread maintains its own conversation history automatically — no manual message tracking needed. The agent below looks up an order in turn 1. In turns 2 and 3, it answers follow-up questions using the cached tool results from the thread history — no redundant API calls.

Notice: `[API call]` appears only once (turn 1). Turns 2 and 3 answer from the thread history. This is `MemorySaver` in action — it persists the full message chain, including tool calls and results, per `thread_id`.
Key points:
- `checkpointer: new MemorySaver()` enables automatic persistence
- `thread_id` in `configurable` scopes memory to a conversation
- The agent is stateless — all state lives in the checkpointer
- One agent instance serves multiple users (different `thread_id` = different conversations)
Long-Term Memory: Cross-Session Knowledge
Working memory resets between sessions. But what about preferences, facts, and history that should persist across all conversations? Long-term memory requires a separate store — in production, a vector DB or managed service. Here we use a simple in-memory store to demonstrate the pattern.

The agent gets `save_preference` and `recall_preferences` tools. In session 1, the user shares preferences. In session 2 (new thread), the agent recalls them from long-term memory:
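A self-contained sketch of the two-tier setup (the `LongTermMemory` class and `turn` helper are illustrative; a real agent would route `save_preference`/`recall_preferences` through tool calls):

```typescript
// Illustrative sketch: per-thread working memory plus a shared long-term store.
class LongTermMemory {
  private facts: string[] = [];
  save(fact: string): void { this.facts.push(fact); }
  recall(): string[] { return [...this.facts]; }
}

const longTerm = new LongTermMemory();
const workingMemory = new Map<string, string[]>(); // thread_id -> messages

function turn(threadId: string, userMsg: string): string {
  const history = workingMemory.get(threadId) ?? [];
  workingMemory.set(threadId, history);
  history.push(userMsg);
  // save_preference: triggered when the user states a preference
  if (userMsg.startsWith("I prefer")) {
    longTerm.save(userMsg);
    return "Noted.";
  }
  // recall_preferences: reads the long-term store, not the thread history
  const prefs = longTerm.recall();
  return prefs.length ? `I remember: ${prefs.join("; ")}` : "No preferences on file.";
}

// Session 1
turn("session-1", "I prefer email contact");
// Session 2: new thread_id, empty working memory, but long-term store persists
const reply = turn("session-2", "How should you contact me?");
console.log(reply); // "I remember: I prefer email contact"
```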
What’s happening:
- Session 1: User says “I prefer email” → agent calls `save_preference` → stored in `LongTermMemory`
- Session 2: New `thread_id` (working memory is empty) → agent calls `recall_preferences` → retrieves preferences from long-term store

The working memory (`MemorySaver`) forgets between sessions. The long-term memory persists.
Long-Term Memory in Production
For production, replace the in-memory store with a real backend:

| Tool | Approach |
|---|---|
| Redis Agent Memory Server | Working + long-term memory with semantic search |
| Mem0 | Managed memory layer for agents |
| Zep | Long-term memory with automatic extraction |
| LangChain Memory | Built-in LangChain/LangSmith integration |
| Claude Memory Tool | Anthropic’s native memory |
Whichever backend you choose, the integration follows the same three steps:
- Store — save facts/preferences after conversations
- Search — retrieve relevant memories before generating a response (semantic search in production)
- Inject — add retrieved memories to the prompt or as tool results
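The store / search / inject loop can be sketched as follows. The naive keyword match stands in for the semantic search a production backend would provide, and `MemoryBackend` is an illustrative name, not a real API:

```typescript
// Illustrative sketch of the store / search / inject loop.
type Memory = { text: string };

class MemoryBackend {
  private memories: Memory[] = [];
  store(text: string): void { this.memories.push({ text }); }
  search(query: string): Memory[] {
    // Production: embedding similarity. Here: keyword overlap.
    const words = query.toLowerCase().split(/\s+/);
    return this.memories.filter((m) =>
      words.some((w) => m.text.toLowerCase().includes(w)));
  }
}

function buildPrompt(backend: MemoryBackend, userMsg: string): string {
  const relevant = backend.search(userMsg); // search
  const context = relevant.map((m) => `- ${m.text}`).join("\n");
  return `Known about this user:\n${context}\n\nUser: ${userMsg}`; // inject
}

const backend = new MemoryBackend();
backend.store("User prefers email contact"); // store (after a prior session)
const prompt = buildPrompt(backend, "What is the best way to contact me?");
```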
Integration Patterns
Pattern 1: Code-Driven (Programmatic)
Your code decides when to store and retrieve. Predictable and efficient — you control exactly what gets remembered.
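A runnable sketch of the code-driven pattern, with `callModel` as a stand-in for your LLM call and a toy in-memory store (both names are illustrative):

```typescript
// Illustrative sketch: the application, not the model, decides when memory
// is read and written.
const savedFacts: string[] = [];
const store = {
  save(fact: string): void { savedFacts.push(fact); },
  search(query: string): string[] {
    return savedFacts.filter((f) =>
      query.toLowerCase().split(/\s+/).some((w) => f.toLowerCase().includes(w)));
  },
};

function callModel(prompt: string): string {
  return `reply based on ${prompt.split("\n").length} context line(s)`; // stand-in
}

function handleTurn(userMsg: string): string {
  // Retrieve BEFORE generating: code chooses what context to inject.
  const memories = store.search(userMsg);
  const reply = callModel([...memories, userMsg].join("\n"));
  // Store AFTER generating: code chooses what is worth keeping.
  if (userMsg.toLowerCase().startsWith("i prefer")) store.save(userMsg);
  return reply;
}

handleTurn("I prefer email contact");        // stored by code, not the model
handleTurn("How will you contact me?");      // retrieval injects the stored fact
```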
Pattern 2: LLM-Driven (Tool-Based)
Give the LLM memory tools — it decides what’s worth remembering. More natural but less predictable. For example, the model might call `save_preference` when the user says “I prefer email.”
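A sketch of the tool-based pattern. `fakeModel` is a stub standing in for an LLM that emits tool calls; a real model would choose tools from their descriptions:

```typescript
// Illustrative sketch: the model, not the code, elects to use memory tools.
type ToolCall = { tool: "save_preference"; arg: string } | { tool: "none" };

const preferences: string[] = [];
const tools = {
  save_preference: (p: string): void => { preferences.push(p); },
};

// Stub mimicking the decision "save when the user states a preference".
function fakeModel(userMsg: string): ToolCall {
  return userMsg.startsWith("I prefer")
    ? { tool: "save_preference", arg: userMsg }
    : { tool: "none" };
}

function handle(userMsg: string): void {
  const call = fakeModel(userMsg);
  if (call.tool === "save_preference") tools.save_preference(call.arg);
}

handle("I prefer email over phone"); // model elects to save
handle("What is my order status?");  // model elects not to
```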
Pattern 3: Background Extraction (Automatic)
Store every conversation, then have a background process extract important facts — preferences, events, decisions. Zero overhead during the conversation.
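A sketch of the background-extraction pattern. The regex extractor is a toy stand-in; production systems would run an LLM-based extraction pass over each transcript:

```typescript
// Illustrative sketch: log verbatim during the conversation, extract later.
const transcriptLog: string[] = [];
const extractedFacts: string[] = [];

function logTurn(msg: string): void {
  transcriptLog.push(msg); // zero extra work during the conversation
}

function backgroundExtract(): void {
  for (const msg of transcriptLog) {
    // Toy heuristic; production would prompt an LLM per transcript.
    const m = msg.match(/I prefer [^.]+/);
    if (m) extractedFacts.push(m[0]);
  }
}

logTurn("Hi, where is my order?");
logTurn("Also, I prefer email contact. Thanks!");
logTurn("Great, bye.");
backgroundExtract(); // runs later, e.g. on a schedule
console.log(extractedFacts); // ["I prefer email contact"]
```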
Key Takeaways
- Working memory (within session) — use LangGraph’s `MemorySaver` + `thread_id`
- Long-term memory (across sessions) — requires external storage (Redis, vector DB, etc.)
- The agent is stateless — all state lives in the checkpointer, not the agent instance
- Thread isolation — different `thread_id` = different conversations, same agent
- Start simple — `MemorySaver` covers most use cases; add long-term memory when you need cross-session persistence