RAG is overused. There are cleaner solutions for certain memory patterns — here’s when and how to use them.
The RAG Default
Ask any AI engineer how to give an agent long-term memory and the answer is almost always RAG: embed the information, store it in a vector database, retrieve it at query time.
RAG is a powerful pattern. But it’s been cargo-culted to the point where it’s the default answer regardless of whether it fits the problem.
When RAG Is the Wrong Answer
RAG works well when:
- Information is document-like (paragraphs, articles, unstructured text)
- Queries are semantic (you’re looking for similar meaning, not exact matches)
- The information set is large enough to justify the embedding overhead
RAG works poorly when:
- You need exact lookups (user preferences, entity properties, structured data)
- Your information has complex relationships (graphs, hierarchies)
- Update frequency is high (every change has to be re-embedded and re-indexed, and retrieval returns stale results until that happens)
- Precision matters more than recall
The Alternatives Worth Knowing
Key-value stores for user state: If your agent needs to remember “this user prefers bullet points over prose,” that’s not a semantic search problem. It’s a lookup problem. A simple key-value store with structured keys is faster, more reliable, and easier to debug.
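As a minimal sketch of the lookup approach, assuming an in-memory dictionary stands in for whatever key-value backend you actually use: the `UserStateStore` class, the `(user_id, key)` key scheme, and the `response_format` key are all hypothetical names chosen for illustration.

```python
from typing import Any


class UserStateStore:
    """In-memory key-value store addressed by (user_id, key).

    Hypothetical illustration; a real deployment would likely sit on
    a persistent key-value backend instead of a dict.
    """

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], Any] = {}

    def set(self, user_id: str, key: str, value: Any) -> None:
        # Writes overwrite in place, so there is nothing to re-embed.
        self._data[(user_id, key)] = value

    def get(self, user_id: str, key: str, default: Any = None) -> Any:
        # Exact-match lookup: the value is either there or it is not.
        return self._data.get((user_id, key), default)


store = UserStateStore()
store.set("user-42", "response_format", "bullet_points")

preferred_format = store.get("user-42", "response_format", default="prose")
fallback_format = store.get("user-99", "response_format", default="prose")
```

A lookup either hits or falls back to the default, which is what makes this kind of memory easy to debug: there is no similarity threshold to tune.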
Structured databases for relational data: Agent memory often has structure that embedding destroys. A project has tasks, each task has subtasks, each subtask has a status. Model this as a relational structure and query it with SQL, not vector similarity.
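The project, task, and subtask structure above could be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample rows below are assumptions made for illustration, not a recommended schema.

```python
import sqlite3

# In-memory database for the sketch; a real agent would use a file or server.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        project TEXT NOT NULL,
        title TEXT NOT NULL
    );
    CREATE TABLE subtasks (
        id INTEGER PRIMARY KEY,
        task_id INTEGER NOT NULL REFERENCES tasks(id),
        title TEXT NOT NULL,
        status TEXT NOT NULL
    );
    """
)
conn.execute(
    "INSERT INTO tasks (id, project, title) VALUES (1, 'launch', 'Write docs')"
)
conn.executemany(
    "INSERT INTO subtasks (task_id, title, status) VALUES (?, ?, ?)",
    [(1, "Draft outline", "done"), (1, "Review examples", "open")],
)

# Exact, structured question: which subtasks in this project are still open?
open_subtasks = conn.execute(
    """
    SELECT t.title, s.title
    FROM subtasks AS s
    JOIN tasks AS t ON t.id = s.task_id
    WHERE t.project = ? AND s.status = ?
    ORDER BY s.id
    """,
    ("launch", "open"),
).fetchall()
```

The query returns exactly the rows that match the filter, which a similarity search over embedded task descriptions cannot guarantee.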
Knowledge graphs for interconnected concepts: When the relationships between pieces of information matter as much as the information itself, a graph database (or even a simple in-memory graph) is more appropriate than a vector store.
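For the "simple in-memory graph" case, a rough sketch might look like the following. `ConceptGraph` and its methods are hypothetical names, edges are treated as undirected and unlabeled for brevity, and the path search is a plain breadth-first search rather than anything a graph database would do internally.

```python
from collections import defaultdict, deque
from typing import Optional


class ConceptGraph:
    """Undirected, unlabeled concept graph stored as adjacency sets."""

    def __init__(self) -> None:
        self._edges: dict[str, set[str]] = defaultdict(set)

    def connect(self, a: str, b: str) -> None:
        self._edges[a].add(b)
        self._edges[b].add(a)

    def path(self, start: str, goal: str) -> Optional[list[str]]:
        """Return a shortest chain of concepts from start to goal, or None."""
        if start == goal:
            return [start]
        parents: dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in self._edges.get(node, ()):
                if neighbor in parents:
                    continue  # already reached by an equal or shorter route
                parents[neighbor] = node
                if neighbor == goal:
                    # Walk the parent links back to the start node.
                    chain = [goal]
                    while parents[chain[-1]] is not None:
                        chain.append(parents[chain[-1]])
                    return chain[::-1]
                queue.append(neighbor)
        return None


graph = ConceptGraph()
graph.connect("vector search", "embeddings")
graph.connect("embeddings", "semantic memory")
graph.connect("ring buffer", "episodic memory")

connection = graph.path("vector search", "semantic memory")
no_connection = graph.path("vector search", "ring buffer")
```

The answer to "what connects X and Y?" is the chain of edges itself, which is information a vector store does not keep.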
Episodic buffers for recent context: For short-to-medium-term memory (the last session, the last few interactions), a simple ring buffer of structured events is often more effective than retrieval.
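A ring buffer of structured events can be sketched with `collections.deque` and its `maxlen` argument, which discards the oldest entry once the buffer is full. The `Event` fields and the `EpisodicBuffer` class are assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # e.g. "user_message", "tool_call" (hypothetical labels)
    detail: str


class EpisodicBuffer:
    """Fixed-capacity buffer that keeps only the most recent events."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) drops the oldest item when a new one is appended
        # to a full buffer.
        self._events: deque[Event] = deque(maxlen=capacity)

    def record(self, event: Event) -> None:
        self._events.append(event)

    def recent(self, n: int) -> list[Event]:
        """Return up to n most recent events, oldest first."""
        if n <= 0:
            return []
        return list(self._events)[-n:]

    def __len__(self) -> int:
        return len(self._events)


buffer = EpisodicBuffer(capacity=3)
for i in range(1, 5):
    buffer.record(Event(kind="user_message", detail=f"message {i}"))

last_two = [event.detail for event in buffer.recent(2)]
```

After four events in a buffer of capacity three, the first event is gone and ordering is preserved, so "what just happened?" never needs a similarity query.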
Making the Choice
The right memory architecture depends on the query pattern:
| Query type | Solution |
|---|---|
| “What did the user say about X?” | Vector search (RAG) |
| “What are the user’s preferences?” | Key-value store |
| “What tasks are in this project?” | Relational DB |
| “What concepts connect X and Y?” | Knowledge graph |
| “What happened in the last session?” | Episodic buffer |
Most real agents need 2-3 of these working together, not just one. The mistake is trying to use RAG for all of them.
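One way to picture the table is as a dispatch map from query kind to backend. The sketch below is only that: `QueryKind`, `make_router`, and the stub backends are hypothetical, and it assumes something upstream has already decided which kind of query it is looking at.

```python
from enum import Enum, auto
from typing import Callable


class QueryKind(Enum):
    SEMANTIC = auto()    # "What did the user say about X?"
    PREFERENCE = auto()  # "What are the user's preferences?"
    RELATIONAL = auto()  # "What tasks are in this project?"
    GRAPH = auto()       # "What concepts connect X and Y?"
    EPISODIC = auto()    # "What happened in the last session?"


Backend = Callable[[str], str]


def make_router(backends: dict[QueryKind, Backend]) -> Callable[[QueryKind, str], str]:
    """Build a function that sends each query to its registered backend."""

    def route(kind: QueryKind, query: str) -> str:
        if kind not in backends:
            raise ValueError(f"no backend registered for {kind.name}")
        return backends[kind](query)

    return route


# Stub backends that only describe the call they would make.
route = make_router(
    {
        QueryKind.SEMANTIC: lambda q: f"vector_search({q!r})",
        QueryKind.PREFERENCE: lambda q: f"kv_lookup({q!r})",
    }
)

answer = route(QueryKind.PREFERENCE, "response_format")
```

An agent that only needs two of the five stores registers two backends, and an unregistered kind fails loudly instead of silently falling through to vector search.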
A Practical Architecture
Here’s the memory architecture I’ve landed on for most agent projects:
- Working memory — the current context window, managed carefully
- Episodic memory — a structured log of recent events, last 7-30 days
- Semantic memory — RAG over long-form content (articles, docs, notes)
- Procedural memory — a key-value store or relational database of user preferences and entity properties
Each layer has different retrieval mechanisms and different update patterns. The total system is more complex than RAG alone, but each component is simpler and more reliable.
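To make the layering concrete, here is a rough sketch of how the four layers might sit behind one object. Everything in it is an assumption for illustration: the `AgentMemory` name, the `build_context` method and its ordering, and the `no_retrieval` placeholder that stands in for a real RAG retriever.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


def no_retrieval(query: str) -> list[str]:
    """Placeholder retriever; a real system would query a vector index."""
    return []


@dataclass
class AgentMemory:
    # Working memory: items already in the current context window.
    working: list[str] = field(default_factory=list)
    # Episodic memory: bounded log of recent events.
    episodic: deque = field(default_factory=lambda: deque(maxlen=100))
    # Semantic memory: any callable that maps a query to retrieved passages.
    semantic_search: Callable[[str], list[str]] = no_retrieval
    # Procedural memory (as the term is used here): exact-lookup preferences.
    procedural: dict[str, str] = field(default_factory=dict)

    def build_context(self, query: str, recent_events: int = 3) -> list[str]:
        """Assemble context lines from each layer using its own mechanism."""
        events = list(self.episodic)[-recent_events:] if recent_events > 0 else []
        context = [f"pref {k}={v}" for k, v in sorted(self.procedural.items())]
        context += [f"event {e}" for e in events]
        context += [f"doc {d}" for d in self.semantic_search(query)]
        context += self.working
        return context


memory = AgentMemory(semantic_search=lambda q: [f"notes on {q}"])
memory.procedural["response_format"] = "bullet_points"
memory.episodic.append("user asked about pricing")
memory.working.append("current question: compare plans")

context = memory.build_context("pricing")
```

Each layer is read through its own mechanism (dictionary lookup, buffer slice, retriever call), so any one of them can be swapped or tested without touching the others.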