vlad.build

Why I Am a RAG Skeptic

I am a RAG skeptic. I think RAG is already being supplanted by agentic search. I would not invest in a RAG business today.

‘Context engineering’ is a fundamental challenge in AI. The premise is simple: the world contains enormous amounts of useful information, but only a tiny part of that information can fit in the model’s working memory (aka context window) at any given time. How do we ensure that the model gets the right information at the right time?

RAG is a rough and “dumb” approach to this problem. Why?

First, because it destroys information. Documents are forcibly broken into chunks, and each chunk is converted into a black-box embedding. At query time, chunks are retrieved by vector similarity - a vague, opaque metric. You only get a fixed selection of chunks (say, the top-k), and each chunk arrives orphaned from its natural context.

Second, because the RAG system, not the model, decides what’s relevant. That decision rests on simple heuristics (vector similarity, maybe a keyword match). Such a system will inevitably be dumber than the model itself, bottlenecking the model’s intelligence.
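To make the two objections concrete, here is a minimal sketch of a RAG retrieval pipeline. The bag-of-words “embedding” is a toy stand-in for a real dense embedding model, but the pipeline shape is the same: arbitrary chunk boundaries, a similarity score the model never sees, and a hard top-k cutoff.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words vector. Real systems use dense
    # neural embeddings, but the pipeline has the same shape.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # The similarity heuristic that decides relevance on the model's behalf.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(document: str, size: int = 200) -> list[str]:
    # Fixed-size chunking: splits at arbitrary character boundaries,
    # cutting sentences and tables in half and discarding structure.
    return [document[i:i + size] for i in range(0, len(document), size)]

def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    # Hard top-k cutoff: anything ranked below k is invisible to the
    # model, however relevant, and each hit arrives without context.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:k]
```

Every design decision here (chunk size, embedding model, k) is frozen before the model ever sees the query.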

RAG was a bridging solution while the models were too dumb to manage their own context efficiently. But now the models have evolved. So what is the better alternative?

In their recent article Context Engineering for AI Agents, the team behind Manus explains how they use the file system as the ultimate context. The file system is “unlimited in size, persistent by nature, and directly operable by the agent itself”. In short, the model manages its own context / memory by reading and writing files.

The team behind Claude Code did the same thing, discarding RAG solutions for a collection of simple tools that allow the model to read and search through files.

And in my PDF-Agent MCP, I enabled models to work with 1,500-page PDFs by autonomously deciding which parts of the PDF to load at any moment - often after running a regex search or looking up the table of contents.
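The search-then-load pattern behind this is simple to sketch. The snippet below is a hypothetical simplification, not PDF-Agent MCP’s actual code: it abstracts the PDF behind a page-getter function (which in practice would wrap a real extractor such as pypdf’s `reader.pages[i].extract_text()`), so the agent scans for matches and then loads only the page span it needs.

```python
import re
from typing import Callable

def search_pages(get_page: Callable[[int], str], num_pages: int,
                 pattern: str) -> list[int]:
    # Scan the document page by page with a regex, returning the
    # indices of matching pages - cheap reconnaissance before loading.
    rx = re.compile(pattern, re.IGNORECASE)
    return [i for i in range(num_pages) if rx.search(get_page(i))]

def load_span(get_page: Callable[[int], str], start: int, end: int) -> str:
    # Load only the page range the agent decided it needs, rather
    # than stuffing the entire document into context.
    return "\n\n".join(get_page(i) for i in range(start, end))
```

A 1,500-page manual thus costs the agent one regex pass plus a handful of pages in context, instead of the whole document.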

In all these cases there is no loss of information; no hardcoded system that forcibly decides what’s the right context for the situation. Instead, the agent is given autonomy to decide how to manage its memory. This is a truly scalable approach that will improve with the model’s intelligence instead of bottlenecking it.

As an added benefit, you can get rid of all the RAG infrastructure: you don’t need to do chunking, embed the documents and the queries, or maintain an expensive vector database. It’s just an AI agent with a bunch of search tools.

You may also notice that this is how humans work. Given a task, we manage our own context by searching intelligently and deciding what information to focus on at any given time. It works.