What RAG Systems Are and Why They Matter Now

Search eats a large share of the working day. Some studies suggest knowledge workers spend up to 2.5 hours per day searching for information, not because they're inefficient, but because company knowledge lives in thousands of documents scattered across SharePoint, old project folders, and half-remembered file names. That's the search tax, and for decades it was simply accepted as a cost of doing business.

Why search keeps failing you

Traditional search matches words. Your team searches "client onboarding process" but the document is titled "New Customer Intake Workflow." They search "remote work policy" and miss "flexible working arrangement" and "hybrid working guidelines." Every synonym is a locked door.

This is the vocabulary mismatch problem. One consultant calls it "digital transformation," another writes "modernisation programme," a third says "technology enablement." All mean roughly the same thing, but keyword search treats them as completely different. The larger your knowledge base, the more ways people describe the same concepts, and the harder it gets to find anything.
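A minimal sketch of the mismatch, assuming the simplest possible keyword matcher (lowercase the text, split on spaces, look for shared words). The function name `keyword_overlap` is hypothetical; real keyword engines add stemming and ranking, but the underlying limitation is the same.

```python
def keyword_overlap(query: str, title: str) -> set[str]:
    """Return the words that appear in both the query and the title."""
    return set(query.lower().split()) & set(title.lower().split())


# The searches from the examples above share no words with the documents
# they should find, so a pure word matcher has nothing to go on.
print(keyword_overlap("client onboarding process", "New Customer Intake Workflow"))
print(keyword_overlap("remote work policy", "hybrid working guidelines"))
```

Note that "work" and "working" do not match either: without extra processing, even near-identical words count as different.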

What changed: search by meaning, not words

Between 2020 and 2024, three things converged. First, the technology matured: transformer models and embeddings moved from research papers to production systems that actually work. Second, the cost collapsed: what once required machine learning engineers and months of work now costs a few hundred pounds a month in API calls. Third, it became accessible as Microsoft, Google, and others built this into tools SMEs already use.

The result is Retrieval-Augmented Generation, or RAG. The name is terrible, but the idea is straightforward: search by meaning instead of words, then use AI to synthesize an answer from what you find.

How it actually works

RAG systems do two things. First, they retrieve relevant information. Your documents are split into chunks, and each chunk is converted into an embedding, a numerical representation of its meaning. When someone asks a question, the question is converted the same way, and the system compares the resulting vectors, typically with a measure such as cosine similarity, to find the chunks whose meaning is closest.
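The retrieval step can be sketched in a few lines. This is an illustration under stated assumptions, not a production implementation: the three-number vectors below are invented by hand, whereas real embeddings come from an embedding model and have hundreds or thousands of dimensions. Cosine similarity is one common way to compare them, and the names `cosine_similarity` and `top_chunks` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two vectors by angle: close to 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_chunks(query_vector: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the titles of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), title) for title, vec in chunks.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:k]]


# Toy "embeddings", invented for illustration only.
chunks = {
    "New Customer Intake Workflow": [0.9, 0.1, 0.0],
    "Hybrid Working Guidelines": [0.1, 0.9, 0.1],
    "Office Catering Menu": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of the question "client onboarding process".
query_vector = [0.8, 0.2, 0.1]

print(top_chunks(query_vector, chunks))
```

The query shares no words with "New Customer Intake Workflow", yet that chunk ranks first because its vector points in nearly the same direction.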

The best implementations combine meaning-based search with traditional keyword search. If your consultant searches "ISO 27001," the system prioritizes chunks with that exact standard. Semantic search finds conceptual matches while keyword search catches precise terminology, and together they're significantly better than either alone.
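One simple way to combine the two signals is a weighted sum, sketched below. This is an assumption for illustration: production systems often use more refined keyword scoring and other fusion methods. The semantic scores are invented numbers standing in for embedding similarities, and `keyword_score`, `hybrid_rank`, and `keyword_weight` are hypothetical names.

```python
def keyword_score(query: str, text: str) -> float:
    """1.0 if the exact query phrase appears in the text, otherwise 0.0."""
    return 1.0 if query.lower() in text.lower() else 0.0


def hybrid_rank(
    query: str,
    chunks: dict[str, str],
    semantic_scores: dict[str, float],
    keyword_weight: float = 0.5,
) -> list[str]:
    """Rank chunk ids by a weighted blend of semantic and keyword scores."""
    combined = []
    for chunk_id, text in chunks.items():
        score = (1 - keyword_weight) * semantic_scores[chunk_id] \
            + keyword_weight * keyword_score(query, text)
        combined.append((score, chunk_id))
    combined.sort(reverse=True)
    return [chunk_id for _, chunk_id in combined]


chunks = {
    "security-overview": "Our information security management approach covers risk and access control.",
    "audit-checklist": "Checklist for the ISO 27001 certification audit.",
}

# Invented similarity scores: semantic search alone slightly prefers the overview.
semantic_scores = {"security-overview": 0.82, "audit-checklist": 0.74}

print(hybrid_rank("ISO 27001", chunks, semantic_scores))
```

With the keyword signal included, the chunk that names the standard exactly moves to the top; setting `keyword_weight=0` falls back to the purely semantic order.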

Second, the system generates an answer. It takes the five to seven most relevant chunks and hands them to a large language model with clear instructions: "Read these excerpts and answer the question. Use only what's provided. Cite your sources." What comes back is a natural-language answer with citations: "Our approach to retail transformation focuses on customer journey redesign, modernising point-of-sale systems, and building data capabilities. [Source: Retail Strategy 2024, p.12]."

Those citations aren't decoration—your consultant can click through to check the context and decide if the answer fits her specific situation.
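The generation step mostly comes down to assembling a prompt. The sketch below shows one way to do that, assuming each retrieved chunk carries its source title and page number; the chunk text is invented, the `build_prompt` name and dictionary keys are hypothetical, and the call to the language model itself is left out because it depends on the provider you use.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Combine the instructions, the retrieved excerpts, and the question."""
    excerpts = "\n\n".join(
        f"[Source: {c['source']}, p.{c['page']}]\n{c['text']}" for c in chunks
    )
    return (
        "Read these excerpts and answer the question. "
        "Use only what's provided. Cite your sources.\n\n"
        f"Excerpts:\n{excerpts}\n\n"
        f"Question: {question}"
    )


# One retrieved chunk, with invented text, labelled so the model can cite it.
retrieved = [
    {
        "source": "Retail Strategy 2024",
        "page": 12,
        "text": "We focus on customer journey redesign and modernising point-of-sale systems.",
    },
]

prompt = build_prompt("What is our approach to retail transformation?", retrieved)
print(prompt)
```

Labelling every excerpt with its source before it reaches the model is what makes the citations in the answer possible.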

The obvious question: don't we already have this?

If you use Microsoft 365, you might be thinking: "Isn't this what Copilot does?" Yes and no. Copilot searches your entire Microsoft environment: email, Teams, SharePoint, OneDrive, everything. For many firms, that's perfect: broad coverage, zero setup, £25-30 per user per month.

But if you're a consultancy with 20 years of deliverables, Copilot searching everything might be too broad. It can't distinguish between an old email thread and your definitive methodology document. Custom RAG lets you focus on specific SharePoint libraries, tune how documents get chunked, and write prompts that understand your terminology.

The trade-off: Copilot is plug-and-play with broad coverage, while custom RAG is configured for focused, high-quality results on specific use cases.
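Chunking is one of the settings a custom RAG build lets you tune. As a rough sketch, here is a fixed-size chunker with overlap, so that a sentence falling on a boundary still appears whole in at least one chunk. The sizes are arbitrary illustrative defaults and `chunk_words` is a hypothetical name; many real systems split on headings or paragraphs instead.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of chunk_size words, each sharing
    `overlap` words with the previous chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Ten numbered "words" make the overlap easy to see.
sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
```

Smaller chunks give more precise matches but less context per excerpt; larger chunks do the reverse, which is why this is worth tuning per document type.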

Who benefits most

This matters most where knowledge is the product. Professional services firms—consulting, law, architecture—where past work is the template for future work. Technical support teams answering similar questions repeatedly. Distributed organizations where knowledge is siloed across offices and teams. Compliance-heavy industries where finding the right policy version matters.

The common thread is businesses where people spend significant time searching, where the right answer already exists in their documents, and where finding it faster has clear value.

Why now matters

For the first time, this technology is affordable and accessible for normal SMEs, not just Fortune 500 companies with data science teams. The tools are APIs and cloud services, and the cost is hundreds of pounds monthly rather than hundreds of thousands.

If your team wastes significant time searching for information that exists somewhere, this is worth understanding. The gap between keyword search and semantic search isn't incremental—it's the difference between matching words and matching meaning. And for knowledge work, meaning is what matters.


The Back Office, Rewired