Particle.news

New Guidance Reframes Production RAG: Fix Retrieval, Not the LLM

Practitioners spotlight indexing, chunking, routing, and evaluation as the levers that raise reliability in real deployments.

Overview

  • At QCon London, Rabobank’s Lan Chu reported that most failures trace to parsing, indexing, and retrieval rather than model behavior in an AI search system used by 300+ staff across 10,000 documents.
  • Chu detailed layout‑aware parsing with vision‑language models, section‑based chunking, temporal scoring to favor fresher content, and a routing layer that can call external APIs when documents are insufficient.
  • A DEV guide outlines a layered production pipeline that adds query rewriting, hybrid dense+keyword search, reranking, and structured context assembly to cut hallucinations and irrelevant hits, with AWS services offered as implementation options.
  • OpenClaw’s RAG Architect consolidates best practices across document processing, embedding selection, vector database choices, advanced retrieval including RRF, and query transformation techniques such as HyDE.
  • A new tutorial shows a fully local stack using Ollama for both embeddings and the LLM alongside PostgreSQL with pgvector, reports sub‑5‑second queries on modest hardware, and suggests upgrades such as HNSW indexing, hybrid search, and cross‑encoder reranking.
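The local pgvector setup in the tutorial boils down to a vector column, an approximate-nearest-neighbour index, and a distance-ordered query. A sketch in SQL, with an assumed `chunks` table and a 768-dimensional embedding column (the table name and dimension are illustrative; the dimension must match the Ollama embedding model used):

```sql
-- Assumed schema: chunks(id, content, embedding vector(768)).
-- HNSW index for approximate nearest-neighbour search under cosine distance:
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);

-- Top 5 chunks nearest to a query embedding (parameter $1);
-- <=> is pgvector's cosine-distance operator:
SELECT id, content, embedding <=> $1 AS distance
FROM chunks
ORDER BY embedding <=> $1
LIMIT 5;
```

Swapping the default IVFFlat index for HNSW, as the tutorial suggests, trades longer index build times for better recall/latency at query time.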