Particle.news

New Guidance Reframes Production RAG: Fix Retrieval, Not the LLM

Practitioners spotlight indexing, chunking, routing, and evaluation as the levers that raise reliability in real deployments.

Overview

  • At QCon London, Rabobank’s Lan Chu reported that most failures trace to parsing, indexing, and retrieval rather than model behavior in an AI search system used by 300+ staff across 10,000 documents.
  • Chu detailed layout‑aware parsing with vision‑language models, section‑based chunking, temporal scoring to favor fresher content, and a routing layer that can call external APIs when documents are insufficient.
  • A DEV guide outlines a layered production pipeline that adds query rewriting, hybrid dense+keyword search, reranking, and structured context assembly to cut hallucinations and irrelevant hits, with AWS services offered as implementation options.
  • OpenClaw’s RAG Architect consolidates best practices across document processing, embedding selection, vector database choices, advanced retrieval including RRF, and query transformation techniques such as HyDE.
  • A new tutorial shows a fully local stack using Ollama for both embeddings and the LLM alongside PostgreSQL with pgvector, reports sub‑5‑second queries on modest hardware, and suggests upgrades such as HNSW indexing, hybrid search, and cross‑encoder reranking.
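The local pgvector setup in the tutorial boils down to a vector column, an approximate-nearest-neighbour index, and a distance-ordered query. A sketch in SQL, with an assumed `chunks` table and a 768-dimensional embedding column (the table name and dimension are illustrative; the dimension must match the Ollama embedding model used):

```sql
-- Assumed schema: chunks(id, content, embedding vector(768)).
-- HNSW index for approximate nearest-neighbour search under cosine distance:
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);

-- Top 5 chunks nearest to a query embedding (parameter $1);
-- <=> is pgvector's cosine-distance operator:
SELECT id, content, embedding <=> $1 AS distance
FROM chunks
ORDER BY embedding <=> $1
LIMIT 5;
```

Swapping the default IVFFlat index for HNSW, as the tutorial suggests, trades longer index build times for better recall/latency at query time.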