Overview
- At QCon London, Rabobank’s Lan Chu reported that in an AI search system used by 300+ staff across 10,000 documents, most failures traced to parsing, indexing, and retrieval rather than to model behavior.
- Chu detailed layout‑aware parsing with vision‑language models, section‑based chunking, temporal scoring to favor fresher content, and a routing layer that can call external APIs when documents are insufficient.
- A DEV guide outlines a layered production pipeline that adds query rewriting, hybrid dense+keyword search, reranking, and structured context assembly to cut hallucinations and irrelevant hits, with AWS services offered as implementation options.
- OpenClaw’s RAG Architect consolidates best practices across document processing, embedding selection, vector database choices, advanced retrieval including reciprocal rank fusion (RRF), and query transformation techniques such as hypothetical document embeddings (HyDE).
- A new tutorial shows a fully local stack using Ollama for both embeddings and the LLM, plus PostgreSQL with pgvector; it reports sub‑5‑second queries on modest hardware and suggests upgrades such as HNSW indexing, hybrid search, and cross‑encoder reranking.
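The RRF technique mentioned above combines multiple ranked result lists (for example, one from dense vector search and one from keyword search) into a single ranking. A minimal sketch of the standard formula, where each document scores the sum of 1/(k + rank) across the lists it appears in (the function name and the example document IDs are illustrative, not from any of the cited sources):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    rankings: iterable of lists, each ordered best-first.
    k: smoothing constant from the RRF formula; 60 is the
       commonly cited default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Example: dense and keyword search disagree on ordering
dense = ["d3", "d1", "d2"]
keyword = ["d1", "d4", "d3"]
fused = reciprocal_rank_fusion([dense, keyword])
```

Because "d1" ranks highly in both lists, it rises to the top of the fused ranking even though neither individual list placed it first.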
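Temporal scoring of the kind Chu described can be implemented in several ways; one common approach is to multiply the retrieval similarity by an exponential freshness decay. The sketch below assumes that approach, with a hypothetical `half_life_days` tuning parameter (Chu's actual scoring function was not specified):

```python
from datetime import datetime, timezone

def temporal_score(similarity, doc_date, now=None, half_life_days=180.0):
    """Blend a retrieval similarity score with document freshness.

    half_life_days is a hypothetical tuning knob: after one
    half-life, the freshness multiplier drops to 0.5, so older
    documents need a higher raw similarity to rank equally.
    """
    now = now or datetime.now(timezone.utc)
    age_days = (now - doc_date).total_seconds() / 86400.0
    freshness = 0.5 ** (max(age_days, 0.0) / half_life_days)
    return similarity * freshness
```

A document exactly one half-life old keeps half its similarity score, while a brand-new document keeps it all; shortening the half-life biases results more aggressively toward fresh content.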