Particle.news

Generator-Focused RAG Benchmark Released as Hierarchical System Wins WattBot 2025

No system exceeded 90% overall accuracy, signaling room to improve grounded reasoning, table use, and abstention.

Overview

  • LIT-RAGBench debuts with five evaluation categories—Integration, Reasoning, Logic, Table, Abstention—to measure generator behavior under grounded retrieval.
  • The dataset includes 114 human-authored Japanese questions with curated English translations, uses fictional entities to avoid contamination, and applies LLM-as-a-judge scoring.
  • Across API-based and open-weight models, no system surpassed 90% overall accuracy, providing category-level breakdowns to guide model selection and development.
  • An accessible DEV guide outlines the standard RAG pipeline of chunking, embeddings, vector search, and context-conditioned generation, emphasizing benefits like reduced hallucination and access to fresh or private data.
  • Separately, KohakuRAG introduces hierarchical indexing with query planning, cross-query reranking, and ensemble voting with abstention-aware filtering, achieving first place on the WattBot 2025 Challenge with a 0.861 score and releasing code openly.
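The standard RAG pipeline described above (chunking, embeddings, vector search, context-conditioned generation) can be sketched in a few functions. This is a minimal, self-contained illustration using a toy bag-of-words similarity in place of a neural embedding model; the function names (`chunk`, `embed`, `retrieve`, `build_prompt`) are illustrative, not from any of the projects covered here:

```python
import math
from collections import Counter


def chunk(text: str, size: int = 40) -> list[str]:
    # Split the corpus into fixed-size word windows; production systems
    # typically use token- or sentence-aware chunking instead.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would call a neural
    # encoder and store dense vectors in a vector database.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Vector search: rank chunks by similarity to the query embedding.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    # Context-conditioned generation: prepend the retrieved chunks to the
    # question, then hand the prompt to any LLM.
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"
```

Grounding the prompt in retrieved chunks is what gives RAG its cited benefits: the model can answer from fresh or private data and has less room to hallucinate.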
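KohakuRAG's ensemble voting with abstention-aware filtering can be illustrated with a short sketch. The paper's exact mechanism is not described here, so this is a hypothetical version under two assumptions: abstentions are filtered out before voting so they cannot outvote a grounded answer, and the system abstains when no answer reaches a minimum support threshold:

```python
from collections import Counter

# Sentinel for an abstaining ensemble member (an assumption, not
# KohakuRAG's actual abstention signal).
ABSTAIN = "I don't know"


def ensemble_vote(answers: list[str], min_support: int = 2) -> str:
    # Abstention-aware filtering: drop abstentions before counting votes.
    grounded = [a for a in answers if a != ABSTAIN]
    if not grounded:
        return ABSTAIN
    # Majority vote over the remaining candidate answers.
    answer, count = Counter(grounded).most_common(1)[0]
    # Abstain when the winning answer lacks sufficient support.
    return answer if count >= min_support else ABSTAIN
```

The design choice here is that abstaining is treated as a first-class outcome: a split ensemble yields "I don't know" rather than a low-confidence guess, which is exactly the behavior LIT-RAGBench's Abstention category measures.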