Overview
- SURE-RAG introduces evidence sufficiency checks that score whether retrieved passages truly support an answer, reaching 0.9075 Macro-F1 on a controlled multi-hop benchmark and cutting unsafe answers by 37% at 30% coverage.
- Researchers show that retrieving “thinking traces” — prior step-by-step solution paths — boosts reasoning on math and code tasks, with a simple retrieve-then-generate setup delivering large gains on AIME and other tests using compact, structured trace representations.
- A controlled biomedical study finds Cross-Encoder reranking delivers the highest contextual precision and the top composite score, while a dense retrieval baseline ranks a close second, and all retrieval strategies far outperform a no-context setup on answer relevancy.
- FT-RAG tackles complex tables by breaking them into cell-level units, linking related entries in a graph, and fusing structure with text, reporting state-of-the-art results and large hit-rate gains on the new Multi-Table-RAG-Lib benchmark.
- New security work warns that attackers can infer which documents a RAG system has ingested through exam-style queries, and a forensic method called RAGCharacter pinpoints poisoned evidence down to the character span to explain misgeneration events.
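The sufficiency-gated answering idea behind the first item can be sketched as follows. This is a hypothetical illustration, not SURE-RAG's actual method: the paper uses a learned sufficiency classifier, whereas the scorer below is a toy lexical-overlap stand-in, and the `answer_or_abstain` name and 0.5 threshold are invented for the example.

```python
# Toy sketch of an evidence-sufficiency gate: answer only when the
# retrieved passages score as sufficient support, abstain otherwise.
# The lexical-overlap scorer is a stand-in for a learned classifier.

def sufficiency_score(question: str, passages: list[str]) -> float:
    """Fraction of (non-trivial) question terms covered by the passages."""
    terms = {t.strip(".,?!").lower() for t in question.split()
             if len(t.strip(".,?!")) > 3}
    if not terms:
        return 0.0
    text = " ".join(passages).lower()
    covered = {t for t in terms if t in text}
    return len(covered) / len(terms)

def answer_or_abstain(question, passages, generate, threshold=0.5):
    """Gate generation on evidence sufficiency; return None to abstain."""
    if sufficiency_score(question, passages) < threshold:
        return None  # abstain: evidence judged insufficient
    return generate(question, passages)
```

Abstaining on low-sufficiency queries is what produces a coverage/safety trade-off: a stricter threshold answers fewer questions but with better-supported evidence.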
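The retrieve-then-generate setup over thinking traces (second item) can be sketched as a small trace store plus prompt builder. Everything here is illustrative: `TraceStore`, `build_prompt`, and the Jaccard similarity are assumptions standing in for the paper's compact trace representations and its actual retriever.

```python
# Toy sketch of retrieving prior step-by-step solution traces and
# prepending them as exemplars for a new problem. Jaccard word overlap
# stands in for an embedding-based retriever.

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

class TraceStore:
    def __init__(self):
        self.traces = []  # (problem, list-of-steps) pairs

    def add(self, problem: str, trace: list[str]):
        self.traces.append((problem, trace))

    def retrieve(self, query: str, k: int = 2):
        ranked = sorted(self.traces, key=lambda t: jaccard(query, t[0]),
                        reverse=True)
        return ranked[:k]

def build_prompt(query: str, store: TraceStore) -> str:
    """Prepend the most similar past traces, then pose the new problem."""
    parts = []
    for problem, trace in store.retrieve(query):
        parts.append(f"Problem: {problem}\nSteps: " + " -> ".join(trace))
    parts.append(f"Problem: {query}\nSteps:")
    return "\n\n".join(parts)
```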
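The retrieve-then-rerank pattern the biomedical study compares (third item) follows a standard shape: a cheap first stage returns candidates, then a cross-encoder scores each (query, passage) pair jointly and reorders them. The scorer below is a toy term-overlap placeholder; a real system would call a learned cross-encoder (a BERT-style pair classifier) at that point.

```python
# Minimal retrieve-then-rerank sketch. The joint scorer is a toy
# placeholder for a learned cross-encoder that reads the pair together.

def cross_encoder_score(query: str, passage: str) -> float:
    """Placeholder joint score: query-term overlap with the passage."""
    q_terms = {w.strip(".,?!").lower() for w in query.split()}
    p_terms = {w.strip(".,?!").lower() for w in passage.split()}
    return len(q_terms & p_terms) / max(len(q_terms), 1)

def rerank(query: str, candidates: list[str], top_k: int = 3) -> list[str]:
    """Reorder first-stage candidates by the joint pair score."""
    return sorted(candidates,
                  key=lambda p: cross_encoder_score(query, p),
                  reverse=True)[:top_k]
```

Scoring the pair jointly, rather than comparing precomputed vectors, is what tends to buy the precision gains the study reports, at the cost of running the scorer once per candidate.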
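The cell-level decomposition FT-RAG describes (fourth item) can be sketched as flattening a table into per-cell units and linking same-row/same-column cells into a graph. The function names and the simple adjacency representation are assumptions for illustration, not the paper's implementation.

```python
# Toy sketch: each cell becomes a retrieval unit tagged with its row
# index and column header; cells sharing a row or column are linked,
# so table structure can be fused back in at retrieval time.

from collections import defaultdict

def table_to_cells(headers, rows):
    """Flatten a table into cell units: {row, col, value} dicts."""
    cells = []
    for i, row in enumerate(rows):
        for header, value in zip(headers, row):
            cells.append({"row": i, "col": header, "value": value})
    return cells

def link_cells(cells):
    """Adjacency graph: cells sharing a row or a column are neighbors."""
    graph = defaultdict(set)
    for a in range(len(cells)):
        for b in range(a + 1, len(cells)):
            if (cells[a]["row"] == cells[b]["row"]
                    or cells[a]["col"] == cells[b]["col"]):
                graph[a].add(b)
                graph[b].add(a)
    return graph
```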