What is Self-RAG?
TL;DR
A RAG variant where the LLM decides whether to retrieve, critiques the retrieved snippets, and verifies that its own answer is grounded in them. The original paper reports roughly halving hallucinations; by 2026 the pattern is standard practice.
Self-RAG: Definition & Explanation
Self-RAG (Self-Reflective Retrieval-Augmented Generation) was proposed by researchers at the University of Washington and AI2 in 2023 and moved into production use in 2024-2026. Unlike vanilla RAG (retrieve → generate), the model emits four reflection tokens:

1. Retrieve — do we need to retrieve at all?
2. IsRel — is this snippet relevant to the query?
3. IsSup — is the generated text supported by the snippets?
4. IsUse — how useful is the answer, on a 1-5 scale?

The flow:

(a) decide whether to retrieve;
(b) if so, run vector search and filter the results for relevance (IsRel);
(c) generate candidate answers over the surviving snippets in parallel;
(d) self-score each candidate for factual support (IsSup) and utility (IsUse);
(e) return the best candidate.

Wins:

- roughly 50% fewer hallucinations (per the paper's benchmarks);
- unnecessary retrieval is skipped, saving latency and cost;
- citations come essentially for free, since each claim is tied to a scored snippet;
- retrieval depth adapts dynamically to the query.

Implementations: LangChain's Self-RAG modules, LlamaIndex Self-Reflective Agents, Anthropic Claude Skills (reflection on by default), and OpenAI's o3/o4-mini Deep Research, which is effectively Self-RAG.

2026 trends: Corrective RAG (CRAG), GraphRAG, and Agentic RAG (multi-tool selection); multi-modal Self-RAG (image/video); Adaptive RAG, which switches retrieval strategy based on query complexity.

In practice: self-verification is effectively required in medicine, law, and finance; Self-RAG pairs well with Anthropic's Constitutional AI; the default vector databases are Pinecone, Weaviate, and Chroma.