What is Self-Consistency?
TL;DR
An inference-time technique that samples Chain-of-Thought reasoning N times and majority-votes the answer. It raises math and logic accuracy by 5-30% over greedy decoding and is a conceptual foundation for o1/o3-style reasoning models.
Self-Consistency: Definition & Explanation
Self-Consistency, introduced by Google Brain in 2022 (Wang et al.), is an inference-time technique that runs Chain-of-Thought (CoT) prompting N times at temperature 0.5-1.0 and majority-votes the final answer (a number, label, or selection). Compared to greedy decoding (T=0), it raises accuracy by 5-30% on math (GSM8K), logic (AQUA-RAT), and symbolic reasoning benchmarks, and it became a conceptual foundation for the 'internal multi-step thinking' models of 2024-2026 (o1, o3, Gemini 2.5 Thinking, Claude Extended Thinking).

Implementation is simple: (1) call the API N times in parallel with the same prompt, (2) extract the final answer from each completion, (3) take the mode. N=5-10 is typically the cost/quality sweet spot; a minimal sketch follows below.

Typical applications: (a) Kaggle-style reasoning tasks, (b) enterprise QA reliability, (c) agent action selection, (d) code generation, where multiple candidates are filtered through unit tests rather than voted on (a filtering sketch follows below).

Variants: Universal Self-Consistency (an LLM judges which sampled response is most consistent; see the prompt sketch below), Self-Refine (a self-critique loop), and Tree-of-Thoughts (branched reasoning).

Note: cost scales linearly with N, and reasoning models already perform this kind of sampling internally, so layering Self-Consistency on top of them is usually wasteful.
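Below is a minimal sketch of steps (1)-(3), assuming the OpenAI Python SDK; the model name, prompt, and the 'Answer:' extraction format are illustrative placeholders, not part of the original technique's definition:

```python
# Self-consistency: sample N CoT completions, extract each final answer,
# and return the most common one (the mode).
import re
from collections import Counter
from openai import OpenAI  # assumes the openai>=1.0 client style

client = OpenAI()  # reads OPENAI_API_KEY from the environment

COT_PROMPT = (
    "Q: A robe takes 2 bolts of blue fiber and half that much white fiber. "
    "How many bolts in total does it take?\n"
    "Let's think step by step, then give the final answer as 'Answer: <number>'."
)

def extract_answer(text: str) -> str | None:
    # Pull the number after the 'Answer:' marker the prompt asked for.
    m = re.search(r"Answer:\s*(-?\d+(?:\.\d+)?)", text)
    return m.group(1) if m else None

def self_consistency(prompt: str, n: int = 10, temperature: float = 0.7) -> str | None:
    # (1) Sample N reasoning paths; the `n` parameter batches them in one request.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,  # must be > 0 so the paths actually differ
        n=n,
    )
    # (2) Extract the final answer from each sampled path.
    answers = [extract_answer(c.message.content or "") for c in resp.choices]
    answers = [a for a in answers if a is not None]
    # (3) Majority vote: the mode of the extracted answers.
    return Counter(answers).most_common(1)[0][0] if answers else None

print(self_consistency(COT_PROMPT, n=10))
```

A single request with n=n is equivalent to N parallel calls at most providers; the essential requirement is simply a temperature above zero, so the sampled reasoning paths diverge enough to vote on.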
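For application (d), exact-string voting rarely works on code, so a common substitute is test-based filtering. A toy sketch, in which sample_candidates is a hypothetical stand-in for k LLM samples at T>0:

```python
# Code-generation variant: sample several candidate implementations, then
# keep only those that pass a small unit test. Test-based filtering replaces
# the majority vote, since code strings almost never match exactly.
from typing import Callable

def sample_candidates(k: int) -> list[str]:
    # Placeholder: in practice these are k completions from an LLM.
    return [
        "def add(a, b):\n    return a + b",
        "def add(a, b):\n    return a - b",  # a buggy sample
        "def add(a, b):\n    return b + a",
    ][:k]

def passes_tests(source: str) -> bool:
    namespace: dict = {}
    try:
        exec(source, namespace)  # define the candidate function
        fn: Callable = namespace["add"]
        return fn(2, 3) == 5 and fn(-1, 1) == 0
    except Exception:
        return False  # syntax errors or wrong answers both fail

survivors = [src for src in sample_candidates(3) if passes_tests(src)]
print(f"{len(survivors)} of 3 candidates pass the unit tests")
```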
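For the Universal Self-Consistency variant, the majority vote is replaced by one more LLM call that judges the sampled responses. A hypothetical prompt builder might look like:

```python
# Universal Self-Consistency: replace exact-match voting with an LLM judge
# when answers are free-form text and the mode is ill-defined.
def usc_prompt(question: str, responses: list[str]) -> str:
    # Number each sampled response so the judge can refer to it by index.
    numbered = "\n\n".join(
        f"Response {i}:\n{r}" for i, r in enumerate(responses, start=1)
    )
    return (
        f"Question: {question}\n\n{numbered}\n\n"
        "Select the most consistent response based on majority consensus. "
        "Reply with only the response number."
    )

# The returned string is sent as one final LLM call; its pick is the answer.
print(usc_prompt("Summarize the finding.", ["Sales rose 5%.", "Sales grew 5%."]))
```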