Curiosity · Concept

Self-Consistency Decoding

A single greedy chain-of-thought can take a wrong turn early and commit to a bad answer. Self-consistency, proposed by Wang et al. (2022), instead samples K reasoning traces at temperature > 0 and aggregates their final answers by majority vote; the intuition is that correct reasoning paths converge on the same answer while wrong ones scatter. It reliably boosts accuracy on math, commonsense, and symbolic reasoning benchmarks at the cost of roughly K× the decoding tokens, and it composes with most other prompting techniques.
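The procedure is small enough to sketch. A minimal version, where `sample_answer` is a stand-in (an assumption, not an API from the paper) for one temperature > 0 chain-of-thought decode that returns only the trace's parsed final answer:

```python
import random
from collections import Counter

def self_consistency(sample_answer, k=10):
    """Sample k reasoning traces and majority-vote their final answers.

    sample_answer is a stand-in for one temperature > 0 chain-of-thought
    decode that returns only the trace's parsed final answer.
    """
    answers = [sample_answer() for _ in range(k)]
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / k  # winning answer and its vote share

# Toy stand-in model: right answer "7" half the time, scatter otherwise.
random.seed(0)
toy = lambda: "7" if random.random() < 0.5 else random.choice(["5", "6", "8"])
answer, share = self_consistency(toy, k=20)
```

In a real setup, `sample_answer` would send the same chain-of-thought prompt to the model each time and parse the final answer out of the generated trace; only the parsed answers are voted on, never the traces themselves.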

Quick reference

Proficiency
Intermediate
Also known as
self-consistency decoding, CoT-SC
Prerequisites
chain-of-thought, sampling temperature

Frequently asked questions

What is self-consistency?

Self-consistency is a decoding strategy that samples multiple chain-of-thought reasoning paths at non-zero temperature and takes a majority vote over their final answers, instead of trusting a single greedy decode.

Why does voting beat a single best path?

Reasoning is high-variance. Many correct paths land on the same answer, while wrong paths fail in many different ways — so the correct answer usually wins a plurality even when no individual path is reliable.
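This concentrate-versus-scatter effect is easy to demonstrate with a toy error model (the 40% per-trace accuracy and nine uniform distractors below are illustrative assumptions, not numbers from the paper):

```python
import random
from collections import Counter

random.seed(1)

P_CORRECT = 0.4                   # each independent trace is right 40% of the time
DISTRACTORS = list(range(1, 10))  # wrong traces scatter over nine wrong answers

def one_trace():
    # 0 is the "true" answer; errors spread uniformly over the distractors.
    return 0 if random.random() < P_CORRECT else random.choice(DISTRACTORS)

def voted_answer(k):
    return Counter(one_trace() for _ in range(k)).most_common(1)[0][0]

TRIALS = 2000
single_acc = sum(one_trace() == 0 for _ in range(TRIALS)) / TRIALS
voted_acc = sum(voted_answer(10) == 0 for _ in range(TRIALS)) / TRIALS
```

Even though no individual trace clears 50%, the 0.4 probability mass piles onto one answer while the 0.6 splits nine ways, so the 10-way vote recovers the true answer far more often than a single sample does.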

How many samples do I need?

Gains saturate around K=20-40 on most benchmarks. K=5-10 already captures most of the lift and keeps cost reasonable. Higher K helps more on harder problems with more reasoning branch points.
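The shape of that curve can be reproduced with a toy error model (the 40% per-trace accuracy and nine uniform distractors are illustrative assumptions, not benchmark numbers):

```python
import random
from collections import Counter

random.seed(2)

def trace():
    # Assumed toy model: right with prob 0.4, else one of nine distractors.
    return 0 if random.random() < 0.4 else random.choice(range(1, 10))

def vote_acc(k, trials=1000):
    # Fraction of k-way majority votes that recover the true answer 0.
    wins = sum(Counter(trace() for _ in range(k)).most_common(1)[0][0] == 0
               for _ in range(trials))
    return wins / trials

curve = {k: vote_acc(k) for k in (1, 5, 10, 20, 40)}
```

The jump from K=1 to K=5-10 dominates, while doubling from 20 to 40 moves accuracy only marginally, mirroring the saturation reported on benchmarks.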

How does it relate to tree-of-thoughts?

Self-consistency samples independent linear traces. Tree-of-thoughts actively branches and prunes a tree with explicit evaluation at each node, so it can solve harder problems but is much more expensive. Self-consistency is the simple, strong baseline.

Sources

  1. Wang et al. — Self-Consistency Improves Chain of Thought Reasoning in Language Models — accessed 2026-04-20
  2. Google Research blog — Self-consistency — accessed 2026-04-20