Curiosity · Concept

Beam Search

Beam search maintains B candidate sequences (the 'beams') and at each step extends each beam with its highest-probability next tokens, keeping only the B best-scoring continuations overall. Compared to greedy decoding it catches situations where a locally low-probability token leads to a globally better sequence — which is why it remained the default for machine translation and summarization for years. For open-ended generation, however, beam search famously produces repetitive, generic text ('neural text degeneration'), so most chat-style LLMs use sampling (top-p / temperature) instead.
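The loop above can be sketched in a few lines. This is a minimal illustration, not a production decoder: `step_fn` and the toy `table` are hypothetical stand-ins for a model's next-token log-probability distribution.

```python
import math

def beam_search(step_fn, start, beam_width=3, max_len=10):
    """Keep the `beam_width` best-scoring partial sequences at each step.

    step_fn(seq) must return (token, log_prob) pairs for the next token.
    Returns (total_log_prob, sequence) for the best beam.
    """
    beams = [(0.0, [start])]  # (cumulative log-prob, sequence)
    for _ in range(max_len):
        # Extend every beam with every candidate continuation...
        candidates = [
            (score + logp, seq + [tok])
            for score, seq in beams
            for tok, logp in step_fn(seq)
        ]
        # ...then prune to the B overall best-scoring ones.
        beams = sorted(candidates, reverse=True)[:beam_width]
    return max(beams)

# Toy "model" (hypothetical numbers): greedy decoding picks 'a' (p=0.6)
# and is stuck with a low-probability continuation (0.6 * 0.1 = 0.06);
# beam search with B=2 keeps 'b' alive and finds the better path
# (0.4 * 0.9 = 0.36).
table = {
    "<s>": [("a", math.log(0.6)), ("b", math.log(0.4))],
    "a":   [("x", math.log(0.1))],
    "b":   [("x", math.log(0.9))],
    "x":   [("</s>", 0.0)],
}
best_score, best_seq = beam_search(lambda s: table[s[-1]], "<s>",
                                   beam_width=2, max_len=3)
```

Real decoders add stopping conditions (end-of-sequence handling) and score normalization, but the extend-then-prune structure is the same.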

Quick reference

Proficiency
Intermediate
Also known as
beam decoding
Prerequisites
Autoregressive generation, Temperature sampling

Frequently asked questions

What is beam search?

Beam search is a heuristic decoding algorithm that keeps the B most-probable partial sequences at each step. It explores multiple continuations in parallel instead of committing to the single best next token, producing a better approximation of the globally most-probable sequence.

When is beam search useful?

For tasks with a narrow set of correct answers — machine translation, grammatical error correction, abstractive summarization — where the goal is to find a high-likelihood sequence rather than a diverse one.

Why don't chat models use beam search?

Open-ended generation with beam search produces repetitive, generic text. The argmax-probability sequence tends to be bland and often falls into repetition loops. Sampling methods (top-p, temperature) give more human-like output.
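For contrast, here is a minimal sketch of nucleus (top-p) sampling, assuming the model's next-token distribution is given as a token-to-probability dict (the dict-based interface is an illustration, not a standard API):

```python
import random

def top_p_sample(token_probs, p=0.9, rng=random.random):
    """Nucleus sampling: keep the smallest set of highest-probability
    tokens whose cumulative probability reaches p, then sample within it."""
    ranked = sorted(token_probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for tok, prob in ranked:
        nucleus.append((tok, prob))
        total += prob
        if total >= p:
            break
    # Renormalize over the nucleus and draw one token.
    r = rng() * total
    acc = 0.0
    for tok, prob in nucleus:
        acc += prob
        if r <= acc:
            return tok
    return nucleus[-1][0]
```

Unlike beam search, this never commits to the single argmax sequence, which is what avoids the repetition loops described above.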

What are diverse beam search and length penalty?

Diverse beam search (Vijayakumar et al.) partitions the beams into groups and penalises similarity across groups, yielding more varied outputs. A length penalty rescales sequence scores by length so that beam search does not systematically prefer short outputs.
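One simple length-normalization scheme divides the cumulative log-probability by length raised to a power alpha. The exact penalty formula varies across implementations; the form and numbers below are illustrative assumptions, not taken from the sources cited here.

```python
def length_normalized_score(log_prob, length, alpha=0.7):
    # alpha = 0 reproduces the raw log-probability (biased toward short
    # sequences, since every extra token can only lower the score);
    # alpha = 1 gives the average per-token log-probability.
    return log_prob / (length ** alpha)

# Hypothetical scores: the longer sequence has a worse raw total but a
# better per-token probability, so normalization (alpha = 1) flips the
# ranking in its favour.
short_norm = length_normalized_score(-3.0, 4, alpha=1.0)  # 4 tokens
long_norm = length_normalized_score(-4.0, 8, alpha=1.0)   # 8 tokens
```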

Sources

  1. Koehn — Statistical Machine Translation (beam search in MT) — accessed 2026-04-20
  2. Vijayakumar et al. — Diverse Beam Search — accessed 2026-04-20