
HyDE (Hypothetical Document Embeddings)

Queries and relevant documents often sit in different regions of embedding space: a short question and a long answer rarely embed close together. HyDE, proposed by Gao et al. in 2022, sidesteps the mismatch by asking an LLM to write a hypothetical answer first and using the embedding of that answer as the search vector. The hypothetical document may be wrong in its details, but it shares vocabulary, structure, and topical signal with real answers, so its nearest neighbors in the index tend to be more relevant than those of the raw query. It is a strong zero-shot retrieval baseline when you lack labeled query-document pairs to fine-tune an encoder.

Quick reference

Proficiency: Intermediate
Also known as: Hypothetical Document Embeddings
Prerequisites: embeddings, retrieval-augmented generation

Frequently asked questions

What is HyDE?

HyDE (Hypothetical Document Embeddings) is a retrieval pattern in which an LLM first generates a hypothetical answer to the query. That generated answer, not the original query, is then embedded and used to search the vector index.

Why does embedding a hallucinated answer help?

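Questions and answers tend to occupy different regions of embedding space. A hypothetical answer shares the vocabulary, length, and structure of real answers, so its nearest neighbors are real answers rather than other similarly phrased questions. The specific facts in it may be wrong; what matters for retrieval is that it resembles the documents you want to find.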
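The effect described above can be illustrated with a minimal sketch. Everything in it is a stand-in chosen for illustration: `toy_embed` is a normalized bag of words rather than a dense encoder, `fake_llm` returns a canned passage (with a deliberately wrong tilt angle) instead of calling a model, and the three-document corpus is invented. It shows the shape of the pipeline, not a production implementation.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]
Index = List[Tuple[str, Vector]]


def toy_embed(text: str) -> Vector:
    """Stand-in encoder (assumption): L2-normalized bag of words, not a dense model."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    # Both vectors are unit length, so the dot product equals the cosine.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def search(index: Index, vector: Vector, k: int = 1) -> List[Tuple[float, str]]:
    """Return the k highest-scoring (score, document) pairs."""
    scored = [(cosine(vector, vec), doc) for doc, vec in index]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def fake_llm(prompt: str) -> str:
    """Stand-in generator (assumption): canned reply with a wrong detail (30 degrees)."""
    return ("The seasons happen because Earth's axis is tilted about 30 degrees, "
            "so sunlight hits each hemisphere at a different angle during the year.")


def hyde_search(query: str, generate: Callable[[str], str],
                index: Index, k: int = 1) -> List[Tuple[float, str]]:
    """Embed a generated hypothetical answer instead of the query itself."""
    prompt = f"Write a short passage that answers the question.\nQuestion: {query}\nPassage:"
    hypothetical = generate(prompt)
    return search(index, toy_embed(hypothetical), k)


# Invented corpus: one question-shaped distractor, one real answer, one off-topic fact.
docs = [
    "What causes the seasons on other planets?",
    "Earth's axis is tilted about 23.4 degrees, so each hemisphere receives "
    "more direct sunlight for part of the year.",
    "Tides are produced mainly by the gravitational pull of the Moon.",
]
index = [(doc, toy_embed(doc)) for doc in docs]
query = "What causes the seasons?"

direct_hit = search(index, toy_embed(query))[0][1]
hyde_hit = hyde_search(query, fake_llm, index)[0][1]
print("query embedding ->", direct_hit)
print("HyDE embedding  ->", hyde_hit)
```

In this toy setup the raw query lands on the similarly phrased question, while the hypothetical passage lands on the real answer even though its tilt angle is wrong. With a real dense encoder the scores would differ, but the mechanism is the same.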

Isn't it expensive to call the LLM twice?

Yes. HyDE adds a generation step before retrieval, so it fits best when retrieval quality matters more than latency, or when you can cache hypothetical documents. A common pattern is to apply it only to ambiguous or low-confidence queries after a cheaper first pass.
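One way to combine caching with a cheaper first pass is sketched below. The class name `GatedHyde`, the `min_score` threshold, and the three injected callables (`embed`, `generate`, `search`) are all hypothetical; the sketch assumes `search` returns `(score, document)` pairs sorted best-first and that a top score below the threshold is a usable signal of low confidence, which needs tuning per encoder.

```python
from typing import Any, Callable, Dict, List, Tuple

Hit = Tuple[float, str]  # (similarity score, document text)


class GatedHyde:
    """Run plain query retrieval first; fall back to HyDE only on weak results."""

    def __init__(self,
                 embed: Callable[[str], Any],
                 generate: Callable[[str], str],
                 search: Callable[[Any, int], List[Hit]],
                 min_score: float = 0.5) -> None:
        self.embed = embed
        self.generate = generate
        self.search = search
        self.min_score = min_score          # assumed confidence threshold
        self._cache: Dict[str, str] = {}    # normalized query -> hypothetical doc
        self.llm_calls = 0                  # counter, handy for monitoring cost

    def _hypothetical(self, query: str) -> str:
        # Normalize case and whitespace so trivially different queries share an entry.
        key = " ".join(query.lower().split())
        if key not in self._cache:
            self.llm_calls += 1
            self._cache[key] = self.generate(query)
        return self._cache[key]

    def retrieve(self, query: str, k: int = 3) -> List[Hit]:
        # Cheap first pass: embed the raw query.
        hits = self.search(self.embed(query), k)
        if hits and hits[0][0] >= self.min_score:
            return hits
        # Low confidence: pay for the LLM call (or reuse a cached document).
        return self.search(self.embed(self._hypothetical(query)), k)
```

The cache here is an in-memory dictionary with no eviction; a shared or size-bounded cache would be the natural next step in a deployed system.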

How does HyDE compare to query rewriting?

Query rewriting keeps the result short and question-shaped; HyDE produces an answer-shaped expansion. With dense retrievers HyDE tends to have the edge because the index is full of answer-shaped documents; with hybrid BM25+dense retrieval, both can help.
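The difference comes down to which text gets embedded. The sketch below makes that explicit; the prompt wording and the function names (`rewrite_prompt`, `hyde_prompt`, `search_text`) are illustrative assumptions, and `generate` stands for whatever LLM call you use.

```python
from typing import Callable


def rewrite_prompt(query: str) -> str:
    """Ask for a short, question-shaped reformulation (illustrative wording)."""
    return ("Rewrite the question as one concise, self-contained search query.\n"
            f"Question: {query}\nSearch query:")


def hyde_prompt(query: str) -> str:
    """Ask for an answer-shaped passage (illustrative wording)."""
    return ("Write a short passage that answers the question.\n"
            f"Question: {query}\nPassage:")


def search_text(query: str, generate: Callable[[str], str], mode: str) -> str:
    """Return the text that will be embedded under each strategy."""
    if mode == "raw":
        return query                               # no LLM call
    if mode == "rewrite":
        return generate(rewrite_prompt(query))     # still question-shaped
    if mode == "hyde":
        return generate(hyde_prompt(query))        # answer-shaped
    raise ValueError(f"unknown mode: {mode!r}")
```

Everything downstream (embedding, index lookup, ranking) can stay identical across the three modes, which makes it straightforward to compare them on the same queries.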

Sources

  1. Gao et al. — Precise Zero-Shot Dense Retrieval without Relevance Labels (HyDE) — accessed 2026-04-20
  2. LangChain — HyDE retriever — accessed 2026-04-20