Reranking
First-stage retrievers such as BM25 or bi-encoder vector search are fast but coarse: documents are indexed or embedded ahead of time, without ever seeing the query. A reranker is typically a cross-encoder that reads the query and a candidate together, usually joined with a [SEP] token in BERT-style models, and outputs a single relevance score. Because it attends over both texts jointly, it captures subtle matches and negations far better than a bi-encoder. The cost is one full forward pass per (query, candidate) pair at query time, with nothing precomputable per document, and self-attention that is quadratic in the combined input length. The standard stack retrieves roughly 100 candidates with a cheap retriever, reranks the top 50-100, and passes only the best 5-10 to the LLM. Cohere Rerank, BGE-Reranker, and Jina Reranker are popular options, hosted or open.
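The retrieve-then-rerank flow described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: `first_stage` is a toy token-overlap retriever standing in for BM25 or vector search, and `toy_pair_scorer` is a hypothetical stand-in for a real cross-encoder or hosted rerank API. Its only purpose is to show a scorer that looks at the query and the candidate together.

```python
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def first_stage(query: str, corpus: List[str], k: int) -> List[str]:
    """Cheap, coarse retrieval: rank documents by shared-token count.

    Toy stand-in for BM25 or bi-encoder search (assumption, not a real retriever).
    """
    q = set(tokenize(query))
    scored = [(len(q & set(tokenize(doc))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[float, str]]:
    """Second stage: score each (query, candidate) pair and keep the best top_n."""
    scored = [(score_pair(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def toy_pair_scorer(query: str, doc: str) -> float:
    """Hypothetical pair scorer: token overlap, penalised when the doc negates."""
    doc_tokens = tokenize(doc)
    overlap = len(set(tokenize(query)) & set(doc_tokens))
    penalty = 2 if "not" in doc_tokens else 0
    return float(overlap - penalty)


corpus = [
    "aspirin is not safe for children",
    "bananas are yellow",
    "aspirin is safe for adults",
]
query = "is aspirin safe"

candidates = first_stage(query, corpus, k=2)   # both aspirin passages tie here
results = rerank(query, candidates, toy_pair_scorer, top_n=1)
best_passage = results[0][1]
```

The first stage cannot tell the two aspirin passages apart because both share the same query tokens. The pair scorer can, since it inspects the candidate in the context of the query. In a real system `score_pair` would wrap a cross-encoder model, and `k` and `top_n` would be closer to the 100 and 5-10 mentioned above.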
Quick reference
- Proficiency: Intermediate
- Also known as: second-stage retrieval, cross-encoder reranking
- Prerequisites: embeddings, retrieval-augmented generation
Frequently asked questions
What is reranking in RAG?
Reranking is a second stage where a cross-encoder model rescores the top-k candidates from a fast first-stage retriever (BM25, dense, or hybrid) and reorders them. It improves the precision of the passages handed to the LLM.
Why not use the reranker as the retriever directly?
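A cross-encoder must run a full forward pass for every (query, document) pair, and none of that work can be precomputed before the query arrives. Applying one to a million-document corpus on every query is prohibitively expensive. A shortlist from a cheap bi-encoder or BM25 keeps the cross-encoder tractable by limiting it to about 100 candidates.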
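To put rough numbers on that argument, here is a back-of-the-envelope estimate. The throughput of 1,000 pairs per second is an assumption chosen purely for illustration; real figures depend on the model, hardware, batch size, and passage length.

```python
def rerank_seconds(num_pairs: int, pairs_per_second: float) -> float:
    """Time to score num_pairs (query, document) pairs at a fixed throughput."""
    return num_pairs / pairs_per_second


# Assumed cross-encoder throughput (hypothetical, for illustration only).
PAIRS_PER_SECOND = 1_000.0

full_corpus = rerank_seconds(1_000_000, PAIRS_PER_SECOND)  # score every document
shortlist = rerank_seconds(100, PAIRS_PER_SECOND)          # score only the shortlist

print(f"full corpus: {full_corpus:.1f} s per query")
print(f"shortlist:   {shortlist:.1f} s per query")
```

Under this assumption, scoring the whole corpus takes on the order of a quarter of an hour per query, while scoring a 100-candidate shortlist takes a fraction of a second. The ratio between the two is simply the ratio of candidate counts, whatever the actual throughput.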
How much does reranking actually help?
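On benchmarks such as BEIR and in most production A/B tests, adding a reranker is commonly reported to lift nDCG@10 by roughly 5-15 points (on a 0-100 scale) over dense-only retrieval. The size of the gain depends on the first-stage retriever, the reranker, and the domain. It usually translates to a visible improvement in end-to-end RAG answer quality.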
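For readers unfamiliar with the metric, nDCG@k can be computed as in the sketch below. It uses the linear-gain form of DCG and, as a simplification, derives the ideal ordering from the same list of judged results. The relevance labels in the example are made up for illustration and are not benchmark data.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: relevance at rank r is divided by log2(r + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """DCG normalised by the DCG of the best possible ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels in ranked order (1 = relevant, 0 = not); invented for this example.
before_rerank = [0, 1, 0, 1]   # relevant passages sit at ranks 2 and 4
after_rerank = [1, 1, 0, 0]    # reranker moved them to ranks 1 and 2

ndcg_before = ndcg_at_k(before_rerank, k=10)
ndcg_after = ndcg_at_k(after_rerank, k=10)
```

nDCG lies between 0 and 1, so a "point" in statements like the one above usually means 0.01 on this scale. Moving relevant passages toward the top raises the score even when the set of retrieved passages stays the same, which is exactly what a reranker does.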
Cohere Rerank vs open-source rerankers?
Cohere Rerank and Voyage AI's rerank models are strong hosted options with low latency and multilingual support. BGE-Reranker-v2 and Jina Reranker v2 are competitive models with openly available weights that you can self-host; check each model's license before commercial use. Benchmark on your own queries, because rerankers vary a lot by domain.
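Benchmarking on your own queries can start as a very small harness. The sketch below compares scorers by mean reciprocal rank (MRR) on a labeled set. Everything in it is hypothetical: the two toy scorers stand in for real rerankers, which you would wrap behind the same `(query, passage) -> score` signature, and the two labeled queries are invented examples.

```python
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str, str], float]
# (query, candidate passages, index of the passage judged relevant)
LabeledQuery = Tuple[str, List[str], int]


def reciprocal_rank(scorer: Scorer, query: str, candidates: List[str], relevant_idx: int) -> float:
    """1 / rank of the relevant passage after sorting candidates by score."""
    order = sorted(
        range(len(candidates)),
        key=lambda i: scorer(query, candidates[i]),
        reverse=True,
    )
    return 1.0 / (order.index(relevant_idx) + 1)


def compare(scorers: Dict[str, Scorer], dataset: List[LabeledQuery]) -> Dict[str, float]:
    """Mean reciprocal rank per scorer over the labeled queries."""
    return {
        name: sum(reciprocal_rank(s, q, c, r) for q, c, r in dataset) / len(dataset)
        for name, s in scorers.items()
    }


def overlap_scorer(query: str, doc: str) -> float:
    """Toy scorer: number of query tokens that appear in the passage."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def shortest_scorer(query: str, doc: str) -> float:
    """Deliberately poor toy scorer: prefers shorter passages, ignores the query."""
    return -float(len(doc))


dataset: List[LabeledQuery] = [
    ("reset password", ["how to reset your password", "pricing plans overview"], 0),
    ("refund policy", ["shipping times", "our refund policy explained in detail"], 1),
]

mrr = compare({"overlap": overlap_scorer, "shortest": shortest_scorer}, dataset)
```

With a few hundred labeled queries drawn from real traffic, the same loop gives a domain-specific comparison that public leaderboards cannot. Swapping MRR for nDCG@10 or recall is a one-function change.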
Sources
- Nogueira & Cho — Passage Re-ranking with BERT — accessed 2026-04-20
- Cohere — Say goodbye to irrelevant search results with Rerank — accessed 2026-04-20