
E5-Large v2

E5-Large v2, released by Microsoft Research in 2022, is a ~335-million-parameter text embedding model trained with the E5 contrastive recipe (weakly supervised pretraining on text pairs, followed by supervised fine-tuning on retrieval datasets). It remains one of the most popular open-weights embedding models because of its strong retrieval quality relative to its size and inference cost.
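The contrastive recipe mentioned above scores each query against every passage in a training batch and pushes the model to rank the paired passage highest. A minimal sketch of that in-batch InfoNCE-style objective, using small mock vectors in place of real E5 encoder outputs (dimensions, batch size, and temperature are illustrative):

```python
import numpy as np

def info_nce_loss(query_emb, passage_emb, temperature=0.05):
    """In-batch contrastive loss: row i of query_emb is paired with
    row i of passage_emb; all other rows act as negatives."""
    # Normalize so dot products are cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = passage_emb / np.linalg.norm(passage_emb, axis=1, keepdims=True)
    logits = q @ p.T / temperature  # (batch, batch) similarity matrix
    # Softmax cross-entropy with the diagonal as the correct class.
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Mock 4-dim embeddings for a batch of 3 query/passage pairs.
rng = np.random.default_rng(0)
queries = rng.normal(size=(3, 4))
# Aligned pairs (passages close to their queries) should score a low loss;
# unrelated random passages should score a higher one.
passages = queries + 0.01 * rng.normal(size=(3, 4))
loss_aligned = info_nce_loss(queries, passages)
loss_random = info_nce_loss(queries, rng.normal(size=(3, 4)))
```

The low temperature sharpens the softmax, which is why these models are so sensitive to near-duplicate negatives in a batch.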

Model specs

Vendor
Microsoft
Family
E5
Released
2022-12
Context window
512 tokens
Modalities
text

Strengths

  • Open weights under MIT-style licensing
  • Strong retrieval per parameter — runs on a single CPU
  • Mature tooling in sentence-transformers, Haystack, LangChain

Limitations

  • 512-token input window limits long-document embedding
  • English-focused — use multilingual-e5-large for non-English
  • Surpassed by 7B LLM-based embedders on peak MTEB scores
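A common workaround for the 512-token window is to split long documents into overlapping chunks and embed each chunk separately. A rough sketch, using whitespace tokens as a stand-in for the model's actual tokenizer (the chunk size and overlap values are illustrative):

```python
def chunk_words(text, max_tokens=480, overlap=40):
    """Split text into word-based chunks sized to fit a 512-token
    window, leaving headroom for the task prefix and special tokens.
    Overlap reduces the chance of splitting a key sentence."""
    words = text.split()
    if len(words) <= max_tokens:
        return [text]
    chunks, start = [], 0
    step = max_tokens - overlap
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_tokens]))
        start += step
    return chunks

# A 1000-word document splits into three overlapping chunks.
doc = ("word " * 1000).strip()
chunks = chunk_words(doc)
```

At query time, each chunk is embedded independently and the document's score is typically the max over its chunk scores.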

Use cases

  • Affordable RAG retrieval pipelines on CPU or small GPUs
  • Semantic search at scale with modest cost
  • Classification and clustering over English corpora
  • Embedding baselines in research and courses
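The retrieval pipelines listed above boil down to nearest-neighbor search over normalized embeddings. A minimal sketch with mock vectors standing in for E5 outputs (real E5-Large v2 embeddings are 1024-dimensional; the 4-dim vectors here are illustrative):

```python
import numpy as np

def top_k(query_vec, passage_vecs, k=2):
    """Rank passages by cosine similarity to the query. With
    L2-normalized embeddings, cosine similarity is a dot product."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

# Mock embeddings; index 1 is constructed to match the query best.
query = np.array([1.0, 0.0, 0.0, 0.0])
passages = np.array([
    [0.0, 1.0, 0.0, 0.0],  # orthogonal: score 0
    [0.9, 0.1, 0.0, 0.0],  # near-parallel: highest score
    [0.5, 0.5, 0.0, 0.0],  # partial match
])
results = top_k(query, passages)
```

For corpora beyond a few hundred thousand passages, the same dot-product ranking is usually delegated to an approximate-nearest-neighbor index rather than a full matrix product.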

Benchmarks

Benchmark               Score                                     As of
MTEB English average    ≈63 at release                            2023-03
BEIR average            competitive with much larger encoders     2023-03

Frequently asked questions

What is E5-Large v2?

E5-Large v2 is Microsoft Research's 335-million-parameter open-weights English text embedding model, trained with the E5 contrastive recipe and widely used as a strong, cheap retrieval baseline.

How does E5 compare to all-mpnet-base-v2?

E5-Large v2 is larger and typically scores a couple of MTEB points higher, at the cost of slower inference. all-mpnet-base-v2 is the faster baseline; E5 is the higher-quality default.

Where can I download E5-Large v2?

Weights are on Hugging Face under 'intfloat/e5-large-v2', with sentence-transformers integration out of the box.
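One detail from the model card worth noting when you wire the model up: E5 checkpoints expect a task prefix on every input ("query: " for search queries, "passage: " for corpus documents), and retrieval quality degrades without it. A small helper to apply the convention (the function name is illustrative, not part of any library):

```python
def with_e5_prefix(texts, kind):
    """Prepend the task prefix the E5 model card requires:
    'query: ' for search queries, 'passage: ' for documents."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {t}" for t in texts]

queries = with_e5_prefix(["how do embeddings work"], "query")
docs = with_e5_prefix(["Embeddings map text to vectors."], "passage")
```

The prefixed strings are then passed to the encoder as-is, e.g. model.encode(queries, normalize_embeddings=True) with the sentence-transformers integration.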

Sources

  1. arXiv — E5 embeddings paper — accessed 2026-04-20
  2. Hugging Face — intfloat/e5-large-v2 — accessed 2026-04-20