Curiosity · AI Model
E5-Large v2
E5-Large v2, released by Microsoft Research in 2022, is a ~335-million-parameter text embedding model trained with the E5 contrastive recipe (weakly supervised pretraining on text pairs, followed by supervised fine-tuning on retrieval datasets). It remains one of the most popular open-weights embedding models thanks to its strong quality for its compute cost.
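The contrastive recipe mentioned above boils down to an in-batch InfoNCE-style objective: each query is scored against every passage in the batch, and the matching passage on the diagonal is treated as the positive. A minimal numpy sketch with toy vectors (illustrative only, not the actual E5 training code; the temperature value is an assumption):

```python
import numpy as np

def info_nce_loss(queries, passages, temperature=0.05):
    """In-batch contrastive (InfoNCE) loss: queries[i] should match passages[i]."""
    # L2-normalize so dot products become cosine similarities
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = passages / np.linalg.norm(passages, axis=1, keepdims=True)
    logits = q @ p.T / temperature  # (batch, batch) similarity matrix
    # softmax cross-entropy with the diagonal entries as the positive pairs
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
# near-identical query/passage pairs should produce a small loss
loss = info_nce_loss(q, q + 0.01 * rng.normal(size=(4, 8)))
```

Minimizing this loss pulls matched pairs together and pushes the rest of the batch apart, which is what makes the resulting vectors useful for retrieval.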
Model specs

| Spec | Value |
|---|---|
| Vendor | Microsoft |
| Family | E5 |
| Released | 2022-12 |
| Context window | 512 tokens |
| Modalities | text |
Strengths
- Open weights under an MIT license
- Strong retrieval quality per parameter; practical on a single CPU
- Mature tooling in sentence-transformers, Haystack, LangChain
Limitations
- 512-token input window limits long-document embedding
- English-focused; use multilingual-e5-large for non-English text
- Surpassed by 7B-parameter LLM-based embedders on peak MTEB scores
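The 512-token window is usually worked around by chunking long documents before embedding. A minimal sketch that uses whitespace tokens as a rough stand-in for the model's real subword tokenizer (window and overlap sizes are illustrative; a production pipeline would count actual tokenizer tokens):

```python
def chunk_text(text, max_tokens=512, overlap=64):
    """Split text into overlapping windows of at most max_tokens tokens.

    Whitespace splitting is a crude proxy for subword tokenization,
    so real pipelines should leave headroom below the 512 limit.
    """
    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
    return chunks

# a 1000-"word" document becomes three overlapping chunks
chunks = chunk_text("word " * 1000, max_tokens=512, overlap=64)
```

Each chunk is then embedded separately, and retrieval operates over chunk vectors rather than whole-document vectors.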
Use cases
- Affordable RAG retrieval pipelines on CPU or small GPUs
- Semantic search at scale with modest cost
- Classification and clustering over English corpora
- Embedding baselines in research and courses
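Once passages are embedded, the retrieval step in a RAG pipeline reduces to cosine similarity over stored vectors. A minimal numpy sketch with toy vectors (in practice the vectors come from the model itself, and E5 expects its inputs prefixed with `query: ` and `passage: `):

```python
import numpy as np

def top_k(query_vec, passage_vecs, k=3):
    """Return indices of the k most similar passages by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    scores = p @ q                 # cosine similarity of each passage to the query
    return np.argsort(-scores)[:k]  # indices sorted by descending similarity

rng = np.random.default_rng(1)
passages = rng.normal(size=(100, 32))          # stand-ins for passage embeddings
query = passages[42] + 0.05 * rng.normal(size=32)  # near-duplicate of passage 42
hits = top_k(query, passages, k=3)
```

For large corpora the brute-force matrix product is typically replaced by an approximate nearest-neighbor index, but the scoring function is the same.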
Benchmarks
| Benchmark | Result | As of |
|---|---|---|
| MTEB English average | ≈63 at release | 2023-03 |
| BEIR average | competitive with much larger encoders | 2023-03 |
Frequently asked questions
What is E5-Large v2?
E5-Large v2 is Microsoft Research's 335-million-parameter open-weights English text embedding model, trained with the E5 contrastive recipe and widely used as a strong, cheap retrieval baseline.
How does E5 compare to all-mpnet-base-v2?
E5-Large v2 is larger and typically scores several MTEB points higher, at the cost of slower inference. all-mpnet-base-v2 is the faster baseline; E5 is the higher-quality default.
Where can I download E5-Large v2?
Weights are available on Hugging Face as `intfloat/e5-large-v2`, with sentence-transformers integration out of the box. Note that the model card specifies prefixing inputs with `query: ` or `passage: ` to match how the model was trained.
Sources
- arXiv — E5 embeddings paper — accessed 2026-04-20
- Hugging Face — intfloat/e5-large-v2 — accessed 2026-04-20