Curiosity · AI Model

all-mpnet-base-v2

all-mpnet-base-v2, published by the sentence-transformers / UKP Lab community in 2021, is arguably the most widely deployed open-weights text-embedding model in the world. At 110M parameters it runs comfortably on CPU, scores solidly on MTEB, and has quietly served as the backbone of countless RAG pipelines, documentation-search tools, and semantic-deduplication jobs.

Model specs

Vendor
sentence-transformers
Family
MPNet
Released
2021-05
Context window
384 tokens
Modalities
text

Strengths

  • Tiny, CPU-deployable, and fast
  • Battle-tested across years of production use
  • Apache 2.0-compatible open weights via sentence-transformers

Limitations

  • Outclassed on peak MTEB scores by E5 and LLM-based embedders
  • 384-token window limits long-document handling
  • English-centric — use multilingual models for non-English
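One common workaround for the 384-token limit is to split long documents into overlapping token windows and embed each window separately. Below is a minimal sketch of that idea; the `chunk_tokens` helper and the 384/256 window/stride values are illustrative choices, not part of the sentence-transformers API.

```python
def chunk_tokens(tokens, window=384, stride=256):
    """Split a token sequence into overlapping windows.

    window: max tokens per chunk (384 matches this model's limit)
    stride: step between chunk starts, so window - stride tokens overlap
    """
    if len(tokens) <= window:
        return [tokens]
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the final window reached the end of the sequence
        start += stride
    return chunks
```

Each chunk is then embedded on its own; at query time you retrieve chunks rather than whole documents, which also tends to improve retrieval precision.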

Use cases

  • CPU-friendly RAG encoders for docs search
  • Semantic deduplication and near-duplicate detection
  • Clustering short texts (support tickets, reviews)
  • Classroom examples of sentence embeddings

Benchmarks

Benchmark               Score   As of
MTEB English average    ≈58     2023-03
STS-B                   ≈85     2021-06

Frequently asked questions

What is all-mpnet-base-v2?

all-mpnet-base-v2 is a 110-million-parameter open-weights English sentence embedding model in the sentence-transformers library, fine-tuned from Microsoft's MPNet with a multi-source contrastive dataset.

Is all-mpnet-base-v2 still the best default?

For CPU-bound pipelines it is still a great default. If you have GPUs and want higher retrieval quality, E5-Large v2, NV-Embed v2, or SFR-Embedding score higher on MTEB.

How do I use all-mpnet-base-v2?

Install sentence-transformers and call 'SentenceTransformer("all-mpnet-base-v2")'. The model is also available directly from Hugging Face under 'sentence-transformers/all-mpnet-base-v2'.

Sources

  1. Hugging Face — sentence-transformers/all-mpnet-base-v2 — accessed 2026-04-20
  2. arXiv — Sentence-BERT paper — accessed 2026-04-20