Curiosity · AI Model

all-mpnet-base-v2

all-mpnet-base-v2, published by the sentence-transformers / UKP Lab community in 2021, is arguably the most widely deployed open-weights text-embedding model in the world. At 110M parameters it runs comfortably on CPU, scores solidly on MTEB, and has quietly served as the backbone of countless RAG pipelines, documentation-search tools, and semantic-deduplication jobs.

Model specs

Vendor
sentence-transformers
Family
MPNet
Released
2021-05
Context window
384 tokens
Modalities
text

Strengths

  • Tiny, CPU-deployable, and fast
  • Battle-tested across years of production use
  • Apache 2.0-compatible open weights via sentence-transformers

Limitations

  • Outclassed on peak MTEB scores by E5 and LLM-based embedders
  • 384-token window limits long-document handling
  • English-centric — use multilingual models for non-English
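One common workaround for the 384-token limit is to split long documents into overlapping token windows and embed each window separately. Below is a minimal sketch of that idea; the `chunk_tokens` helper and the 384/256 window/stride values are illustrative choices, not part of the sentence-transformers API.

```python
def chunk_tokens(tokens, window=384, stride=256):
    """Split a token sequence into overlapping windows.

    window: max tokens per chunk (384 matches this model's limit)
    stride: step between chunk starts, so window - stride tokens overlap
    """
    if len(tokens) <= window:
        return [tokens]
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the final window reached the end of the sequence
        start += stride
    return chunks
```

Each chunk is then embedded on its own; at query time you retrieve chunks rather than whole documents, which also tends to improve retrieval precision.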

Use cases

  • CPU-friendly RAG encoders for docs search
  • Semantic deduplication and near-duplicate detection
  • Clustering short texts (support tickets, reviews)
  • Classroom examples of sentence embeddings

Benchmarks

Benchmark               Score   As of
MTEB English average    ≈58     2023-03
STS-B                   ≈85     2021-06

Frequently asked questions

What is all-mpnet-base-v2?

all-mpnet-base-v2 is a 110-million-parameter open-weights English sentence embedding model in the sentence-transformers library, fine-tuned from Microsoft's MPNet with a multi-source contrastive dataset.

Is all-mpnet-base-v2 still the best default?

For CPU-bound pipelines it is still a great default. If you have GPUs and want higher retrieval quality, E5-Large v2, NV-Embed v2, or SFR-Embedding score higher on MTEB.

How do I use all-mpnet-base-v2?

Install sentence-transformers and call 'SentenceTransformer("all-mpnet-base-v2")'. The model is also available directly from Hugging Face under 'sentence-transformers/all-mpnet-base-v2'.

Sources

  1. Hugging Face — sentence-transformers/all-mpnet-base-v2 — accessed 2026-04-20
  2. arXiv — Sentence-BERT paper — accessed 2026-04-20