Nomic Embed Text v2

Nomic Embed Text v2 is Nomic AI's second-generation open embedding model — published with open weights, open training data, and open training code for full reproducibility. It is multilingual, supports Matryoshka truncation, and is designed as a drop-in replacement for closed APIs when transparency and self-hosting matter.

Model specs

Vendor: Nomic AI
Family: Nomic Embed
Released: 2025-02
Context window: 8,192 tokens
Modalities: text

Strengths

  • Fully open — weights, data, and training code released
  • Matryoshka truncation for compact vectors
  • 8k-token input for longer chunks
  • Multilingual out of the box
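Matryoshka truncation means the model front-loads the most informative components of each embedding, so you can keep only the leading dimensions and re-normalize. A minimal sketch with NumPy (the 768 and 256 dimension counts here are illustrative; check the model card for the dimensions your deployment supports):

```python
import numpy as np

def truncate_matryoshka(emb, dim):
    """Keep the first `dim` components and L2-renormalize.

    Matryoshka-trained models pack the most important information
    into the leading dimensions, so truncation preserves most of
    the retrieval quality at a fraction of the storage cost."""
    truncated = emb[..., :dim]
    norms = np.linalg.norm(truncated, axis=-1, keepdims=True)
    return truncated / norms

# Example: shrink two 768-dim embeddings down to 256 dims.
emb = np.random.randn(2, 768).astype(np.float32)
small = truncate_matryoshka(emb, 256)
```

Cosine similarity on the truncated vectors then works exactly as on the full ones, since they are unit-normalized.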

Limitations

  • Trails closed APIs (text-embedding-3-large, voyage-3) on some English retrieval benchmarks
  • Self-hosted deployment requires GPU infrastructure
  • Smaller ecosystem of framework integrations than OpenAI

Use cases

  • Fully reproducible research embeddings
  • On-prem RAG for regulated industries
  • Multilingual enterprise search
  • Low-cost vector stores with Matryoshka-shrunk dims
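The storage savings from Matryoshka-shrunk dimensions are easy to estimate. A back-of-the-envelope calculation, assuming float32 vectors and illustrative dimension counts:

```python
def store_bytes(n_vectors, dim, bytes_per_float=4):
    """Raw storage for a vector store of float32 embeddings."""
    return n_vectors * dim * bytes_per_float

# One million documents: full 768-dim vs. Matryoshka-truncated 256-dim.
full = store_bytes(1_000_000, 768)   # ~3.07 GB
small = store_bytes(1_000_000, 256)  # ~1.02 GB
```

A 3x reduction in index size, before any additional compression such as scalar or product quantization.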

Benchmarks

Benchmark             Score   As of
MTEB (multilingual)   ≈60     2025-02
MIRACL                ≈55     2025-02

Frequently asked questions

What is Nomic Embed Text v2?

Nomic Embed Text v2 is an open-source multilingual text embedding model from Nomic AI, shipped with open weights, open training data, and open training code so that researchers and engineers can reproduce and audit the model end-to-end.

Why does reproducibility matter?

Many closed embedding APIs give you a vector but no insight into the training data, biases, or safety properties of the model. Nomic's fully open release lets regulators, auditors, and researchers verify the model's behaviour and retrain or fine-tune as needed.

How do I run Nomic Embed v2 locally?

The weights are on Hugging Face; any GPU-equipped host running transformers, sentence-transformers, or Nomic's own GPT4All-style runtime can serve embeddings. CPU inference is feasible for smaller corpora.
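A hedged sketch of serving embeddings locally via sentence-transformers. The model id matches the Hugging Face repository cited below; the `trust_remote_code` flag and the `prompt_name` task prefix are assumptions based on the model card conventions for the Nomic Embed family, so verify them against the card for your version:

```python
def embed_texts(texts, model_name="nomic-ai/nomic-embed-text-v2-moe"):
    """Encode a list of strings into embedding vectors, one per text.

    Downloads the open weights from Hugging Face on first call and
    runs on GPU if available, otherwise CPU."""
    from sentence_transformers import SentenceTransformer

    # The Nomic Embed family uses task prefixes; "passage" is the
    # indexing-side prompt (assumption -- confirm on the model card).
    model = SentenceTransformer(model_name, trust_remote_code=True)
    return model.encode(texts, prompt_name="passage")

# Usage (triggers the weight download):
# vectors = embed_texts(["open weights", "open training data"])
```

Because the import happens inside the function, the module loads even on hosts without sentence-transformers installed; the dependency is only needed when embeddings are actually requested.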

How does Nomic Embed v2 compare to Jina v3 and BGE-M3?

All three are open-weight multilingual embedders with roughly similar MTEB scores. Nomic leads on transparency (open training data and code), Jina v3 on task-specific LoRA adapters, and BGE-M3 on hybrid output (dense, sparse, and multi-vector in one model).

Sources

  1. Nomic — Embed Text v2 — accessed 2026-04-20
  2. Hugging Face — nomic-ai/nomic-embed-text-v2-moe — accessed 2026-04-20