OpenAI text-embedding-3-large

text-embedding-3-large is OpenAI's flagship embedding model. It produces 3,072-dimensional vectors by default, with Matryoshka truncation down to 1,024 or 256 dimensions for cheaper storage. It sits near the top of MTEB among closed embedding models and is the pragmatic default when a team is already building on OpenAI's API and wants one provider for both generation and retrieval.

Model specs

Vendor
OpenAI
Family
text-embedding-3
Released
2024-01
Context window
8,191 tokens
Modalities
text
Input price
$0.13/M tok
Output price
n/a
Pricing as of
2026-04-20

Strengths

  • Top-tier MTEB score among closed embedding models
  • Matryoshka representation — truncate to 256/1024 dims to shrink vector DB cost
  • Single-provider convenience if generation already runs on OpenAI
  • 8,191-token input lets you embed long chunks without heavy preprocessing
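Because the model is Matryoshka-trained, an already-stored full-width vector can be shrunk after the fact rather than re-embedded. A minimal sketch of client-side truncation (the helper name and the random stand-in vector are illustrative; in practice the input would be a real embedding returned by the API):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    """Truncate a Matryoshka-trained embedding and re-normalize to unit length.

    Roughly equivalent to requesting a smaller `dimensions` value from the
    API, but applied locally to a full 3072-dim vector you already stored.
    """
    truncated = np.asarray(vec, dtype=np.float32)[:dims]
    return truncated / np.linalg.norm(truncated)

# Stand-in for a real 3072-dim embedding (hypothetical data).
full = np.random.default_rng(0).normal(size=3072).astype(np.float32)
full /= np.linalg.norm(full)

small = truncate_embedding(full, 256)
print(small.shape)                              # (256,)
print(round(float(np.linalg.norm(small)), 4))   # 1.0
```

Re-normalizing after the slice matters: downstream cosine-similarity math assumes unit vectors.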

Limitations

  • Closed weights — cannot self-host for privacy-sensitive workloads
  • Pricier per 1M tokens than text-embedding-3-small
  • Lags specialised rerankers for final-stage relevance — pair with a reranker

Use cases

  • RAG retrieval over enterprise knowledge bases
  • Semantic search across product catalogs or docs
  • Clustering and topic discovery on support tickets
  • Classification features for downstream ML pipelines
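Retrieval-style use cases above all reduce to the same core operation: rank stored document vectors by similarity to a query vector. A toy sketch with 3-dim stand-in vectors (real embeddings would be 3,072-dim, but the math is identical; since the API returns unit-normalized vectors, cosine similarity is just a dot product):

```python
import numpy as np

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 2) -> list[int]:
    """Return indices of the k corpus vectors most similar to the query.

    Assumes all vectors are unit-normalized, so cosine similarity
    reduces to a dot product.
    """
    scores = corpus @ query
    return list(np.argsort(-scores)[:k])

# Toy unit vectors standing in for real document embeddings.
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.6, 0.0],
])
q = np.array([1.0, 0.0, 0.0])
print(top_k(q, docs))  # → [0, 2]: doc 0 is identical, doc 2 is close
```

At production scale the brute-force dot product is replaced by an approximate-nearest-neighbor index, but the ranking logic is the same.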

Benchmarks

Benchmark               Score   As of
MTEB (English, avg)     ≈64.6   2024-01
MIRACL (multilingual)   ≈54.9   2024-01

Frequently asked questions

What is text-embedding-3-large?

text-embedding-3-large is OpenAI's flagship text embedding model, producing 3072-dimensional vectors that can be truncated to smaller sizes via Matryoshka representation learning. It is designed for retrieval, semantic search, clustering, and classification tasks.

What is the embedding dimension of text-embedding-3-large?

The default output is 3072 dimensions. Because the model is trained with Matryoshka representation learning, you can pass a smaller dimensions parameter (for example 1024 or 256) to save storage in your vector database with only a modest quality loss.

How much does text-embedding-3-large cost?

As of April 2026, text-embedding-3-large costs roughly USD 0.13 per million input tokens on the OpenAI API, with no separate output pricing since the model returns numeric vectors.
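The arithmetic behind a budget estimate is simple enough to sketch. Assuming the $0.13 per million token list price above and raw float32 vectors (helper names and the 1M-chunk workload are illustrative; real vector-DB storage adds index overhead):

```python
def embedding_cost_usd(tokens: int, price_per_million: float = 0.13) -> float:
    """Input-token cost for an embedding job at the quoted list price."""
    return tokens * price_per_million / 1e6

def storage_bytes(n_vectors: int, dims: int, bytes_per_float: int = 4) -> int:
    """Raw float32 storage for n vectors, ignoring index overhead."""
    return n_vectors * dims * bytes_per_float

# Example: embedding 1M chunks of ~500 tokens each.
print(embedding_cost_usd(500_000_000))           # 65.0 (dollars)
print(storage_bytes(1_000_000, 3072) / 2**30)    # ~11.44 GiB at full width
print(storage_bytes(1_000_000, 256) / 2**30)     # ~0.95 GiB truncated
```

Note that the one-time embedding cost is often dwarfed by ongoing vector-storage cost, which is where the Matryoshka truncation pays off.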

When should I use text-embedding-3-large vs text-embedding-3-small?

Use the large model when retrieval quality matters most and you can afford the extra cost and storage. Use the small model for high-volume ingestion, on-device caches, or when you need maximum throughput at minimum cost.

Sources

  1. OpenAI — New embedding models — accessed 2026-04-20
  2. OpenAI — Embeddings guide — accessed 2026-04-20