OpenAI text-embedding-3-large

text-embedding-3-large is OpenAI's flagship embedding model. It produces 3,072-dimensional vectors by default, with Matryoshka truncation down to 1,024 or 256 dimensions for cheaper storage. It sits near the top of MTEB among closed embedding models and is the pragmatic default when a team is already building on OpenAI's API and wants one provider for both generation and retrieval.

Model specs

Vendor
OpenAI
Family
text-embedding-3
Released
2024-01
Context window
8,191 tokens
Modalities
text
Input price
$0.13/M tok
Output price
n/a
Pricing as of
2026-04-20

Strengths

  • Top-tier MTEB score among closed embedding models
  • Matryoshka representation — truncate to 256/1024 dims to shrink vector DB cost
  • Single-provider convenience if generation already runs on OpenAI
  • 8,191-token input lets you embed long chunks without heavy preprocessing
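Because the model is Matryoshka-trained, an already-stored full-width vector can be shrunk after the fact rather than re-embedded. A minimal sketch of client-side truncation (the helper name and the random stand-in vector are illustrative; in practice the input would be a real embedding returned by the API):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    """Truncate a Matryoshka-trained embedding and re-normalize to unit length.

    Roughly equivalent to requesting a smaller `dimensions` value from the
    API, but applied locally to a full 3072-dim vector you already stored.
    """
    truncated = np.asarray(vec, dtype=np.float32)[:dims]
    return truncated / np.linalg.norm(truncated)

# Stand-in for a real 3072-dim embedding (hypothetical data).
full = np.random.default_rng(0).normal(size=3072).astype(np.float32)
full /= np.linalg.norm(full)

small = truncate_embedding(full, 256)
print(small.shape)                              # (256,)
print(round(float(np.linalg.norm(small)), 4))   # 1.0
```

Re-normalizing after the slice matters: downstream cosine-similarity math assumes unit vectors.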

Limitations

  • Closed weights — cannot self-host for privacy-sensitive workloads
  • Pricier per 1M tokens than text-embedding-3-small
  • Lags specialised rerankers for final-stage relevance — pair with a reranker

Use cases

  • RAG retrieval over enterprise knowledge bases
  • Semantic search across product catalogs or docs
  • Clustering and topic discovery on support tickets
  • Classification features for downstream ML pipelines
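Retrieval-style use cases above all reduce to the same core operation: rank stored document vectors by similarity to a query vector. A toy sketch with 3-dim stand-in vectors (real embeddings would be 3,072-dim, but the math is identical; since the API returns unit-normalized vectors, cosine similarity is just a dot product):

```python
import numpy as np

def top_k(query: np.ndarray, corpus: np.ndarray, k: int = 2) -> list[int]:
    """Return indices of the k corpus vectors most similar to the query.

    Assumes all vectors are unit-normalized, so cosine similarity
    reduces to a dot product.
    """
    scores = corpus @ query
    return list(np.argsort(-scores)[:k])

# Toy unit vectors standing in for real document embeddings.
docs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.6, 0.0],
])
q = np.array([1.0, 0.0, 0.0])
print(top_k(q, docs))  # → [0, 2]: doc 0 is identical, doc 2 is close
```

At production scale the brute-force dot product is replaced by an approximate-nearest-neighbor index, but the ranking logic is the same.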

Benchmarks

Benchmark               Score   As of
MTEB (English, avg)     ≈64.6   2024-01
MIRACL (multilingual)   ≈54.9   2024-01

Frequently asked questions

What is text-embedding-3-large?

text-embedding-3-large is OpenAI's flagship text embedding model, producing 3072-dimensional vectors that can be truncated to smaller sizes via Matryoshka representation learning. It is designed for retrieval, semantic search, clustering, and classification tasks.

What is the embedding dimension of text-embedding-3-large?

The default output is 3072 dimensions. Because the model is trained with Matryoshka representation learning, you can pass a smaller dimensions parameter (for example 1024 or 256) to save storage in your vector database with only a modest quality loss.

How much does text-embedding-3-large cost?

As of April 2026, text-embedding-3-large costs roughly USD 0.13 per million input tokens on the OpenAI API, with no separate output pricing since the model returns numeric vectors.
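The arithmetic behind a budget estimate is simple enough to sketch. Assuming the $0.13 per million token list price above and raw float32 vectors (helper names and the 1M-chunk workload are illustrative; real vector-DB storage adds index overhead):

```python
def embedding_cost_usd(tokens: int, price_per_million: float = 0.13) -> float:
    """Input-token cost for an embedding job at the quoted list price."""
    return tokens * price_per_million / 1e6

def storage_bytes(n_vectors: int, dims: int, bytes_per_float: int = 4) -> int:
    """Raw float32 storage for n vectors, ignoring index overhead."""
    return n_vectors * dims * bytes_per_float

# Example: embedding 1M chunks of ~500 tokens each.
print(embedding_cost_usd(500_000_000))           # 65.0 (dollars)
print(storage_bytes(1_000_000, 3072) / 2**30)    # ~11.44 GiB at full width
print(storage_bytes(1_000_000, 256) / 2**30)     # ~0.95 GiB truncated
```

Note that the one-time embedding cost is often dwarfed by ongoing vector-storage cost, which is where the Matryoshka truncation pays off.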

When should I use text-embedding-3-large vs text-embedding-3-small?

Use the large model when retrieval quality matters most and you can afford the extra cost and storage. Use the small model for high-volume ingestion, on-device caches, or when you need maximum throughput at minimum cost.

Sources

  1. OpenAI — New embedding models — accessed 2026-04-20
  2. OpenAI — Embeddings guide — accessed 2026-04-20