Curiosity · Concept

Cosine Similarity

Cosine similarity ignores vector magnitude and only cares about direction, which is exactly what you want when comparing embeddings of different-length texts. The score ranges from -1 (opposite) through 0 (orthogonal) to 1 (identical direction); embedding models are usually trained so semantically similar items land near 1. In practice, most vector databases store L2-normalized vectors so that cosine similarity collapses to a plain dot product — cheaper to compute and friendlier to ANN indexes.
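The definition above is simple enough to sketch directly: dot product divided by the product of the norms. A minimal pure-Python version (no external libraries; the function name is our own):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (||a|| * ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal -> 0.0
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))   # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite -> -1.0
```

In production you would use a vectorized implementation (NumPy, or your vector database's built-in metric), but the arithmetic is exactly this.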

Quick reference

Proficiency
Beginner
Also known as
cosine distance (= 1 - cosine similarity)
Prerequisites
vectors, dot product

Frequently asked questions

What is cosine similarity?

Cosine similarity is the cosine of the angle between two vectors — their dot product divided by the product of their norms. It ranges from -1 (opposite direction) through 0 (orthogonal) to 1 (identical direction) and is the standard similarity for embeddings.

Why is it preferred over Euclidean distance for embeddings?

Cosine ignores vector magnitude, so documents of different lengths on the same topic still match. Euclidean distance penalizes differences in magnitude, so two vectors pointing the same way can still be far apart. Modern embedding models also explicitly optimize a cosine-based contrastive loss, so it's the metric they were trained for.
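The magnitude point is easy to demonstrate: scale one vector by 10 (a stand-in for a "longer document" with the same topic direction) and compare the two metrics. A small sketch with hand-picked toy vectors:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [0.3, 0.4]
b = [3.0, 4.0]  # same direction, 10x the magnitude

print(cosine(a, b))     # 1.0 -- direction is identical, so cosine is maximal
print(euclidean(a, b))  # 4.5 -- the magnitude gap dominates the distance
```

Cosine reports a perfect match while Euclidean distance reports the pair as far apart, which is why cosine is the default for variable-length text embeddings.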

Is cosine similarity the same as dot product?

Only when both vectors are L2-normalized to unit length. Most vector databases normalize on ingest, after which cosine similarity equals the dot product and uses the faster inner-product index path.
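The equivalence follows directly from the definition: once both norms are 1, the denominator vanishes and only the dot product remains. A sketch of the normalize-on-ingest pattern (helper names are illustrative, not any particular database's API):

```python
import math

def l2_normalize(v):
    """Scale v to unit length so later dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# "Ingest": normalize once, store the unit vectors.
a = l2_normalize([1.0, 2.0, 3.0])
b = l2_normalize([2.0, 1.0, 0.5])

# "Query": a plain dot product on stored vectors is now the cosine similarity.
score = dot(a, b)
```

Normalizing once at write time trades a little ingest work for cheaper queries, and lets the index use the inner-product path.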

When should I use something other than cosine?

Recommenders and some CLIP-style retrieval use dot product without normalization because magnitude encodes confidence. Classical IR uses BM25. Hamming distance fits binary embeddings. Pick the metric the embedding model was trained with.

Sources

  1. Wikipedia — Cosine similarity — accessed 2026-04-20
  2. Pinecone — Vector similarity explained — accessed 2026-04-20