Capability · Framework — rag

Vespa

Vespa unifies full-text, vector, and tensor-based ranking inside one distributed engine. Originally built at Yahoo, it powers Yahoo Mail search and is used by Spotify and many large recommender systems. Its expressive ranking DSL and native tensor support make it well suited to late-interaction retrievers like ColBERT.
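The late-interaction scoring mentioned above is the MaxSim operator: each query token embedding is matched against its best-scoring document token embedding, and the per-token maxima are summed. A toy sketch in plain Python (Vespa expresses this as a tensor rank expression over token-level embeddings; this sketch is illustrative, not Vespa code):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_tokens, doc_tokens):
    """ColBERT-style MaxSim: each query token is matched against its best
    document token; the per-token maxima are summed into one score."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Two query-token embeddings vs. three document-token embeddings
query = [[1.0, 0.0], [0.0, 1.0]]
doc   = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
score = maxsim(query, doc)  # each query token finds its 0.9 match
```

In Vespa the same reduction runs inside a rank profile as a tensor expression, so the token embeddings never leave the content nodes.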

Framework facts

Category
rag
Language
C++ / Java / Python client
License
Apache-2.0
Repository
https://github.com/vespa-engine/vespa

Install

pip install pyvespa
# or run locally
docker run --detach --name vespa -p 8080:8080 vespaengine/vespa

Quickstart

from vespa.package import ApplicationPackage, Field, Schema, Document

# One schema with an indexed text field and a 768-dim embedding attribute
app = ApplicationPackage(name='docs', schema=[Schema(
    name='doc',
    document=Document(fields=[
        Field(name='text', type='string', indexing=['index', 'summary']),
        Field(name='emb', type='tensor<float>(x[768])', indexing=['attribute']),
    ]),
)])
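A nearest-neighbor query against the `emb` field pairs a YQL `nearestNeighbor` clause with a query tensor in the request body. A minimal sketch of building that body (the `semantic` rank-profile name is an assumption; the schema would need such a profile, and the body would be sent via the pyvespa query API):

```python
def build_knn_query(query_embedding, target_hits=10):
    """Build a Vespa query body for approximate nearest-neighbor search
    over the 'emb' field defined in the schema above."""
    return {
        # YQL: nearestNeighbor(field, query-tensor-name), annotated with targetHits
        "yql": f"select * from doc where {{targetHits:{target_hits}}}"
               "nearestNeighbor(emb, q)",
        # The query tensor, referenced as query(q) by the rank profile
        "input.query(q)": query_embedding,
        # Assumed rank-profile name; must be defined in the schema
        "ranking": "semantic",
        "hits": target_hits,
    }

body = build_knn_query([0.0] * 768, target_hits=5)
```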

Alternatives

  • Elasticsearch — similar breadth, weaker tensor ranking
  • OpenSearch — AWS fork of Elasticsearch
  • Milvus — pure vector DB
  • Qdrant — simpler vector-first engine

Frequently asked questions

Why pick Vespa over a pure vector DB?

Because retrieval is rarely pure ANN. Vespa gives you structured filters, keyword BM25, learned ranking, and tensor ops in a single query — ideal for production recommender and search workloads.
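The value of that combination is easy to illustrate outside Vespa: fusing a lexical score with a vector score surfaces documents that either signal alone would rank poorly. A toy sketch in plain Python (not Vespa's API; made-up scores, with a simple weighted sum standing in for a rank profile):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def hybrid_score(bm25, query_vec, doc_vec, alpha=0.7):
    # Weighted fusion: in a Vespa rank profile this would be an expression
    # along the lines of  alpha * closeness(emb) + (1 - alpha) * bm25(text)
    return alpha * cosine(query_vec, doc_vec) + (1 - alpha) * bm25

q = [1.0, 0.0]
doc_lexical  = {"bm25": 0.9, "vec": [0.0, 1.0]}   # keyword match, off-topic vector
doc_semantic = {"bm25": 0.1, "vec": [0.9, 0.1]}   # few shared terms, close vector

s_lex = hybrid_score(doc_lexical["bm25"], q, doc_lexical["vec"])
s_sem = hybrid_score(doc_semantic["bm25"], q, doc_semantic["vec"])
```

With these weights the semantically close document wins despite its weak keyword score; a pure BM25 ranker would invert that order.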

Is Vespa hard to operate?

It has a steeper learning curve than Qdrant or pgvector: schemas, rank profiles, and cluster configuration all take time to learn. Vespa Cloud is the recommended path for teams that don't want to operate a cluster themselves.

Sources

  1. Vespa — GitHub — accessed 2026-04-20
  2. Vespa — docs — accessed 2026-04-20