
turbopuffer

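turbopuffer keeps indexes on object storage (such as S3 or GCS) and serves queries from stateless nodes that cache recently used data on local SSD and in memory. No compute is reserved per namespace, so long-tail namespaces cost close to nothing while idle. A cold query that has to read from object storage still returns in hundreds of milliseconds, and queries against cached data are much faster. That makes it a popular backend for per-user RAG and multi-tenant search.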
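The paragraph above splits the system into durable object storage and stateless compute in front of it. Below is a toy model of that split. It is a sketch for intuition, not turbopuffer's implementation. `ObjectStore` is a hypothetical stand-in for S3-style storage, and `QueryNode` is a hypothetical stateless worker with an LRU cache. Brute-force dot-product scoring replaces a real ANN index.

```python
from collections import OrderedDict


class ObjectStore:
    """Hypothetical stand-in for S3/GCS: durable and cheap, but slow. Counts reads."""

    def __init__(self):
        self.blobs = {}
        self.reads = 0

    def put(self, key, value):
        self.blobs[key] = value

    def get(self, key):
        self.reads += 1  # every call models a slow object-storage round trip
        return self.blobs[key]


class QueryNode:
    """Hypothetical stateless worker: owns no data, only an LRU cache of namespace indexes."""

    def __init__(self, store, capacity=2):
        self.store = store
        self.capacity = capacity
        self.cache = OrderedDict()

    def load(self, namespace):
        if namespace in self.cache:
            self.cache.move_to_end(namespace)  # warm hit: no object-storage read
            return self.cache[namespace]
        index = self.store.get(namespace)  # cold miss: fetch from object storage
        self.cache[namespace] = index
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used namespace
        return index

    def query(self, namespace, vector, top_k):
        index = self.load(namespace)

        # Exhaustive dot-product ranking stands in for a real ANN index.
        def score(item):
            return sum(a * b for a, b in zip(item[1], vector))

        ranked = sorted(index.items(), key=score, reverse=True)
        return [doc_id for doc_id, _ in ranked[:top_k]]


store = ObjectStore()
store.put("user-42", {1: [1.0, 0.0], 2: [0.0, 1.0]})
node = QueryNode(store)
```

The first query against `user-42` increments `store.reads`. A repeat query is served from the cache and does not. A brand-new `QueryNode(store)` can answer the same query after one cold read, because the node never held anything that was not already in the store. An idle namespace only costs its stored bytes.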

Framework facts

Category
rag
Language
Rust (Python/JS SDKs)
License
Proprietary (managed service)
Documentation
https://turbopuffer.com/docs

Install

pip install turbopuffer
# or
npm install @turbopuffer/turbopuffer

Quickstart

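import os
import turbopuffer

# Assumes the current turbopuffer Python SDK (client object + region); older releases used tpuf.api_key and tpuf.Namespace.
tpuf = turbopuffer.Turbopuffer(api_key=os.environ['TURBOPUFFER_API_KEY'], region='gcp-us-central1')  # example region
ns = tpuf.namespace('user-42')  # one namespace per user
# Upsert one row; attributes such as 'text' sit alongside 'id' and 'vector'.
ns.write(upsert_rows=[{'id': 1, 'vector': [0.1]*1536, 'text': 'hi'}], distance_metric='cosine_distance')
# Approximate nearest-neighbour search for the 5 closest rows, returning the 'text' attribute.
results = ns.query(rank_by=('vector', 'ANN', [0.1]*1536), top_k=5, include_attributes=['text'])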
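For a small namespace, the ranking that the quickstart query approximates can be computed exactly. The sketch below assumes cosine distance, one of the metrics turbopuffer supports. It also assumes rows shaped like `{'id': ..., 'vector': [...]}`. `cosine_distance`, `exact_top_k`, and `recall_at_k` are hypothetical helpers, not part of the turbopuffer SDK. Comparing the exact ids with the ids an ANN query returned gives a quick recall check.

```python
import math


# Hypothetical helpers, not part of the turbopuffer SDK; assumes cosine distance.
def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar. Assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def exact_top_k(rows, query, top_k):
    """Rank every row by cosine distance to `query` and keep the ids of the closest `top_k`."""
    ranked = sorted(rows, key=lambda row: cosine_distance(row["vector"], query))
    return [row["id"] for row in ranked[:top_k]]


def recall_at_k(ann_ids, exact_ids):
    """Fraction of the exact neighbours that the approximate search also returned."""
    return len(set(ann_ids) & set(exact_ids)) / len(exact_ids)


rows = [
    {"id": 1, "vector": [1.0, 0.0]},
    {"id": 2, "vector": [0.0, 1.0]},
    {"id": 3, "vector": [1.0, 1.0]},
]
```

Brute force costs one distance computation per row, so it only suits spot checks on small namespaces. ANN indexes exist to avoid that linear scan.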

Alternatives

  • Pinecone serverless — similar pay-per-query model
  • MongoDB Atlas Vector Search
  • Qdrant Cloud
  • LanceDB with S3 backing

Frequently asked questions

What is turbopuffer's sweet spot?

Large numbers of small-to-medium namespaces, e.g. a personal RAG index per user. Idle namespaces cost little beyond object-storage fees, because their data lives there and no query capacity is reserved for them while they are unused.
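A rough storage estimate shows why per-user namespaces stay cheap at rest. The sketch assumes 1536-dimensional float32 vectors (4 bytes per component, matching the dimension in the quickstart). It counts raw vector bytes only and ignores attributes, index structures, and compression. The function names and the user and document counts are hypothetical.

```python
# Hypothetical sizing helpers: raw vector bytes only, no attributes or index overhead.
def raw_vector_bytes(num_docs, dims=1536, bytes_per_dim=4):
    """Documents x dimensions x bytes per component (float32 = 4 bytes)."""
    return num_docs * dims * bytes_per_dim


def fleet_bytes(num_users, docs_per_user, dims=1536, bytes_per_dim=4):
    """Total raw vector bytes when every user gets their own namespace."""
    return num_users * raw_vector_bytes(docs_per_user, dims, bytes_per_dim)
```

Suppose there are 100,000 users with 10,000 documents each. `fleet_bytes(100_000, 10_000)` comes to 6,144,000,000,000 bytes, about 6.1 TB of raw vectors. Keeping all of that in memory around the clock is the cost an object-storage design avoids for tenants that are idle most of the time.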

Can I self-host turbopuffer?

No. It's a managed service. For self-hosting, consider LanceDB on S3, Qdrant, or pgvector.

Sources

  1. turbopuffer — docs — accessed 2026-04-20
  2. turbopuffer — architecture — accessed 2026-04-20