Capability · Framework — orchestration

LiteLLM

LiteLLM has become the default way to talk to many model providers from one codebase. You write OpenAI-format calls and LiteLLM translates them to Anthropic, Bedrock, Vertex, Cohere, Mistral, or any of ~100 providers — with built-in retries, fallbacks, budget caps, rate limiting, and a production proxy that teams use as their internal AI gateway.
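The provider is chosen by the model-string prefix ("provider/model"). A stdlib-only sketch of that naming convention (illustrative only, not LiteLLM's actual routing code):

```python
# Illustrative sketch of LiteLLM's "provider/model" naming convention.
# Not LiteLLM's real internals -- just the idea behind the model string.

def split_model(model: str) -> tuple[str, str]:
    """Split 'anthropic/claude-opus-4-1' into ('anthropic', 'claude-opus-4-1').
    A bare OpenAI-style name like 'gpt-4o' is treated as the 'openai' provider."""
    provider, sep, name = model.partition('/')
    if not sep:
        return 'openai', model
    return provider, name

print(split_model('anthropic/claude-opus-4-1'))  # ('anthropic', 'claude-opus-4-1')
print(split_model('gpt-4o'))                     # ('openai', 'gpt-4o')
```

The same messages payload then works unchanged regardless of which provider the prefix selects.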

Framework facts

Category
orchestration
Language
Python
License
MIT + Enterprise
Repository
https://github.com/BerriAI/litellm

Install

pip install litellm
# Proxy server:
pip install 'litellm[proxy]'

Quickstart

from litellm import completion
import os

os.environ['ANTHROPIC_API_KEY'] = '...'

response = completion(
    model='anthropic/claude-opus-4-1',
    messages=[{'role': 'user', 'content': 'Hello'}],
    # Tried in order if the primary call fails
    fallbacks=['openai/gpt-4o', 'bedrock/anthropic.claude-3-haiku-20240307-v1:0']
)
print(response.choices[0].message.content)

Alternatives

  • OpenRouter — hosted router as a service
  • Portkey — AI gateway with guardrails and caching
  • Helicone — observability-first gateway
  • Vercel AI SDK — TypeScript-first provider abstraction

Frequently asked questions

SDK or proxy — which should I use?

Use the SDK when you're writing application code in Python and want provider flexibility. Deploy the proxy when multiple services or non-Python apps need a central AI gateway with unified keys, budgets, and logging.
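Because the proxy speaks the OpenAI chat-completions wire format, any HTTP client in any language can call it. A minimal stdlib sketch of assembling such a request (the URL and key are placeholders for your own deployment; the proxy listens on port 4000 by default):

```python
import json

# Placeholder for your own proxy deployment (default port is 4000).
PROXY_URL = 'http://localhost:4000/v1/chat/completions'

def build_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-format chat payload the LiteLLM proxy accepts."""
    return {
        'model': model,
        'messages': [{'role': 'user', 'content': prompt}],
    }

payload = json.dumps(build_request('anthropic/claude-opus-4-1', 'Hello'))
# POST `payload` to PROXY_URL with an 'Authorization: Bearer <key>' header;
# the proxy maps the virtual key to budgets, rate limits, and logging.
```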

Does LiteLLM add latency?

The Python SDK is a thin translation layer — usually under 5ms of overhead. The proxy adds one network hop, but the caching, retries, and observability it brings often save more latency than the hop costs.
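The retry-then-fallback behavior mentioned above can be sketched in plain Python (an illustrative sketch of the semantics, not LiteLLM's implementation):

```python
def call_with_fallbacks(call, models, num_retries=2):
    """Try each model in order, retrying each a few times before moving on --
    roughly the semantics of LiteLLM's fallbacks/num_retries options."""
    last_err = None
    for model in models:
        for _attempt in range(num_retries + 1):
            try:
                return call(model)
            except Exception as err:  # real code would catch narrower errors
                last_err = err
    raise last_err

# Demo with a stub "provider" whose primary model always fails:
def fake_call(model):
    if model == 'primary':
        raise RuntimeError('rate limited')
    return f'response from {model}'

print(call_with_fallbacks(fake_call, ['primary', 'backup']))
# response from backup
```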

Sources

  1. LiteLLM — docs — accessed 2026-04-20
  2. LiteLLM Proxy Server — accessed 2026-04-20