LiteLLM
LiteLLM has become the default way to talk to many model providers from one codebase. You write OpenAI-format calls and LiteLLM translates them to Anthropic, Bedrock, Vertex, Cohere, Mistral, or any of ~100 providers — with built-in retries, fallbacks, budget caps, rate limiting, and a production proxy that teams use as their internal AI gateway.
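The fallback behavior mentioned above can be sketched generically: try each model in order and return the first success. This is a hypothetical illustration of the idea, not LiteLLM's internal code.

```python
# Minimal sketch of fallback routing: try each provider/model in order
# and return the first successful response. Hypothetical illustration,
# not LiteLLM's actual implementation.
def complete_with_fallbacks(call, models):
    last_err = None
    for model in models:
        try:
            return call(model)        # e.g. one provider API call
        except Exception as err:      # provider outage, rate limit, etc.
            last_err = err
    raise last_err                    # every model failed

# Usage with a fake provider where only the second model succeeds:
def fake_call(model):
    if model == 'anthropic/primary':
        raise RuntimeError('provider down')
    return f'ok from {model}'

print(complete_with_fallbacks(fake_call, ['anthropic/primary', 'openai/backup']))
# → ok from openai/backup
```

LiteLLM layers retries and budget checks on top of this same ordering idea; the `fallbacks` argument in the quickstart below is the user-facing knob.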
Framework facts
- Category: orchestration
- Language: Python
- License: MIT + Enterprise
- Repository: https://github.com/BerriAI/litellm
Install
pip install litellm
# Proxy server:
pip install 'litellm[proxy]'

Quickstart
from litellm import completion
import os
os.environ['ANTHROPIC_API_KEY'] = '...'
response = completion(
    model='anthropic/claude-opus-4-7',
    messages=[{'role': 'user', 'content': 'Hello'}],
    fallbacks=['openai/gpt-4o', 'bedrock/claude-3-haiku']
)
print(response.choices[0].message.content)

Alternatives
- OpenRouter — hosted router as a service
- Portkey — AI gateway with guardrails and caching
- Helicone — observability-first gateway
- Vercel AI SDK — TypeScript-first provider abstraction
Frequently asked questions
SDK or proxy — which should I use?
Use the SDK when you're writing application code in Python and want provider flexibility. Deploy the proxy when multiple services or non-Python apps need a central AI gateway with unified keys, budgets, and logging.
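A minimal proxy setup looks roughly like this. The alias, model name, and key reference are placeholders; check the LiteLLM proxy docs for the current config schema.

```yaml
# config.yaml — sketch of a proxy model list (placeholder names and keys)
model_list:
  - model_name: claude                      # alias clients request
    litellm_params:
      model: anthropic/claude-opus-4-7
      api_key: os.environ/ANTHROPIC_API_KEY # read from the environment
```

Started with `litellm --config config.yaml`, the proxy then exposes an OpenAI-compatible endpoint that any service can call with the `claude` alias.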
Does LiteLLM add latency?
The Python SDK is a thin translation layer — usually <5ms overhead. The proxy adds one hop but gains caching, retries, and observability that typically save more latency than they cost.
Sources
- LiteLLM — docs — accessed 2026-04-20
- LiteLLM Proxy Server — accessed 2026-04-20