Agent Rate Limiting and Quotas
An agent with no rate limits can burn $50,000 in an afternoon through a runaway loop, or hammer a partner API until it lands in rate-limit purgatory. Good agent systems enforce quotas at multiple layers: per-tool calls per minute, per-session token budgets, per-user daily spend caps, and circuit breakers that halt the agent when a threshold trips. These limits are non-negotiable in production.
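A minimal single-process sketch of the per-tool calls-per-minute limit, using a sliding window of call timestamps. `PerToolRateLimiter` and `try_acquire` are hypothetical names, not any library's API. A deployment with several agent workers would need these counters in shared storage (Redis is a common choice) rather than in process memory.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerToolRateLimiter:
    """Sliding-window limiter: at most `max_calls` calls per tool in any `window_s` span."""

    def __init__(self, max_calls: int, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        # One deque of call timestamps per tool name.
        self._calls: Dict[str, Deque[float]] = defaultdict(deque)

    def try_acquire(self, tool: str) -> bool:
        """If `tool` is under its limit, record the call and return True; otherwise return False."""
        now = self.clock()
        calls = self._calls[tool]
        # Evict timestamps that have slid out of the window.
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False  # caller should back off, queue, or fail the tool call
        calls.append(now)
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = PerToolRateLimiter(max_calls=3, window_s=60.0, clock=lambda: fake_now[0])
    print([limiter.try_acquire("search_api") for _ in range(4)])  # [True, True, True, False]
    fake_now[0] = 61.0
    print(limiter.try_acquire("search_api"))  # True: earlier calls have left the window
```

A sliding window avoids the burst that a fixed per-minute counter allows at the boundary between two minutes, at the cost of storing one timestamp per call.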
Protocol facts
- Sponsor: Production engineering community
- Status: stable
- Interop with: Redis, Envoy, Kong, Cloudflare, LiteLLM
Frequently asked questions
What should I rate-limit?
Per-tool call rate (protects partner APIs), per-session token count (protects budget), per-user daily spend (protects against abuse), and per-agent runtime (protects against infinite loops).
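A minimal in-memory sketch of two of these limits: the per-session token budget (paired with a runtime cap against infinite loops) and the per-user daily spend cap. `SessionBudget`, `DailySpendCap`, and `QuotaExceeded` are hypothetical names. Money is kept as floats for brevity, and past days are never pruned from the ledger.

```python
import time
from datetime import date, datetime, timezone
from typing import Callable, Dict, Tuple


class QuotaExceeded(Exception):
    """Raised when a request would push usage past a configured limit."""


class SessionBudget:
    """Per-session token budget plus a wall-clock runtime cap."""

    def __init__(self, max_tokens: int, max_runtime_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_tokens = max_tokens
        self.max_runtime_s = max_runtime_s
        self.clock = clock  # injectable so tests can control time
        self.started_at = clock()
        self.tokens_used = 0

    def charge_tokens(self, n: int) -> None:
        # Runtime first: a looping agent can stay under its token budget and still never finish.
        if self.clock() - self.started_at > self.max_runtime_s:
            raise QuotaExceeded("session runtime cap exceeded")
        if self.tokens_used + n > self.max_tokens:
            raise QuotaExceeded(f"session token budget of {self.max_tokens} exceeded")
        self.tokens_used += n


def _utc_today() -> date:
    return datetime.now(timezone.utc).date()


class DailySpendCap:
    """Per-user spend cap that resets at each UTC calendar day."""

    def __init__(self, max_usd_per_day: float,
                 today: Callable[[], date] = _utc_today) -> None:
        self.max_usd = max_usd_per_day
        self.today = today
        # (user_id, day) -> USD spent; old days are never pruned in this sketch.
        self._spent: Dict[Tuple[str, date], float] = {}

    def charge(self, user_id: str, usd: float) -> None:
        key = (user_id, self.today())
        spent = self._spent.get(key, 0.0)
        if spent + usd > self.max_usd:
            raise QuotaExceeded(f"daily spend cap of ${self.max_usd:.2f} hit for {user_id}")
        self._spent[key] = spent + usd
```

Both classes refuse the request before recording usage, so a rejected call does not eat into the remaining budget.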
What's a circuit breaker for agents?
A runtime guard that halts the agent when a threshold trips: for example, more than 100 tool calls in a session, more than $50 in LLM spend, or more than 10 retries of the same action. The agent surfaces the halt to a human rather than continuing silently.
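A minimal sketch of such a breaker, assuming the agent loop calls it once per tool call along with the LLM spend that step incurred. The thresholds default to the example figures in the answer above (100 tool calls, $50, 10 retries); `AgentCircuitBreaker`, `BreakerThresholds`, `record_step`, and `AgentHalted` are hypothetical names. Once tripped, the breaker stays open and raises on every later step, so the agent cannot resume silently.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


class AgentHalted(Exception):
    """Raised to stop the agent loop; the caller should show `reason` to a human."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


@dataclass(frozen=True)
class BreakerThresholds:
    # Defaults mirror the example figures in the text; tune them per deployment.
    max_tool_calls: int = 100
    max_spend_usd: float = 50.0
    max_retries_per_action: int = 10


class AgentCircuitBreaker:
    """Trips once any session threshold is exceeded and stays tripped (no silent resume)."""

    def __init__(self, limits: Optional[BreakerThresholds] = None) -> None:
        self.limits = limits if limits is not None else BreakerThresholds()
        self.tool_calls = 0
        self.spend_usd = 0.0
        self.retries: Counter = Counter()  # action key -> retry count
        self.tripped_reason: Optional[str] = None

    def _trip(self, reason: str) -> None:
        self.tripped_reason = reason
        raise AgentHalted(reason)

    def record_step(self, action: str, cost_usd: float = 0.0, is_retry: bool = False) -> None:
        """Account for one tool call and its LLM spend; raise AgentHalted past any threshold."""
        if self.tripped_reason is not None:
            raise AgentHalted(self.tripped_reason)  # already open: refuse further steps
        self.tool_calls += 1
        self.spend_usd += cost_usd
        if is_retry:
            self.retries[action] += 1
        lim = self.limits
        if self.tool_calls > lim.max_tool_calls:
            self._trip(f"more than {lim.max_tool_calls} tool calls this session")
        if self.spend_usd > lim.max_spend_usd:
            self._trip(f"LLM spend ${self.spend_usd:.2f} exceeds ${lim.max_spend_usd:.2f}")
        if self.retries[action] > lim.max_retries_per_action:
            self._trip(f"more than {lim.max_retries_per_action} retries of {action!r}")
```

In the agent loop, wrap each step's `record_step` call in a `try` block; on `AgentHalted`, stop the loop and route `reason` to whatever human review or paging hook your system provides.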
Where do I enforce limits?
Multiple layers. The LLM gateway (LiteLLM, OpenRouter) caps tokens; the tool layer caps per-tool call rate; the agent runtime caps per-session spend; your finance team caps per-org monthly spend. One layer is not enough.
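To illustrate why one layer is not enough, here is a sketch of layered admission, assuming each layer can be modeled as a running-total cap on some key (session, tool, or org). `Request`, `KeyedCap`, `admit`, and the limit values are hypothetical and are not the API of LiteLLM, Envoy, or any other gateway. A request must pass every layer, and quota is charged only after all layers agree.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List, Optional


@dataclass(frozen=True)
class Request:
    org: str
    session: str
    tool: str
    tokens: int
    cost_usd: float


@dataclass
class KeyedCap:
    """One enforcement layer: caps the running total of amount(req) per key(req)."""
    name: str
    key: Callable[[Request], Hashable]
    amount: Callable[[Request], float]
    limit: float
    totals: Dict[Hashable, float] = field(default_factory=dict)

    def check(self, req: Request) -> Optional[str]:
        k = self.key(req)
        if self.totals.get(k, 0.0) + self.amount(req) > self.limit:
            return f"{self.name}: limit {self.limit} exceeded for {k!r}"
        return None

    def commit(self, req: Request) -> None:
        k = self.key(req)
        self.totals[k] = self.totals.get(k, 0.0) + self.amount(req)


def admit(layers: List[KeyedCap], req: Request) -> Optional[str]:
    """Return None and charge every layer if all allow `req`; else return the first rejection."""
    # Check all layers before charging any, so a request that one layer rejects
    # does not consume quota in the others.
    for layer in layers:
        reason = layer.check(req)
        if reason is not None:
            return reason
    for layer in layers:
        layer.commit(req)
    return None


# Hypothetical limits, one per layer named in the text. Time windows and resets are
# omitted for brevity, so each layer here is a plain running-total cap.
layers = [
    KeyedCap("gateway tokens per session", lambda r: r.session, lambda r: r.tokens, 10_000),
    KeyedCap("tool calls per session", lambda r: (r.session, r.tool), lambda r: 1, 3),
    KeyedCap("runtime spend per session (USD)", lambda r: r.session, lambda r: r.cost_usd, 5.0),
    KeyedCap("org monthly spend (USD)", lambda r: r.org, lambda r: r.cost_usd, 1_000.0),
]
```

Each layer guards a different key, so a request that fits comfortably under the gateway's token cap can still be stopped by the tool or spend layer.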
Sources
- LiteLLM rate limits — accessed 2026-04-20
- Envoy rate limiting — accessed 2026-04-20