Agent Rate Limiting and Quotas

An agent with no rate limits can burn $50,000 in an afternoon via a runaway loop, or hammer a partner API until the partner throttles or blocks it. Good agent systems enforce quotas at multiple layers: per-tool calls per minute, per-session token budgets, per-user daily spend caps, and circuit breakers that halt the agent when thresholds trip. These limits are non-negotiable for production.

Protocol facts

Sponsor: Production engineering community
Status: stable
Interop with: Redis, Envoy, Kong, Cloudflare, LiteLLM

Frequently asked questions

What should I rate-limit?

Per-tool call rate (protects partner APIs), per-session token count (protects budget), per-user daily spend (protects against abuse), and per-agent runtime (protects against infinite loops).
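
The per-tool call rate above is commonly enforced with a token bucket. The following is a minimal in-memory sketch, not a reference implementation: the class names, the injected clock parameter, the choice to allow a burst of one minute's worth of calls, and the choice to deny tools with no configured limit are all assumptions made for illustration. A production system would more likely keep this state in a shared store such as Redis.

```python
import time


class TokenBucket:
    """Token bucket that refills continuously at rate_per_minute.

    Assumption: capacity equals one minute's worth of calls.
    """

    def __init__(self, rate_per_minute, clock=time.monotonic):
        self.rate_per_minute = float(rate_per_minute)
        self.capacity = float(rate_per_minute)
        self.clock = clock
        self.tokens = self.capacity
        self.last_refill = clock()

    def try_acquire(self, cost=1.0):
        # Refill in proportion to the time elapsed since the last call.
        now = self.clock()
        elapsed = now - self.last_refill
        refill = elapsed * self.rate_per_minute / 60.0
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class ToolRateLimiter:
    """One bucket per tool name (hypothetical helper)."""

    def __init__(self, limits_per_minute, clock=time.monotonic):
        self._buckets = {
            tool: TokenBucket(rate, clock)
            for tool, rate in limits_per_minute.items()
        }

    def allow(self, tool):
        bucket = self._buckets.get(tool)
        if bucket is None:
            # Assumption: tools without a configured limit are denied.
            return False
        return bucket.try_acquire()
```

The agent runtime would call allow("search") before each tool invocation and either wait or skip the call when it returns False. Injecting the clock keeps the limiter testable without sleeping.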

What's a circuit breaker for agents?

A runtime guard that halts the agent when a threshold is tripped — e.g., >100 tool calls in a session, >$50 in LLM spend, >10 retries on the same action. The agent surfaces the halt to a human rather than continuing silently.
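
A circuit breaker of this kind can be sketched as a small counter object that the agent loop reports into. The defaults below mirror the example thresholds in the answer above (100 tool calls, $50 of spend, 10 retries on the same action); the class, method, and exception names are hypothetical, and the latching behavior (the breaker stays open until a human replaces or resets it) is an assumption.

```python
from collections import Counter


class CircuitOpen(Exception):
    """Raised when a threshold trips; the caller should escalate to a human."""


class AgentCircuitBreaker:
    def __init__(self, max_tool_calls=100, max_spend_usd=50.0, max_retries=10):
        self.max_tool_calls = max_tool_calls
        self.max_spend_usd = max_spend_usd
        self.max_retries = max_retries
        self.tool_calls = 0
        self.spend_usd = 0.0
        self.retries = Counter()      # retries per action name
        self.tripped_reason = None    # set once, then the breaker stays open

    def _trip_if(self, exceeded, reason):
        if exceeded and self.tripped_reason is None:
            self.tripped_reason = reason

    def check(self):
        """Raise CircuitOpen if any threshold has already tripped."""
        if self.tripped_reason is not None:
            raise CircuitOpen(self.tripped_reason)

    def record_tool_call(self, action, is_retry=False):
        self.check()
        self.tool_calls += 1
        if is_retry:
            self.retries[action] += 1
        self._trip_if(self.tool_calls > self.max_tool_calls,
                      "more than %d tool calls in session" % self.max_tool_calls)
        self._trip_if(self.retries[action] > self.max_retries,
                      "more than %d retries on action %r" % (self.max_retries, action))
        self.check()

    def record_spend(self, usd):
        self.check()
        self.spend_usd += usd
        self._trip_if(self.spend_usd > self.max_spend_usd,
                      "more than $%.2f in LLM spend" % self.max_spend_usd)
        self.check()
```

In an agent loop, record_tool_call and record_spend would wrap each step, and the handler for CircuitOpen would stop the loop and surface the reason to an operator instead of retrying.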

Where do I enforce limits?

At multiple layers. The LLM gateway (LiteLLM, OpenRouter) caps tokens; the tool layer caps per-tool call rate; the agent runtime caps per-session spend; your finance team caps per-org monthly spend. One layer is not enough, because each layer catches failures the others cannot see.
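
To illustrate why the layers are checked independently, here is a small sketch that evaluates one usage snapshot against several limits and reports the first layer that denies it. Everything here is hypothetical: the Usage fields, the layer names, and the numeric limits are placeholders, and in a real deployment each check would live in its own component (gateway, runtime, billing) instead of one function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Usage:
    session_tokens: int
    session_spend_usd: float
    org_spend_month_usd: float


@dataclass(frozen=True)
class Layer:
    name: str
    within_limit: Callable[[Usage], bool]


def first_violation(usage: Usage, layers: List[Layer]) -> Optional[str]:
    """Return the name of the first layer that denies, or None if all pass."""
    for layer in layers:
        if not layer.within_limit(usage):
            return layer.name
    return None


# Placeholder limits, chosen only for illustration.
LAYERS = [
    Layer("gateway: session tokens", lambda u: u.session_tokens <= 200_000),
    Layer("runtime: session spend", lambda u: u.session_spend_usd <= 50.0),
    Layer("org: monthly spend", lambda u: u.org_spend_month_usd <= 10_000.0),
]
```

A session that stays under its own spend cap can still be denied by the org-level layer, which is the case a single layer would miss.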

Sources

  1. LiteLLM rate limits — accessed 2026-04-20
  2. Envoy rate limiting — accessed 2026-04-20