Agent Router / Classifier Pattern
Not every request needs Claude Opus. The router pattern uses a small, fast classifier (sometimes a fine-tuned Haiku, sometimes a simple logistic regression) to decide: is this an FAQ? A code question? A billing issue? Each request then routes to the cheapest sub-agent capable of handling its class. A classic production pattern — it cuts cost 3–10x with barely perceptible quality impact when the router is tuned.
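The pattern can be sketched in a few lines. The class names, model tiers, and keyword rules below are hypothetical placeholders — in production the `classify` step would be a fine-tuned small LLM or a trained classifier, not keyword matching:

```python
# Minimal sketch of the router pattern: a cheap classifier assigns a
# request class, then the request routes to the cheapest capable tier.
# Tier names ("small-model" etc.) are illustrative, not real model IDs.

ROUTES = {
    "faq": "small-model",      # cheapest tier handles boilerplate FAQs
    "billing": "small-model",
    "code": "mid-model",       # code questions need a stronger model
    "unknown": "large-model",  # unrecognized classes fall back to the top tier
}

def classify(request: str) -> str:
    """Toy stand-in for a fine-tuned classifier or logistic regression."""
    text = request.lower()
    if "refund" in text or "invoice" in text:
        return "billing"
    if "error" in text or "function" in text:
        return "code"
    if text.endswith("?") and len(text.split()) < 15:
        return "faq"
    return "unknown"

def route(request: str) -> str:
    """Map the predicted class to the cheapest capable sub-agent."""
    return ROUTES[classify(request)]
```

The key property is that `classify` is far cheaper than any of the sub-agents it routes between, so its cost disappears into the savings.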
Protocol facts
- Sponsor: Community pattern
- Status: stable
- Interop with: RouteLLM, Martian, OpenRouter, custom classifiers
Frequently asked questions
What makes a good router?
Low latency (add ≤100ms), high accuracy on the routing classes, and calibrated uncertainty so unknown-class requests can fall back to a strong model. Fine-tuned small LLMs or distilled classifiers both work.
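The calibrated-uncertainty fallback can be sketched as a threshold on the classifier's top probability; the tier names and the 0.8 threshold are illustrative assumptions:

```python
def route_with_fallback(probs: dict[str, float], threshold: float = 0.8) -> str:
    """Route on the classifier's top class only when it is confident.

    `probs` maps class name -> calibrated probability. Below `threshold`,
    the request is treated as unknown-class and goes to a strong model.
    Tier names are hypothetical.
    """
    top_class, top_p = max(probs.items(), key=lambda kv: kv[1])
    if top_p < threshold:
        return "large-model"  # uncertain route -> safe, expensive fallback
    return {
        "faq": "small-model",
        "billing": "small-model",
        "code": "mid-model",
    }.get(top_class, "large-model")
```

This only works if the probabilities are actually calibrated — an overconfident classifier defeats the threshold, which is why calibration is listed as a requirement above.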
What if the router misroutes?
Build in a cheap escalation: if the chosen sub-agent returns low confidence or an error, retry with the next-tier model. You pay for a bad route twice, but only on the misroute tail.
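A sketch of that escalation loop, assuming a hypothetical `call_model(tier, request) -> (answer, confidence)` API and illustrative tier names:

```python
TIERS = ["small-model", "mid-model", "large-model"]  # hypothetical tiers

def call_with_escalation(request, call_model, start_tier=0, min_conf=0.6):
    """Try the routed tier; on error or low confidence, escalate one tier.

    `call_model(tier, request)` returning (answer, confidence) is an
    assumed interface, not a real library call.
    """
    best = None
    for tier in TIERS[start_tier:]:
        try:
            answer, conf = call_model(tier, request)
        except Exception:
            continue  # treat a sub-agent error like a misroute: escalate
        best = answer
        if conf >= min_conf:
            return answer
    return best  # top tier's answer, even if it was unsure
```

On a misroute you pay for both calls, but only on the misroute tail — the common case still hits a single cheap tier.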
Does routing hurt quality?
Well-tuned routers show <2% quality degradation vs. always-Opus, measured on held-out eval sets, while cutting cost 3–10x. Untuned routers can degrade quality 10%+ — always eval.
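Measuring that degradation is just a paired comparison on the held-out set. A sketch, where `router_answer`, `baseline_answer`, and `judge` are hypothetical callables (the judge scores an answer in [0, 1]):

```python
def eval_router(cases, router_answer, baseline_answer, judge):
    """Score routed answers against always-big-model answers on the
    same held-out questions. All three callables are assumed interfaces:
    answer fns take a question, judge(question, answer) -> float in [0, 1].
    """
    r = sum(judge(q, router_answer(q)) for q in cases) / len(cases)
    b = sum(judge(q, baseline_answer(q)) for q in cases) / len(cases)
    return {"router": r, "baseline": b, "degradation": b - r}
```

If `degradation` exceeds your budget (the ~2% figure above), retune the router or raise its fallback threshold before shipping.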
Sources
- RouteLLM paper — accessed 2026-04-20
- RouteLLM repo — accessed 2026-04-20