Agent Protocol

Agent Router / Classifier Pattern

Not every request needs a frontier-class model. The router pattern uses a small, fast classifier (sometimes a fine-tuned Haiku, sometimes simple logistic regression) to decide: is this an FAQ? A code question? A billing issue? Requests then route to the cheapest capable sub-agent. A classic production pattern: when the router is tuned, it cuts cost 3–10x with barely perceptible quality impact.
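The pattern can be sketched in a few lines. Everything here is illustrative: the model names, the per-token costs, and the keyword `classify()` heuristic are assumptions standing in for a real fine-tuned classifier.

```python
# Minimal router sketch. Model names and costs are hypothetical;
# classify() is a toy stand-in for a fast, trained classifier.

ROUTES = {
    "faq":     {"model": "small-model",    "cost_per_1k": 0.25},
    "billing": {"model": "small-model",    "cost_per_1k": 0.25},
    "code":    {"model": "mid-model",      "cost_per_1k": 3.00},
    "unknown": {"model": "frontier-model", "cost_per_1k": 15.00},
}

def classify(request: str) -> str:
    """Toy classifier: keyword rules in place of a fine-tuned small
    LLM or logistic regression. Returns a routing class."""
    text = request.lower()
    if any(w in text for w in ("refund", "invoice", "charge")):
        return "billing"
    if any(w in text for w in ("error", "stack trace", "function")):
        return "code"
    if text.endswith("?") and len(text.split()) < 20:
        return "faq"
    return "unknown"  # unrecognized -> strongest model

def route(request: str) -> dict:
    label = classify(request)
    return {"class": label, **ROUTES[label]}

print(route("Why was my card charged twice?"))
# {'class': 'billing', 'model': 'small-model', 'cost_per_1k': 0.25}
```

In production the keyword rules would be replaced by a trained classifier, but the routing table and lookup stay this simple: the classifier's output class is the only coupling point.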

Protocol facts

Sponsor: Community pattern
Status: stable
Interop with: RouteLLM, Martian, OpenRouter, custom classifiers

Frequently asked questions

What makes a good router?

Low latency (adds ≤100 ms), high accuracy on the routing classes, and calibrated uncertainty so that unknown-class requests can fall back to a strong model. Fine-tuned small LLMs and distilled classifiers both work.
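The calibrated-uncertainty fallback can be sketched as a simple threshold on the classifier's top-class probability. The probabilities and model names below are hand-written assumptions standing in for real classifier output.

```python
# Sketch: trust the route only when the classifier is confident;
# otherwise fall back to the strong model. Names are hypothetical.

def route_with_fallback(probs: dict[str, float],
                        threshold: float = 0.8) -> str:
    top_class, top_p = max(probs.items(), key=lambda kv: kv[1])
    if top_p < threshold:
        return "frontier-model"  # low confidence -> strong fallback
    return {"faq": "small-model", "code": "mid-model"}[top_class]

print(route_with_fallback({"faq": 0.95, "code": 0.05}))  # small-model
print(route_with_fallback({"faq": 0.55, "code": 0.45}))  # frontier-model
```

Note this only works if the probabilities are calibrated: an overconfident classifier will clear the threshold on exactly the requests it should have escalated.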

What if the router misroutes?

Build in a cheap escalation: if the chosen sub-agent returns low confidence or an error, retry with the next-tier model. You pay for a bad route twice, but only on the misroute tail.
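The escalation ladder above can be sketched as a loop over model tiers. The sub-agents here are stubs with hard-coded confidences; a real deployment would wrap actual model calls.

```python
# Sketch of cheap escalation: retry up the tier list when a sub-agent
# errors or reports low confidence. call_agent() is a stub.

TIERS = ["small-model", "mid-model", "frontier-model"]

def call_agent(model: str, request: str) -> tuple[str, float]:
    """Stub sub-agent returning (answer, confidence)."""
    if model == "small-model":
        return ("not sure", 0.3)  # simulate a low-confidence miss
    return (f"{model} answer", 0.9)

def answer_with_escalation(request: str, min_conf: float = 0.5) -> str:
    reply = ""
    for model in TIERS:
        try:
            reply, conf = call_agent(model, request)
        except Exception:
            continue  # treat errors like low confidence: go up a tier
        if conf >= min_conf:
            return reply
    return reply  # last tier's answer, even if still low confidence

print(answer_with_escalation("hard question"))  # mid-model answer
```

The double cost on a misroute is visible here: the small model is paid for and then discarded, but only on requests that actually fall into the misroute tail.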

Does routing hurt quality?

Well-tuned routers show <2% quality degradation vs. always routing to the strongest model, measured on held-out eval sets, while cutting cost 3–10x. Untuned routers can degrade quality by 10% or more — always run the eval.
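The eval the FAQ recommends reduces to comparing average scores of the routed pipeline against the always-strong baseline on a held-out set. The scores below are synthetic assumptions; a real eval would use task-specific graders.

```python
# Sketch of a routing eval: (routed class, routed score, baseline score)
# per held-out example. All numbers are made up for illustration.

held_out = [
    ("faq",  1.0, 1.0),
    ("code", 0.9, 1.0),  # one example where routing cost some quality
    ("faq",  1.0, 1.0),
    ("code", 1.0, 1.0),
]

routed   = sum(r for _, r, _ in held_out) / len(held_out)
baseline = sum(b for _, _, b in held_out) / len(held_out)
degradation = (baseline - routed) / baseline
print(f"quality degradation: {degradation:.1%}")  # 2.5%
```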

Sources

  1. RouteLLM paper — accessed 2026-04-20
  2. RouteLLM repo — accessed 2026-04-20