Agent Protocol

Agent Router / Classifier Pattern

Not every request needs a frontier-class model. The router pattern uses a small, fast classifier (sometimes a fine-tuned Haiku, sometimes simple logistic regression) to decide: is this an FAQ? A code question? A billing issue? Requests then route to the cheapest capable sub-agent. A classic production pattern: when the router is tuned, it cuts cost 3–10x with barely perceptible quality impact.
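The pattern can be sketched in a few lines. Everything here is illustrative: the model names, the per-token costs, and the keyword `classify()` heuristic are assumptions standing in for a real fine-tuned classifier.

```python
# Minimal router sketch. Model names and costs are hypothetical;
# classify() is a toy stand-in for a fast, trained classifier.

ROUTES = {
    "faq":     {"model": "small-model",    "cost_per_1k": 0.25},
    "billing": {"model": "small-model",    "cost_per_1k": 0.25},
    "code":    {"model": "mid-model",      "cost_per_1k": 3.00},
    "unknown": {"model": "frontier-model", "cost_per_1k": 15.00},
}

def classify(request: str) -> str:
    """Toy classifier: keyword rules in place of a fine-tuned small
    LLM or logistic regression. Returns a routing class."""
    text = request.lower()
    if any(w in text for w in ("refund", "invoice", "charge")):
        return "billing"
    if any(w in text for w in ("error", "stack trace", "function")):
        return "code"
    if text.endswith("?") and len(text.split()) < 20:
        return "faq"
    return "unknown"  # unrecognized -> strongest model

def route(request: str) -> dict:
    label = classify(request)
    return {"class": label, **ROUTES[label]}

print(route("Why was my card charged twice?"))
# {'class': 'billing', 'model': 'small-model', 'cost_per_1k': 0.25}
```

In production the keyword rules would be replaced by a trained classifier, but the routing table and lookup stay this simple: the classifier's output class is the only coupling point.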

Protocol facts

Sponsor: Community pattern
Status: stable
Interop with: RouteLLM, Martian, OpenRouter, custom classifiers

Frequently asked questions

What makes a good router?

Low latency (adds ≤100 ms), high accuracy on the routing classes, and calibrated uncertainty so that unknown-class requests can fall back to a strong model. Fine-tuned small LLMs and distilled classifiers both work.
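The calibrated-uncertainty fallback can be sketched as a simple threshold on the classifier's top-class probability. The probabilities and model names below are hand-written assumptions standing in for real classifier output.

```python
# Sketch: trust the route only when the classifier is confident;
# otherwise fall back to the strong model. Names are hypothetical.

def route_with_fallback(probs: dict[str, float],
                        threshold: float = 0.8) -> str:
    top_class, top_p = max(probs.items(), key=lambda kv: kv[1])
    if top_p < threshold:
        return "frontier-model"  # low confidence -> strong fallback
    return {"faq": "small-model", "code": "mid-model"}[top_class]

print(route_with_fallback({"faq": 0.95, "code": 0.05}))  # small-model
print(route_with_fallback({"faq": 0.55, "code": 0.45}))  # frontier-model
```

Note this only works if the probabilities are calibrated: an overconfident classifier will clear the threshold on exactly the requests it should have escalated.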

What if the router misroutes?

Build in a cheap escalation: if the chosen sub-agent returns low confidence or an error, retry with the next-tier model. You pay for a bad route twice, but only on the misroute tail.
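The escalation ladder above can be sketched as a loop over model tiers. The sub-agents here are stubs with hard-coded confidences; a real deployment would wrap actual model calls.

```python
# Sketch of cheap escalation: retry up the tier list when a sub-agent
# errors or reports low confidence. call_agent() is a stub.

TIERS = ["small-model", "mid-model", "frontier-model"]

def call_agent(model: str, request: str) -> tuple[str, float]:
    """Stub sub-agent returning (answer, confidence)."""
    if model == "small-model":
        return ("not sure", 0.3)  # simulate a low-confidence miss
    return (f"{model} answer", 0.9)

def answer_with_escalation(request: str, min_conf: float = 0.5) -> str:
    reply = ""
    for model in TIERS:
        try:
            reply, conf = call_agent(model, request)
        except Exception:
            continue  # treat errors like low confidence: go up a tier
        if conf >= min_conf:
            return reply
    return reply  # last tier's answer, even if still low confidence

print(answer_with_escalation("hard question"))  # mid-model answer
```

The double cost on a misroute is visible here: the small model is paid for and then discarded, but only on requests that actually fall into the misroute tail.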

Does routing hurt quality?

Well-tuned routers show <2% quality degradation vs. always routing to the strongest model, measured on held-out eval sets, while cutting cost 3–10x. Untuned routers can degrade quality by 10% or more — always run the eval.
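The eval the FAQ recommends reduces to comparing average scores of the routed pipeline against the always-strong baseline on a held-out set. The scores below are synthetic assumptions; a real eval would use task-specific graders.

```python
# Sketch of a routing eval: (routed class, routed score, baseline score)
# per held-out example. All numbers are made up for illustration.

held_out = [
    ("faq",  1.0, 1.0),
    ("code", 0.9, 1.0),  # one example where routing cost some quality
    ("faq",  1.0, 1.0),
    ("code", 1.0, 1.0),
]

routed   = sum(r for _, r, _ in held_out) / len(held_out)
baseline = sum(b for _, _, b in held_out) / len(held_out)
degradation = (baseline - routed) / baseline
print(f"quality degradation: {degradation:.1%}")  # 2.5%
```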

Sources

  1. RouteLLM paper — accessed 2026-04-20
  2. RouteLLM repo — accessed 2026-04-20