Background & motivation
Why LLM routing
Most teams start with round-robin or weighted rules — fine at first, but the limits show up quickly.
Where it breaks down
- Models keep diverging. They differ markedly in price, speed, and the tasks they excel at, and new ones arrive every quarter.
- Requests carry semantics. A request is no longer just an HTTP path; it carries conversation, tool output, and instructions that a single static config cannot capture.
- Policy changes faster than configuration. Business teams expect to adjust routing policy themselves, rather than file a ticket each week to rewrite rules.
What we mean by “LLM-only routing”
The routing decision itself is made by an LLM call, not by a lightweight sidecar classifier and not by a hand-maintained rule tree. Its input and output formats, and the boundaries around it, are detailed on the architecture page.
The gain is flexibility: semantic understanding, natural-language policies, and far fewer rule explosions. The cost is added latency, token spend, and reduced explainability. These are engineering constraints that the design accounts for explicitly; the corresponding tradeoffs are on the architecture page.
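As a rough illustration of "the routing decision is an LLM call", the sketch below builds one prompt from a natural-language policy, the allowed backends, and the request, then validates the model's reply. It is not OrangeRouter's actual interface: `route`, `RouteDecision`, `fake_llm`, the JSON reply shape, and the backend names are all hypothetical, and the LLM is an injected stub so the example runs offline.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RouteDecision:
    backend: str
    reason: str


def route(request_text: str, policy: str, backends: List[str],
          llm: Callable[[str], str], fallback: str) -> RouteDecision:
    """Ask an LLM to pick a backend; use the fallback if its answer is unusable."""
    prompt = (
        "You are a router. Policy:\n" + policy + "\n"
        "Allowed backends: " + ", ".join(backends) + "\n"
        "Request:\n" + request_text + "\n"
        'Reply with JSON: {"backend": "...", "reason": "..."}'
    )
    try:
        reply = json.loads(llm(prompt))
        backend = reply["backend"]
        reason = str(reply.get("reason", ""))
    except (json.JSONDecodeError, KeyError, TypeError):
        # Safety boundary: never forward a request based on malformed output.
        return RouteDecision(fallback, "unparseable router output")
    if backend not in backends:
        # Safety boundary: the model may only choose from the allowed list.
        return RouteDecision(fallback, "router chose an unknown backend")
    return RouteDecision(backend, reason)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call (assumption): a crude keyword match.
    choice = "large-model" if "write a function" in prompt.lower() else "small-model"
    return json.dumps({"backend": choice, "reason": "stub decision"})


POLICY = "Send code generation to large-model and summarization to small-model."
BACKENDS = ["large-model", "small-model"]

code_route = route("Write a function that reverses a list.",
                   POLICY, BACKENDS, fake_llm, fallback="small-model")
summary_route = route("Summarize this meeting transcript.",
                      POLICY, BACKENDS, fake_llm, fallback="small-model")
```

The two validation branches hint at where the costs listed above show up in practice: every decision pays for a model call, and the caller still needs a deterministic fallback when the reply cannot be trusted.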
Gains
Semantic understanding, natural-language policy, and a sharp reduction in rule explosions when intent varies.
Costs
Added latency, real token cost, harder explainability, and stricter safety-boundary design.
Not the same as…
- API gateway: The focus here is which model receives the request, not merely auth and rate limits.
- Load balancer: Upstream lists can spread load, but they don't parse the semantics of the request body.
- Simple prompt router: Often a one-shot classification label. OrangeRouter targets full-path decisions bound to policy.
Where it fits
- Route code generation to a large model and summarization to a lightweight one.
- Describe a policy like “send 5% of traffic to the new backend first” in natural language.
- Keep the client SDK unchanged while the backend swaps providers.
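The first and last use cases can be sketched together: the caller only ever invokes one stable method, while the backend decides which model handles the request and can swap providers behind it. This is a minimal sketch under stated assumptions, not the OrangeRouter SDK: `RouterClient`, `swap_provider`, `choose_by_task`, and the vendor labels are hypothetical, and the keyword check stands in for the LLM routing decision.

```python
from typing import Callable, Dict

Provider = Callable[[str], str]  # takes a prompt, returns a completion


class RouterClient:
    """Hypothetical client facade: callers only ever see complete()."""

    def __init__(self, providers: Dict[str, Provider],
                 choose: Callable[[str], str]) -> None:
        self._providers = dict(providers)
        self._choose = choose  # stand-in for the LLM routing decision

    def swap_provider(self, name: str, provider: Provider) -> None:
        # Backend-side change; the complete() signature is unaffected.
        self._providers[name] = provider

    def complete(self, prompt: str) -> str:
        return self._providers[self._choose(prompt)](prompt)


def choose_by_task(prompt: str) -> str:
    # Placeholder for the LLM decision: code requests go to the large model.
    return "large" if "code" in prompt.lower() else "light"


client = RouterClient(
    providers={
        "large": lambda p: "[vendor-a large] " + p,
        "light": lambda p: "[vendor-a light] " + p,
    },
    choose=choose_by_task,
)

before = client.complete("Generate code for a binary search")
client.swap_provider("large", lambda p: "[vendor-b large] " + p)
after = client.complete("Generate code for a binary search")
summary = client.complete("Summarize this meeting transcript")
```

The call site is identical before and after the swap; only the provider table behind the router changed.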
From rules to semantics. OrangeRouter replaces hard-to-maintain static rule tables with a single LLM inference: routing is driven by semantic understanding and natural-language policy, and the model absorbs the case-by-case variation that would otherwise cause rule explosions. The implementation is on the architecture page.