Background & motivation

Why LLM routing

Most teams start with round-robin or weighted rules — fine at first, but the limits show up quickly.

Where it breaks down

03 / problems

What we mean by “LLM-only routing”

The routing decision itself is made by an LLM call — not a sidecar lightweight classifier, and not a hand-maintained rule tree. Its input and output form, and the boundaries around it, are detailed on the architecture page.

The gain is flexibility: semantic understanding, natural-language policies, and the dissolution of rule explosions. The cost is latency, token spend, and explainability — engineering constraints that are explicitly accounted for in the design, with the corresponding tradeoffs on the architecture page.

Gains

Semantic understanding, natural-language policy, and a sharp reduction in rule explosions when intent varies.

Costs

Added latency, real token cost, harder explainability, and stricter safety-boundary design.

Not the same as…

03 / distinctions

Where it fits

use cases

From rules to semantics. OrangeRouter replaces hard-to-maintain static rule tables with a single LLM inference: routing driven by semantic understanding and natural-language policy, leaving rule explosions to the model. The implementation is on the architecture page.