Background — OrangeRouter

Where it breaks down

03 / problems

Models keep diverging. They differ markedly in price, speed, and the tasks they excel at — and new ones arrive every quarter.
Requests carry semantics. A request is no longer just an HTTP path; it carries conversation, tool output, and instructions that a single static config cannot capture.
Policy changes faster than configuration. Business teams expect to adjust routing policy themselves, rather than file a ticket each week to rewrite rules.

What we mean by “LLM-only routing”

The routing decision itself is made by an LLM call — not a sidecar lightweight classifier, and not a hand-maintained rule tree. Its input and output form, and the boundaries around it, are detailed on the architecture page.

The gain is flexibility: semantic understanding, natural-language policies, and the dissolution of rule explosions. The cost is latency, token spend, and explainability — engineering constraints that are explicitly accounted for in the design, with the corresponding tradeoffs on the architecture page.

Gains

Semantic understanding, natural-language policy, and a sharp reduction in rule explosions when intent varies.

Costs

Added latency, real token cost, harder explainability, and stricter safety-boundary design.

Not the same as…

03 / distinctions

API gateway The focus here is which model receives the request — not merely auth and rate limits.
Load balancer Upstream lists can spread load, but they don't parse the semantics of the request body.
Simple prompt router Often a one-shot classification label. OrangeRouter targets full-path decisions bound to policy.

Where it fits

use cases

Route code generation to a large model and summarization to a lightweight one.
Describe a policy like “send 5% of traffic to the new backend first” in natural language.
Keep the client SDK unchanged while the backend swaps providers.

From rules to semantics. OrangeRouter replaces hard-to-maintain static rule tables with a single LLM inference: routing driven by semantic understanding and natural-language policy, leaving rule explosions to the model. The implementation is on the architecture page.

Why LLM routing

Where it breaks down

What we mean by “LLM-only routing”

Gains

Costs

Not the same as…

Where it fits