LLM-native routing
OrangeRouter — LLM-native routing layer
A unified entry point for multi-model backends, with routing decisions made by LLM inference.
OrangeRouter is a lightweight piece of infrastructure: one external entry point, several model backends behind it, and an LLM in the middle that completes each routing decision before the request reaches a backend cluster — it understands the request first, then decides where it should go.
The aim is to let model capability, request semantics, and business policy jointly drive routing, rather than relying on static rule tables that are hard to maintain. This site covers the problem background, the technical design, and the tradeoffs behind it.
Design principles
03 / principles-
01
Decisions grounded in semantics
When requests carry conversation history, tool output, or vague intent, a static rule table falls behind quickly. Routing decisions should understand the request's semantics, not just match a single field.
-
02
A unified entry point
Callers need not track which provider or model tier sits behind each path. One endpoint, many backends, with a single decision step in between.
-
03
Transparent tradeoffs
Latency, cost, and observability are first-class concerns in the design. The relevant tradeoffs are stated explicitly on the architecture page, not buried as footnotes.
The request path
conceptual · not final// client → router → routing target
Not the same as…
03 / distinctions- API gateway The focus here is which model receives the request and why — not merely auth, rate limits, and forwarding.
- Load balancer Upstream lists spread load, but they don't parse the request body. OrangeRouter routes on the semantics a request carries.
- Simple prompt router Often a one-shot classification label. OrangeRouter targets decisions and fallbacks that span the full path and bind to policy.
Routing should understand the data. OrangeRouter places the routing decision in a single LLM inference, so semantics and policy drive the outcome. The motivation is on the background page; the implementation on the architecture page.