OrangeRouter — LLM-native routing

A unified entry point for multi-model backends, with routing decisions made by LLM inference.

OrangeRouter is a lightweight piece of infrastructure: one external entry point, several model backends behind it, and an LLM in the middle. The LLM completes each routing decision before the request reaches a backend cluster: it first interprets the request, then decides where it should go.

The aim is to let model capability, request semantics, and business policy jointly drive routing, rather than relying on static rule tables that are hard to maintain. This site covers the problem background, the technical design, and the tradeoffs behind it.

Design principles

01 · Decisions grounded in semantics

When requests carry conversation history, tool output, or vague intent, a static rule table falls behind quickly. Routing decisions should rest on the request's semantics, not on a match against a single field.

02 · A unified entry point

Callers need not track which provider or model tier sits behind each path. One endpoint, many backends, with a single decision step in between.

03 · Transparent tradeoffs

Latency, cost, and observability are first-class concerns in the design. The relevant tradeoffs are stated explicitly on the architecture page, not buried as footnotes.

The request path

conceptual · not final

// client → router → routing target

Client SDK / app
  → OrangeRouter · LLM decides
      → Backend A · large model
      → Backend B · lightweight model
      → Backend C · self-hosted
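The path above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the backend keys, the `Decision` type, `stub_decider`, and `handle` are hypothetical names, not OrangeRouter's actual interface. The LLM inference step is replaced by a trivial stand-in so the example runs without model access; its word-count threshold is a placeholder, not a suggested routing rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical backends mirroring the diagram; each just tags the prompt
# so the chosen route is visible in the result.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "large": lambda prompt: f"[large model] {prompt}",
    "lightweight": lambda prompt: f"[lightweight model] {prompt}",
    "self_hosted": lambda prompt: f"[self-hosted] {prompt}",
}


@dataclass
class Decision:
    backend: str  # key into BACKENDS
    reason: str   # short explanation, useful for logs


def stub_decider(prompt: str) -> Decision:
    """Stand-in for the LLM inference step (assumption: a real router
    would call a model here instead of counting words)."""
    if len(prompt.split()) > 12:
        return Decision("large", "long or multi-part request")
    return Decision("lightweight", "short request")


def handle(prompt: str,
           decide: Callable[[str], Decision] = stub_decider) -> str:
    """Single entry point: decide first, then forward to one backend."""
    decision = decide(prompt)
    backend = BACKENDS[decision.backend]
    return backend(prompt)
```

Passing the decision step in as a callable keeps the entry point unchanged when the stand-in is swapped for a real model call, which is the property the diagram is meant to convey.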

Not the same as…

API gateway: The focus here is which model receives the request and why, not merely auth, rate limits, and forwarding.
Load balancer: A load balancer spreads traffic across an upstream list but does not parse the request body. OrangeRouter routes on the semantics a request carries.
Simple prompt router: Often produces only a one-shot classification label. OrangeRouter targets decisions and fallbacks that span the full request path and are bound to policy.
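To make the last distinction concrete, here is a rough sketch of a decision that yields an ordered fallback chain filtered by policy, rather than a single label. Everything in it is assumed for illustration: `Policy`, `RoutePlan`, `apply_policy`, and `execute` are hypothetical names, and the candidate list stands in for whatever the LLM decision step would propose.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Sequence, Tuple


@dataclass(frozen=True)
class Policy:
    allowed: FrozenSet[str]  # backends this caller is permitted to use


@dataclass(frozen=True)
class RoutePlan:
    chain: Tuple[str, ...]  # primary backend first, then fallbacks in order


def apply_policy(candidates: Sequence[str], policy: Policy) -> RoutePlan:
    """Keep the proposed order, dropping backends the policy forbids."""
    chain = tuple(name for name in candidates if name in policy.allowed)
    if not chain:
        raise ValueError("policy allows none of the proposed backends")
    return RoutePlan(chain)


def execute(plan: RoutePlan,
            backends: Dict[str, Callable[[str], str]],
            prompt: str) -> Tuple[str, str]:
    """Try each backend in the chain; return (backend name, response)."""
    errors = []
    for name in plan.chain:
        try:
            return name, backends[name](prompt)
        except Exception as exc:  # any backend failure triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))
```

A one-shot classifier would stop at the first name in `candidates`. Here the plan carries the fallbacks with it, and the policy filter runs before any backend is contacted.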

Routing should understand the data. OrangeRouter places the routing decision in a single LLM inference, so semantics and policy drive the outcome. The motivation is on the background page; the implementation on the architecture page.
