Architecture
A unified ingress, an LLM decision core, and pluggable backend adapters — three cleanly decoupled layers that keep routing logic centralized and observable.
Design goals
04 / design goals- Unified ingress: the application talks to a single entry point; backends talk to each model.
- Structured results: the LLM step returns the target backend, an alternative, or a rejection reason — not free-form prose.
- Adapter layer: speaks each provider's API, or self-hosted inference.
- Explainable logs: able to answer “why was this request routed this way?”
Layers
request path┌─────────────┐
│ Client │
└──────┬──────┘
▼
┌─────────────┐
│ Ingress │ auth, rate limit, request normalization
└──────┬──────┘
▼
┌─────────────┐
│ LLM Router │ assemble context, inject policy, select backend
└──────┬──────┘
▼
┌─────────────┐
│ Adapters │ OpenAI protocol, Anthropic protocol, self-hosted, …
└─────────────┘
Core modules
structure| Module | Responsibility | Group |
|---|---|---|
| Normalizer | Normalizes heterogeneous client request bodies into one internal format. | ingress |
| Context builder | Injects tenant policy, session summary, and hard constraints. | policy |
| Decision engine | Calls the LLM and returns a structured result (JSON or similar) with the target backend and reasoning. | core |
| Registry | Maintains the backend list, capability tags (fast, code, …), and health state. | state |
| Observability | Provides traces and records an audit entry per decision. | audit |
The life of a request
lifecycle- Request reaches the ingress: auth, rate limit, and request normalization.
- Normalizer and context builder prepare the input the decision will consume.
- The LLM returns a routing decision: target backend, an alternative, or a rejection reason.
- The adapter forwards to the chosen backend, in its native protocol.
- The response returns: response-phase secondary routing is a later milestone.
Safety & governance
non-negotiables- Policy sandbox. The model may only select from an allowed backend set — never an arbitrary endpoint.
- Strip or redact fields before they enter the decision prompt.
- Fallback on failure. When the LLM step errors, fall back to a preconfigured default backend — not uncontrolled routing.
Roadmap
stated plainly| Phase | Focus |
|---|---|
| Now | A closed loop for the core routing path, with the design tradeoffs and baselines settling out. |
| Near term | An end-to-end path, observability and evaluation, and replay of bad decisions. |
| Longer term | Open source and pilot rollouts — building a multi-model routing ecosystem. |
For the reasoning behind these choices, see the background page. Questions or dissenting views are welcome — reach the team.