v0.1

Architecture

A unified ingress, an LLM decision core, and pluggable backend adapters — three cleanly decoupled layers that keep routing logic centralized and observable.

Design goals

04 / design goals

Layers

request path
┌─────────────┐
│   Client    │
└──────┬──────┘
       ▼
┌─────────────┐
│  Ingress    │  auth, rate limit, request normalization
└──────┬──────┘
       ▼
┌─────────────┐
│ LLM Router  │  assemble context, inject policy, select backend
└──────┬──────┘
       ▼
┌─────────────┐
│  Adapters   │  OpenAI protocol, Anthropic protocol, self-hosted, …
└─────────────┘

Core modules

structure
Module Responsibility Group
Normalizer Normalizes heterogeneous client request bodies into one internal format. ingress
Context builder Injects tenant policy, session summary, and hard constraints. policy
Decision engine Calls the LLM and returns a structured result (JSON or similar) with the target backend and reasoning. core
Registry Maintains the backend list, capability tags (fast, code, …), and health state. state
Observability Provides traces and records an audit entry per decision. audit

The life of a request

lifecycle
  1. Request reaches the ingress: auth, rate limit, and request normalization.
  2. Normalizer and context builder prepare the input the decision will consume.
  3. The LLM returns a routing decision: target backend, an alternative, or a rejection reason.
  4. The adapter forwards to the chosen backend, in its native protocol.
  5. The response returns: response-phase secondary routing is a later milestone.

Safety & governance

non-negotiables

Roadmap

stated plainly
Phase Focus
Now A closed loop for the core routing path, with the design tradeoffs and baselines settling out.
Near term An end-to-end path, observability and evaluation, and replay of bad decisions.
Longer term Open source and pilot rollouts — building a multi-model routing ecosystem.

For the reasoning behind these choices, see the background page. Questions or dissenting views are welcome — reach the team.