
Create Routing Rules

Routing rules determine which providers handle incoming requests and in what order. This tutorial walks you through creating a routing configuration using the priority strategy, then testing it with an API call.

Prerequisites

Before you begin, make sure you have:

  1. At least one provider configured (for example, openai-prod).
  2. That provider's models registered and enabled in the Models section.
  3. An API key for sending test requests.

Step 1 — Navigate to the Routing Page

  1. In the sidebar, click Routing.
  2. The routing list shows all routing configurations for your tenant, grouped by capability (chat, embeddings, audio, images, etc.).
  3. Click Create Routing Config.

Step 2 — Set Basic Configuration

  1. Name — Enter a descriptive name, for example Chat Primary Route.
  2. Capability — Select the request type this config handles. Choose chat for chat completion requests.
  3. Strategy — Select priority to start. This strategy tries providers in a fixed order, falling back to the next on failure. See Routing Strategies for all ten options.
  4. Enabled — Toggle on.
  5. Is Default — Toggle on if this should be the fallback config when no other routing config matches.
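As a sketch, the Step 2 settings map onto a config object shaped roughly like this. The `RoutingConfig` interface and its field names are assumptions for illustration, not the product's actual schema:

```typescript
// Hypothetical shape for the routing config described in Step 2.
// Field names are illustrative, not the product's actual schema.
interface RoutingConfig {
  name: string;
  capability: "chat" | "embeddings" | "audio" | "images";
  strategy: "priority" | "weighted"; // see Routing Strategies for the full list
  enabled: boolean;
  isDefault: boolean;
}

const chatPrimary: RoutingConfig = {
  name: "Chat Primary Route",
  capability: "chat",
  strategy: "priority",
  enabled: true,
  isDefault: true,
};
```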

Step 3 — Add Provider Entries

Each route entry maps a provider to this routing configuration.

  1. Click Add Route.
  2. Provider — Select your provider from the dropdown (e.g., openai-prod).
  3. Model ID — Enter the default model for this route, for example gpt-4o.
  4. Priority — Enter 1 (lower number = higher priority). This provider will be tried first.
  5. Weight — Enter 1. Weight is used by the weighted strategy; it has no effect with the priority strategy.
  6. Enabled — Toggle on.

To add a fallback provider:

  1. Click Add Route again.
  2. Select a second provider (e.g., anthropic-prod).
  3. Set Priority to 2. This provider is tried only if the first fails.
  4. Set Model ID to the equivalent model on this provider, for example claude-sonnet-4-20250514.
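Under the priority strategy, the two entries above behave like this minimal sketch: enabled routes are ordered by ascending priority and each is tried until one succeeds. The `RouteEntry` shape and the `call` parameter are hypothetical stand-ins for the service's internals:

```typescript
// Hypothetical route entry, mirroring the fields set in Step 3.
interface RouteEntry {
  provider: string;
  modelId: string;
  priority: number; // lower number = tried first
  enabled: boolean;
}

// Try providers in ascending priority order, falling back to the
// next one whenever a call fails.
async function routeWithPriority(
  routes: RouteEntry[],
  call: (r: RouteEntry) => Promise<string>,
): Promise<string> {
  const ordered = routes
    .filter((r) => r.enabled)
    .sort((a, b) => a.priority - b.priority);
  let lastError: unknown;
  for (const route of ordered) {
    try {
      return await call(route);
    } catch (err) {
      lastError = err; // this provider failed; try the next one
    }
  }
  throw lastError ?? new Error("no enabled routes");
}
```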

Step 4 — Configure the Fallback Chain (Optional)

The fallback chain provides a last-resort option after all route entries have been exhausted.

  1. Scroll to the Fallback Chain section.
  2. Add a provider entry with a model ID. This provider is appended after all strategy-ordered routes.
  3. You can also enable a Local Fallback (e.g., an Ollama instance) for complete offline resilience.

The routing service detects circular fallback chains and breaks them automatically.
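Cycle detection of this kind can be sketched as a visited-set walk over the chain; the `Map`-based chain representation below is an assumption for illustration, not the service's actual data model:

```typescript
// A fallback chain as a map from provider -> its next fallback provider.
type FallbackChain = Map<string, string>;

// Walk the chain from `start`, breaking the loop as soon as a provider
// repeats. Returns the de-duplicated fallback order.
function resolveFallbacks(chain: FallbackChain, start: string): string[] {
  const seen = new Set<string>();
  const order: string[] = [];
  let current: string | undefined = start;
  while (current !== undefined && !seen.has(current)) {
    seen.add(current);
    order.push(current);
    current = chain.get(current);
  }
  return order;
}
```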

Step 5 — Set Capability Filters

If a specific model is requested by the client (via the model field in the API request), the routing service filters route entries to only include providers that have a matching ModelConfig with that model ID enabled. Providers without the requested model are skipped.

To ensure correct filtering:

  1. Verify each provider in your routes has the relevant models registered in the Models section.
  2. Models must be marked as enabled to be eligible.
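The filtering described above amounts to roughly the following sketch, where the `ModelConfig` and route shapes are assumed for illustration:

```typescript
// Hypothetical shapes for a provider's registered models.
interface ModelConfig {
  modelId: string;
  enabled: boolean;
}

interface ProviderRoute {
  provider: string;
  models: ModelConfig[];
}

// Keep only routes whose provider has the requested model registered
// and enabled; with no requested model, all routes stay eligible.
function filterByRequestedModel(
  routes: ProviderRoute[],
  requestedModel?: string,
): ProviderRoute[] {
  if (!requestedModel) return routes;
  return routes.filter((r) =>
    r.models.some((m) => m.modelId === requestedModel && m.enabled),
  );
}
```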

Step 6 — Test with an API Call

Send a test request to verify routing works:

curl -X POST http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

The response includes a metadata.providerChain array showing which providers were attempted:

{
  "metadata": {
    "providerChain": [
      {
        "provider": "openai-prod",
        "model": "gpt-4o",
        "attempted": true,
        "result": "success",
        "latencyMs": 842
      }
    ]
  }
}

If the first provider fails, you will see multiple entries in the chain showing the failover.
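For illustration only, a failed first attempt followed by a successful fallback might look like the sketch below. The error entry's field values (the "error" result and the latencies) are assumptions, not captured from the actual API:

```json
{
  "metadata": {
    "providerChain": [
      {
        "provider": "openai-prod",
        "model": "gpt-4o",
        "attempted": true,
        "result": "error",
        "latencyMs": 1203
      },
      {
        "provider": "anthropic-prod",
        "model": "claude-sonnet-4-20250514",
        "attempted": true,
        "result": "success",
        "latencyMs": 911
      }
    ]
  }
}
```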

How the Routing Service Works

  1. Load config — Finds the routing config matching the tenant, capability, and enabled state.
  2. Load providers — Fetches all ProviderConfig documents referenced in the routes.
  3. Filter — Removes providers that are disabled, unhealthy, or not in the tenant’s allowed-providers list.
  4. Apply strategy — Orders the remaining candidates using the selected strategy.
  5. Append fallbacks — Adds fallback chain entries after the strategy-ordered list.
  6. Cache — Results are cached in memory for 60 seconds with jittered TTL to prevent thundering herd.
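The jittered-TTL caching in step 6 can be sketched like this: each entry's expiry gets a small random offset so that entries populated together do not all expire at the same instant and trigger a simultaneous reload. The `JitteredCache` class is a hypothetical illustration, not the service's implementation:

```typescript
interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch ms
}

// In-memory cache with a jittered TTL. The injectable clock (`now`)
// exists only to make the sketch testable.
class JitteredCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  constructor(
    private baseTtlMs = 60_000, // 60-second base TTL, as in step 6
    private jitterMs = 10_000, // up to 10s of random extra lifetime
    private now: () => number = Date.now,
  ) {}

  set(key: string, value: T): void {
    const jitter = Math.random() * this.jitterMs;
    this.entries.set(key, {
      value,
      expiresAt: this.now() + this.baseTtlMs + jitter,
    });
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired; caller must reload the config
      return undefined;
    }
    return entry.value;
  }
}
```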

Next Steps