Settings

The Settings page consolidates system-wide configuration into a single form. Changes take effect only after you click Save Settings.

Rate Limiting

Global rate limits protect the gateway and upstream providers from excessive traffic. Three limits can be configured:

Setting	Default	Description
Requests per Minute	60	Maximum requests per minute across all clients
Requests per Hour	1,000	Maximum requests per hour across all clients
Tokens per Minute	100,000	Maximum total tokens processed per minute across all clients

These are system-wide defaults. Per-provider and per-API-key rate limits (configured elsewhere) override these values for their respective scopes.
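The override behavior can be pictured with a minimal sketch. The `RateLimits` type and `effective_limits` function are hypothetical names, and the per-field fallback (an unset field in a scoped override inherits the global value) is an assumption. The text above only states that scoped limits override the global ones.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class RateLimits:
    # None means "not set at this scope" (assumption for this sketch).
    requests_per_minute: Optional[int] = None
    requests_per_hour: Optional[int] = None
    tokens_per_minute: Optional[int] = None


# System-wide defaults from the table above.
GLOBAL_DEFAULTS = RateLimits(
    requests_per_minute=60,
    requests_per_hour=1_000,
    tokens_per_minute=100_000,
)


def effective_limits(global_limits: RateLimits,
                     scoped: Optional[RateLimits] = None) -> RateLimits:
    """Return the limits for one scope (a provider or an API key).

    Fields set on the scoped override win; unset fields fall back
    to the global value.
    """
    if scoped is None:
        return global_limits
    merged = {}
    for f in fields(RateLimits):
        value = getattr(scoped, f.name)
        merged[f.name] = value if value is not None else getattr(global_limits, f.name)
    return RateLimits(**merged)
```

For example, `effective_limits(GLOBAL_DEFAULTS, RateLimits(requests_per_minute=10))` tightens only the per-minute request limit and keeps the other two global values.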

Caching

The gateway includes a semantic cache that stores and reuses responses for similar prompts, reducing latency and cost.

Setting	Default	Description
Enabled	Yes	Toggle semantic caching on or off
TTL (seconds)	3,600	How long cached responses remain valid
Similarity Threshold	0.95	Minimum cosine similarity for a cache hit (0.0 to 1.0)

A higher similarity threshold (closer to 1.0) requires prompts to be nearly identical for a cache hit. A lower threshold tolerates more variation in wording but increases the risk of serving a cached response that was generated for a different question.
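The threshold check can be illustrated with a small sketch. It assumes prompts have already been converted to embedding vectors and that the cache is a simple list of `(embedding, response)` pairs. The `lookup` function and the cache layout are hypothetical, and TTL expiry is omitted for brevity.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup(query: Sequence[float],
           cache: List[Tuple[Sequence[float], str]],
           threshold: float = 0.95) -> Optional[str]:
    """Return the most similar cached response at or above the threshold."""
    best_response, best_score = None, threshold
    for embedding, response in cache:
        score = cosine_similarity(query, embedding)
        if score >= best_score:
            best_response, best_score = response, score
    return best_response
```

With a single cached entry at `[1.0, 0.0]`, a query of `[0.9, 0.1]` scores roughly 0.99 and is a hit at the default threshold, while `[0.7, 0.7]` scores roughly 0.71 and is a hit only if the threshold is lowered to about 0.7.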

Default Routing Strategy

Select the system-wide default routing strategy. This applies to any request that does not match a specific routing rule on the Routing page.

Available strategies:

Priority
Round Robin
Weighted
Least Latency
Least Cost
Free Tier First
Task Optimized
Cost Optimized
Failover

See the Routing Configuration page for detailed descriptions of each strategy.
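The fallback to the default strategy can be sketched as follows. The `RoutingRule` shape, the first-match-wins order, and the strategy identifiers are all assumptions made for illustration, not the gateway's actual data model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class RoutingRule:
    # Hypothetical rule: a predicate over the request plus a strategy name.
    matches: Callable[[Dict], bool]
    strategy: str


def pick_strategy(request: Dict,
                  rules: Sequence[RoutingRule],
                  default_strategy: str) -> str:
    """Use the first matching rule's strategy, else the system-wide default."""
    for rule in rules:
        if rule.matches(request):
            return rule.strategy
    return default_strategy


# Example: one rule for a specific model, everything else uses the default.
rules = [RoutingRule(matches=lambda req: req.get("model") == "gpt-4",
                     strategy="least_latency")]
```

Here `pick_strategy({"model": "gpt-4"}, rules, "priority")` resolves to the rule's strategy, and any other request resolves to `"priority"`.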

Allowed Providers

Select which provider types are available for use in the gateway. Unchecked provider types cannot be added or activated. The available provider types are:

openai, anthropic, google, azure, bedrock, cohere, mistral, deepseek, groq, together-ai, fireworks, xai, cerebras, sambanova, replicate, elevenlabs, stability

Leave all checked to allow any provider type.

Allowed Models

Restrict which model IDs can be used through the gateway. Enter one model ID per line (e.g. gpt-4, claude-3-opus). Leave the field empty to allow all models.

This field acts as an allowlist: only the models listed here are routable. Combined with per-provider model registrations, it gives you two layers of control over which models are accessible.
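The allowlist semantics described above can be sketched in a few lines. The function names are hypothetical, and trimming whitespace, skipping blank lines, and matching IDs case-sensitively are assumptions about how the field is parsed.

```python
from typing import Set


def parse_allowed_models(field_text: str) -> Set[str]:
    """Turn the one-ID-per-line field into a set, ignoring blank lines."""
    return {line.strip() for line in field_text.splitlines() if line.strip()}


def is_model_allowed(model_id: str, allowed: Set[str]) -> bool:
    """An empty allowlist means every model is allowed."""
    return not allowed or model_id in allowed
```

For example, a field containing `gpt-4` and `claude-3-opus` on separate lines admits only those two IDs, while an empty field admits any model ID.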

SSO Configuration

Single Sign-On can be enabled and configured directly from the Settings page. Toggle Enable Single Sign-On and select a provider:

OIDC (OpenID Connect)

Field	Description
Issuer URL	The OIDC issuer URL, used to locate the provider's discovery document (e.g. `https://accounts.google.com`)
Client ID	OAuth client identifier
Client Secret	OAuth client secret (leave blank to keep the current value)
Scopes	Space-separated OIDC scopes (default: `openid email profile`)
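Under the OpenID Connect Discovery specification, a provider publishes its configuration at the issuer URL followed by `/.well-known/openid-configuration`. The sketch below shows how the Issuer URL and Scopes fields might be interpreted. The function names are hypothetical, and whether the gateway resolves discovery exactly this way is an assumption.

```python
from typing import List

DEFAULT_SCOPES = "openid email profile"


def discovery_url(issuer_url: str) -> str:
    """Build the discovery document URL for an issuer.

    A trailing slash on the issuer is removed before the
    well-known path is appended.
    """
    return issuer_url.rstrip("/") + "/.well-known/openid-configuration"


def parse_scopes(scopes_field: str = DEFAULT_SCOPES) -> List[str]:
    """Split the space-separated Scopes field into individual scopes."""
    scopes = scopes_field.split()
    if "openid" not in scopes:
        # OIDC authentication requests must include the "openid" scope.
        raise ValueError('the "openid" scope is required for OIDC')
    return scopes
```

For the example issuer in the table, `discovery_url("https://accounts.google.com")` yields `https://accounts.google.com/.well-known/openid-configuration`.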

SAML 2.0

Field	Description
IdP SSO URL	The identity provider’s single sign-on entry point
IdP Entity ID	The issuer identifier from your identity provider
IdP Signing Certificate	The X.509 certificate in PEM format used to verify SAML assertions

SIEM Integration

Export audit logs to an external Security Information and Event Management system. Select a SIEM type and configure the connection:

SIEM Type	Endpoint Example
Splunk HEC	`https://splunk:8088/services/collector`
Elasticsearch / ELK	`https://elasticsearch:9200`
Webhook	`https://webhook.example.com/audit`

Additional fields:

Auth Token — Authentication credential for the SIEM endpoint
Batch Size — Number of events per batch (1 to 1,000; default: 100)

Use the Test Connection button to verify connectivity before saving. The test result displays inline as “Connection successful” or “Connection failed.”
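The effect of the Batch Size field can be sketched as a simple chunking step. The `batch_events` function is a hypothetical illustration, not the gateway's exporter. Only the 1 to 1,000 range and the default of 100 come from the field description above.

```python
from typing import Iterator, List, Sequence

MIN_BATCH_SIZE = 1
MAX_BATCH_SIZE = 1000
DEFAULT_BATCH_SIZE = 100


def batch_events(events: Sequence[dict],
                 batch_size: int = DEFAULT_BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive batches of audit events, each at most batch_size long."""
    if not MIN_BATCH_SIZE <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(
            f"batch_size must be between {MIN_BATCH_SIZE} and {MAX_BATCH_SIZE}"
        )
    for start in range(0, len(events), batch_size):
        yield list(events[start:start + batch_size])
```

With the default batch size, 250 pending events would be sent as three batches of 100, 100, and 50 events.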

Saving Changes

All settings on this page are saved atomically via `PUT /api/admin/settings`. The Save Settings button is disabled while a save is in progress to prevent duplicate submissions.
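For scripted changes, the same endpoint could be called directly. The sketch below only builds the request. The JSON body shape, the bearer-token header, and the `gateway.example.com` host are assumptions, since this page documents only the method and path.

```python
import json
import urllib.request


def build_save_request(base_url: str, settings: dict,
                       admin_token: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request to the settings endpoint."""
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/admin/settings",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check how your deployment authenticates admins.
            "Authorization": f"Bearer {admin_token}",
        },
    )


# Hypothetical payload; the real field names may differ.
example_settings = {"cache": {"enabled": True, "ttl_seconds": 3600}}
request = build_save_request("https://gateway.example.com", example_settings, "ADMIN_TOKEN")
```

Passing `request` to `urllib.request.urlopen` would submit it. Because the save is atomic, a script should send the complete settings object in one call rather than one request per section.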