Settings

The Settings page consolidates system-wide configuration into a single form. Changes take effect after clicking Save Settings.

Rate Limiting

Global rate limits protect the gateway and upstream providers from excessive traffic. Three limits can be configured:

Setting | Default | Description
Requests per Minute | 60 | Maximum requests per minute across all clients
Requests per Hour | 1,000 | Maximum requests per hour
Tokens per Minute | 100,000 | Maximum total tokens processed per minute

These are system-wide defaults. Per-provider and per-API-key rate limits (configured elsewhere) override these values for their respective scopes.
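As an illustration of how three limits like these can be checked together, here is a minimal sketch using a sliding window. The gateway's actual algorithm is not documented here, so the class name `GlobalRateLimiter`, the sliding-window approach, and the `allow` method are all assumptions made for the example; only the default values come from the table above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class GlobalRateLimiter:
    """Hypothetical sliding-window check for the three global limits."""

    requests_per_minute: int = 60
    requests_per_hour: int = 1_000
    tokens_per_minute: int = 100_000
    # Accepted requests as (timestamp_seconds, tokens) pairs, oldest first.
    _events: deque = field(default_factory=deque, repr=False)

    def allow(self, now: float, tokens: int) -> bool:
        """Return True and record the request if it fits all three limits."""
        # Events older than one hour no longer count toward any window.
        while self._events and now - self._events[0][0] >= 3600:
            self._events.popleft()

        last_minute = [(t, n) for t, n in self._events if now - t < 60]
        if len(last_minute) >= self.requests_per_minute:
            return False
        if len(self._events) >= self.requests_per_hour:
            return False
        if sum(n for _, n in last_minute) + tokens > self.tokens_per_minute:
            return False

        self._events.append((now, tokens))
        return True
```

A request is rejected as soon as any one of the three limits would be exceeded, and rejected requests are not counted against later windows.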

Caching

The gateway includes a semantic cache that stores and reuses responses for similar prompts, reducing latency and cost.

Setting | Default | Description
Enabled | Yes | Toggle semantic caching on or off
TTL (seconds) | 3,600 | How long cached responses remain valid
Similarity Threshold | 0.95 | Minimum cosine similarity for a cache hit (0.0 to 1.0)

A higher similarity threshold (closer to 1.0) requires prompts to be nearly identical for a cache hit. A lower threshold tolerates more variation in wording but increases the risk of serving a cached response that was generated for a different question.
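To make the interaction between the TTL and the similarity threshold concrete, the following sketch shows one way a semantic cache lookup could work. It assumes prompts have already been converted to embedding vectors; the `CacheEntry` type and the `lookup` function are hypothetical names for this example and do not describe the gateway's internals.

```python
import math
from dataclasses import dataclass


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class CacheEntry:
    embedding: list[float]  # embedding of the original prompt
    response: str           # cached response text
    stored_at: float        # time the entry was stored, in seconds


def lookup(entries, query_embedding, now, ttl_seconds=3600, threshold=0.95):
    """Return the most similar unexpired response, or None on a cache miss."""
    best, best_score = None, threshold
    for entry in entries:
        if now - entry.stored_at >= ttl_seconds:
            continue  # entry has outlived its TTL
        score = cosine_similarity(entry.embedding, query_embedding)
        if score >= best_score:
            best, best_score = entry, score
    return best.response if best else None
```

With the default threshold of 0.95, a query vector of `[1.0, 0.05]` matches an entry stored with `[1.0, 0.0]` (similarity about 0.999), while `[1.0, 1.0]` does not (similarity about 0.707).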

Default Routing Strategy

Select the system-wide default routing strategy. This applies to any request that does not match a specific routing rule on the Routing page.

Available strategies:

  • Priority
  • Round Robin
  • Weighted
  • Least Latency
  • Least Cost
  • Free Tier First
  • Task Optimized
  • Cost Optimized
  • Failover

See the Routing Configuration page for detailed descriptions of each strategy.

Allowed Providers

Select which provider types are available for use in the gateway. Unchecked providers cannot be added or activated. The full list of available provider types:

openai, anthropic, google, azure, bedrock, cohere, mistral, deepseek, groq, together-ai, fireworks, xai, cerebras, sambanova, replicate, elevenlabs, stability

Leave all checked to allow any provider type.

Allowed Models

Restrict which model IDs can be used through the gateway. Enter one model ID per line (e.g. gpt-4, claude-3-opus). Leave the field empty to allow all models.

This acts as an allowlist: only models listed here are routable. Combined with per-provider model registrations, it gives you two layers of control over which models are accessible.
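The two layers of control can be sketched as follows. This is an illustration only: the function names `parse_allowlist` and `is_routable` are hypothetical, and exact string matching of model IDs (no wildcards or aliases) is an assumption.

```python
def parse_allowlist(text: str) -> set[str]:
    """Parse the Allowed Models field: one model ID per line, blanks ignored."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def is_routable(model_id: str, allowlist: set[str], registered: set[str]) -> bool:
    """A model is routable only if both layers permit it."""
    # Layer 1: global allowlist. An empty allowlist allows all models.
    globally_allowed = not allowlist or model_id in allowlist
    # Layer 2: the model must be registered with at least one provider.
    return globally_allowed and model_id in registered
```

For example, with an allowlist of `gpt-4` and `claude-3-opus`, a model registered with a provider but missing from the allowlist would still be rejected.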

SSO Configuration

Single Sign-On can be enabled and configured directly from the Settings page. Toggle Enable Single Sign-On and select a provider:

OIDC (OpenID Connect)

Field | Description
Issuer URL | The OIDC issuer URL, used to locate the provider's discovery document (e.g. https://accounts.google.com)
Client ID | OAuth client identifier
Client Secret | OAuth client secret (leave blank to keep the current value)
Scopes | Space-separated OIDC scopes (default: openid email profile)
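For reference, OpenID Connect Discovery publishes provider metadata at a well-known path under the issuer URL, and OIDC authentication requests must include the `openid` scope. The sketch below shows how those two fields might be interpreted; the helper names `discovery_url` and `parse_scopes` are hypothetical, and whether the gateway validates scopes this way is an assumption.

```python
def discovery_url(issuer: str) -> str:
    """Build the standard OIDC discovery document URL for an issuer."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def parse_scopes(raw: str = "openid email profile") -> list[str]:
    """Split the space-separated Scopes field and require 'openid'."""
    scopes = raw.split()
    if "openid" not in scopes:
        raise ValueError("OIDC requests must include the 'openid' scope")
    return scopes
```

With the example issuer from the table, `discovery_url("https://accounts.google.com")` yields `https://accounts.google.com/.well-known/openid-configuration`.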

SAML 2.0

Field | Description
IdP SSO URL | The identity provider’s single sign-on entry point
IdP Entity ID | The issuer identifier from your identity provider
IdP Signing Certificate | The X.509 certificate in PEM format used to verify SAML assertions

SIEM Integration

Export audit logs to an external Security Information and Event Management system. Select a SIEM type and configure the connection:

SIEM Type | Endpoint Example
Splunk HEC | https://splunk:8088/services/collector
Elasticsearch / ELK | https://elasticsearch:9200
Webhook | https://webhook.example.com/audit

Additional fields:

  • Auth Token — Authentication credential for the SIEM endpoint
  • Batch Size — Number of events per batch (1 to 1,000; default: 100)

Use the Test Connection button to verify connectivity before saving. The test result displays inline as “Connection successful” or “Connection failed.”
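The Batch Size setting can be pictured as splitting the pending audit events into fixed-size groups before each export. The following is a minimal sketch under that assumption; `batch_events` is a hypothetical helper, and the event dictionaries stand in for whatever audit record format the gateway actually uses.

```python
def batch_events(events: list[dict], batch_size: int = 100) -> list[list[dict]]:
    """Split events into consecutive batches of at most batch_size items."""
    # Mirror the documented range for Batch Size: 1 to 1,000.
    if not 1 <= batch_size <= 1000:
        raise ValueError("batch_size must be between 1 and 1000")
    return [
        events[start:start + batch_size]
        for start in range(0, len(events), batch_size)
    ]
```

With the default of 100, a backlog of 250 events would be sent as three batches of 100, 100, and 50 events.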

Saving Changes

All settings on this page are saved atomically via PUT /api/admin/settings. The Save Settings button is disabled while a save is in progress to prevent duplicate submissions.
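For scripted configuration, the same endpoint could be called directly. The sketch below only builds the request; the base URL, the bearer-token authentication, and the payload field names are all assumptions for illustration, since this page documents only the method and path.

```python
import json
import urllib.request


def build_save_request(base_url: str, settings: dict, token: str) -> urllib.request.Request:
    """Build a PUT request for /api/admin/settings with a JSON body."""
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/admin/settings",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            # Hypothetical auth scheme; use whatever your deployment requires.
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical payload shape; the real field names may differ.
request = build_save_request(
    "https://gateway.example.com",
    {"cache": {"enabled": True, "ttl_seconds": 3600}},
    token="ADMIN_TOKEN",
)
```

Passing `request` to `urllib.request.urlopen` would send it. Because the save is atomic, a client should submit the full settings object in one call rather than issuing several partial updates.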