Settings
The Settings page consolidates system-wide configuration into a single form. Changes take effect only after you click Save Settings.
Rate Limiting
Global rate limits protect the gateway and upstream providers from excessive traffic. Three limits can be configured:
| Setting | Default | Description |
|---|---|---|
| Requests per Minute | 60 | Maximum requests per minute across all clients |
| Requests per Hour | 1,000 | Maximum requests per hour across all clients |
| Tokens per Minute | 100,000 | Maximum total tokens processed per minute |
These are system-wide defaults. Per-provider and per-API-key rate limits (configured elsewhere) override these values for their respective scopes.
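The override behavior described above can be sketched as a simple fallback. This is a minimal illustration, not the gateway's implementation: the `RateLimits` class, the `effective_limits` function, and the field-by-field fallback (an unset value in a scoped override falls back to the global default) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RateLimits:
    """Hypothetical container for the three limits; None means 'not set'."""
    requests_per_minute: Optional[int] = None
    requests_per_hour: Optional[int] = None
    tokens_per_minute: Optional[int] = None


# System-wide defaults taken from the table above.
GLOBAL_DEFAULTS = RateLimits(60, 1_000, 100_000)


def effective_limits(
    scope_override: Optional[RateLimits],
    defaults: RateLimits = GLOBAL_DEFAULTS,
) -> RateLimits:
    """Return the limits for one scope (a provider or an API key).

    Assumption: any value the override leaves unset falls back to the
    system-wide default for that field.
    """
    if scope_override is None:
        return defaults

    def pick(override_value: Optional[int], default_value: Optional[int]) -> Optional[int]:
        return default_value if override_value is None else override_value

    return RateLimits(
        requests_per_minute=pick(scope_override.requests_per_minute, defaults.requests_per_minute),
        requests_per_hour=pick(scope_override.requests_per_hour, defaults.requests_per_hour),
        tokens_per_minute=pick(scope_override.tokens_per_minute, defaults.tokens_per_minute),
    )
```

For example, a scope that only sets `requests_per_minute=10` would keep the default hourly and token limits under this sketch.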
Caching
The gateway includes a semantic cache that stores and reuses responses for similar prompts, reducing latency and cost.
| Setting | Default | Description |
|---|---|---|
| Enabled | Yes | Toggle semantic caching on or off |
| TTL (seconds) | 3,600 | How long cached responses remain valid |
| Similarity Threshold | 0.95 | Minimum cosine similarity for a cache hit (0.0 to 1.0) |
A higher similarity threshold (closer to 1.0) requires prompts to be nearly identical for a cache hit. A lower threshold tolerates more variation between prompts but increases the risk of returning a cached response that was generated for a different question.
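To make the threshold concrete, the sketch below computes cosine similarity between two embedding vectors and compares it against the configured minimum. It assumes prompts have already been turned into embedding vectors; the function names are illustrative, and how the gateway produces and stores embeddings is not covered here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def is_cache_hit(
    query_embedding: Sequence[float],
    cached_embedding: Sequence[float],
    threshold: float = 0.95,
) -> bool:
    """A cached entry qualifies when similarity meets the minimum threshold."""
    return cosine_similarity(query_embedding, cached_embedding) >= threshold
```

With the default of 0.95, two vectors pointing in almost the same direction count as a hit, while vectors 45 degrees apart (similarity about 0.71) do not.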
Default Routing Strategy
Select the system-wide default routing strategy. This applies to any request that does not match a specific routing rule on the Routing page.
Available strategies:
- Priority
- Round Robin
- Weighted
- Least Latency
- Least Cost
- Free Tier First
- Task Optimized
- Cost Optimized
- Failover
See the Routing Configuration page for detailed descriptions of each strategy.
Allowed Providers
Select which provider types are available for use in the gateway. Unchecked providers cannot be added or activated. The available provider types are:
openai, anthropic, google, azure, bedrock, cohere, mistral, deepseek, groq, together-ai, fireworks, xai, cerebras, sambanova, replicate, elevenlabs, stability
Leave all checked to allow any provider type.
Allowed Models
Restrict which model IDs can be used through the gateway. Enter one model ID per line (e.g. gpt-4, claude-3-opus). Leave the field empty to allow all models.
When the field is not empty, it acts as an allowlist: only the models listed here are routable. Combined with per-provider model registrations, this gives you two layers of control over which models are accessible.
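The two layers of control can be illustrated with a short sketch. It assumes the field is parsed one model ID per line with surrounding whitespace ignored, and that a model must also appear in the target provider's registered models. The function names and the `provider_models` parameter are hypothetical.

```python
from typing import Set


def parse_allowed_models(field_text: str) -> Set[str]:
    """Parse the one-ID-per-line field, skipping blank lines."""
    return {line.strip() for line in field_text.splitlines() if line.strip()}


def is_model_routable(
    model_id: str,
    allowed_models: Set[str],
    provider_models: Set[str],
) -> bool:
    """Check both layers: the global allowlist and the provider registration.

    An empty allowlist means every model passes the first layer.
    """
    globally_allowed = not allowed_models or model_id in allowed_models
    return globally_allowed and model_id in provider_models
```

Under this sketch, leaving the field empty defers entirely to the per-provider registrations.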
SSO Configuration
Single Sign-On can be enabled and configured directly from the Settings page. Toggle Enable Single Sign-On and select a provider:
OIDC (OpenID Connect)
| Field | Description |
|---|---|
| Issuer URL | The OIDC issuer URL, used to locate the provider's discovery document (e.g. https://accounts.google.com) |
| Client ID | OAuth client identifier |
| Client Secret | OAuth client secret (leave blank to keep the current value) |
| Scopes | Space-separated OIDC scopes (default: openid email profile) |
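Under the OpenID Connect Discovery specification, a provider publishes its metadata at the issuer URL followed by `/.well-known/openid-configuration`. The sketch below derives that URL and parses the space-separated scopes field. Whether the gateway performs discovery exactly this way is an assumption, and the function names are illustrative.

```python
from typing import List


def discovery_url(issuer_url: str) -> str:
    """Build the standard OIDC discovery document URL from an issuer URL."""
    return issuer_url.rstrip("/") + "/.well-known/openid-configuration"


def parse_scopes(scopes_field: str = "openid email profile") -> List[str]:
    """Split the space-separated scopes field into a list.

    OIDC authentication requests must include the 'openid' scope,
    so a value without it is rejected here.
    """
    scopes = scopes_field.split()
    if "openid" not in scopes:
        raise ValueError("OIDC scopes must include 'openid'")
    return scopes
```

For the example issuer in the table, `discovery_url("https://accounts.google.com")` yields `https://accounts.google.com/.well-known/openid-configuration`.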
SAML 2.0
| Field | Description |
|---|---|
| IdP SSO URL | The identity provider’s single sign-on entry point |
| IdP Entity ID | The issuer identifier from your identity provider |
| IdP Signing Certificate | The X.509 certificate in PEM format used to verify SAML assertions |
SIEM Integration
Export audit logs to an external Security Information and Event Management system. Select a SIEM type and configure the connection:
| SIEM Type | Endpoint Example |
|---|---|
| Splunk HEC | https://splunk:8088/services/collector |
| Elasticsearch / ELK | https://elasticsearch:9200 |
| Webhook | https://webhook.example.com/audit |
Additional fields:
- Auth Token — Authentication credential for the SIEM endpoint
- Batch Size — Number of events per batch (1 to 1,000; default: 100)
Use the Test Connection button to verify connectivity before saving. The test result displays inline as “Connection successful” or “Connection failed.”
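The effect of Batch Size can be sketched as follows. This is an illustration only: it assumes audit events are grouped into consecutive batches of at most the configured size, with a smaller final batch. The function names are hypothetical, and the delivery step (HTTP request, authentication header) is omitted.

```python
from typing import Iterator, List, Sequence

MIN_BATCH_SIZE = 1
MAX_BATCH_SIZE = 1000
DEFAULT_BATCH_SIZE = 100


def validate_batch_size(batch_size: int) -> int:
    """Enforce the documented range of 1 to 1,000 events per batch."""
    if not MIN_BATCH_SIZE <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(
            f"batch size must be between {MIN_BATCH_SIZE} and {MAX_BATCH_SIZE}"
        )
    return batch_size


def batch_events(
    events: Sequence[dict],
    batch_size: int = DEFAULT_BATCH_SIZE,
) -> Iterator[List[dict]]:
    """Yield consecutive batches of at most batch_size events."""
    size = validate_batch_size(batch_size)
    for start in range(0, len(events), size):
        yield list(events[start:start + size])
```

With the default of 100, exporting 250 events would produce batches of 100, 100, and 50.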
Saving Changes
All settings on this page are saved atomically via PUT /api/admin/settings. The Save Settings button is disabled while a save is in progress to prevent duplicate submissions.
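For scripted changes, the same endpoint could be called directly. The sketch below only builds the request and does not send it. The endpoint path and method come from this page; the base URL, the bearer-token authentication, and every field name in the payload are assumptions for illustration, so check them against your deployment before use.

```python
import json
import urllib.request


def build_save_request(
    base_url: str,
    settings: dict,
    admin_token: str,
) -> urllib.request.Request:
    """Build a PUT request carrying the whole settings form in one body.

    Sending everything in a single request matches the all-or-nothing
    save described above. The Authorization scheme is an assumption.
    """
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/admin/settings",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {admin_token}",
        },
    )


# Hypothetical payload: these key names are illustrative, not the real schema.
example_settings = {
    "rate_limit": {"requests_per_minute": 60},
    "cache": {"enabled": True, "ttl_seconds": 3600, "similarity_threshold": 0.95},
}
```

To send the request, pass the returned object to `urllib.request.urlopen` and handle HTTP errors as appropriate for your environment.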