Budgets

Budgets let you set spending and usage limits at multiple levels of your organization hierarchy. When a budget is exceeded, the gateway automatically blocks further requests for that scope until the period resets.

Budget List

The Budgets page displays a paginated table with these columns:

Column | Description
Scope | The scope type and name (e.g. “tenant / Acme Corp”)
Limits | Cost, request, and token limits for the period
Current Usage | A progress bar showing consumption as a percentage of the cost limit
Status | active, warning, exceeded, or inactive

Creating a Budget

Click Create Budget to open the form:

Scope Type

Budgets can be scoped to any level in the multi-tenancy hierarchy:

Scope Type | Description
Tenant | Applies to an entire tenant and all its users
Organization | Applies to an organization within a tenant
Department | Applies to a department within an organization
User | Applies to a single user
API Key | Applies to a specific API key

Period

Choose how frequently the budget resets:

Period | Reset Cadence
Hourly | Every hour on the hour
Daily | Midnight UTC each day
Weekly | Midnight UTC each Monday
Monthly | First of each month at midnight UTC
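
The reset cadences above can be expressed as a small date calculation. The following is a minimal sketch, not the gateway's implementation: the function name `next_reset` and the lowercase period strings are assumptions made for illustration, and the input is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def next_reset(now: datetime, period: str) -> datetime:
    """Return the next reset instant (UTC) strictly after `now`.

    `now` must be timezone-aware. The period names are hypothetical
    identifiers chosen for this sketch.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)

    if period == "hourly":
        # Top of the next hour.
        return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    if period == "daily":
        # Next midnight UTC.
        return midnight + timedelta(days=1)
    if period == "weekly":
        # weekday() is 0 on Monday, so this adds 7 days on a Monday
        # and 1 day on a Sunday, always landing on the next Monday.
        return midnight + timedelta(days=7 - now.weekday())
    if period == "monthly":
        # First day of the following month, rolling over the year in December.
        if now.month == 12:
            return midnight.replace(year=now.year + 1, month=1, day=1)
        return midnight.replace(month=now.month + 1, day=1)
    raise ValueError(f"unknown period: {period!r}")
```

For example, a budget checked on Wednesday 15 May 2024 at 13:45 UTC would next reset at 14:00 the same day (hourly), 16 May (daily), Monday 20 May (weekly), or 1 June (monthly).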

Limits

Each budget supports up to three limits. All of them are optional, so set only the ones relevant to your use case:

  • Cost Limit ($) — Maximum dollar spend for the period
  • Request Limit — Maximum number of requests
  • Token Limit — Maximum total tokens (input + output combined)

Alert Threshold

The Alert Threshold slider (range: 50% to 100%, default: 80%) determines when the budget transitions to warning status. When usage reaches this percentage of any limit, the gateway marks the budget as warning and can trigger notification alerts.

Budget Statuses

Status | Meaning
active | Budget is tracking usage; limits not yet reached
warning | Usage has crossed the alert threshold
exceeded | One or more limits have been reached; requests are blocked
inactive | Budget is disabled and not enforcing limits
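
To make the relationship between limits, the alert threshold, and the four statuses concrete, here is a minimal sketch of how a status could be derived from current usage. The `Budget` class, its field names, and `budget_status` are hypothetical and are not part of the gateway's API. The sketch also assumes that a limit left unset (or set to zero) is simply ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    # All limits are optional; None means "no limit of this kind".
    cost_limit: Optional[float] = None
    request_limit: Optional[int] = None
    token_limit: Optional[int] = None
    alert_threshold: float = 0.80  # fraction of a limit, default 80%
    enabled: bool = True


def budget_status(b: Budget, cost: float, requests: int, tokens: int) -> str:
    """Derive a status string from usage in the current period."""
    if not b.enabled:
        return "inactive"

    pairs = [
        (cost, b.cost_limit),
        (requests, b.request_limit),
        (tokens, b.token_limit),
    ]
    # Fraction consumed for each limit that is actually set.
    ratios = [used / limit for used, limit in pairs if limit is not None and limit > 0]

    if any(r >= 1.0 for r in ratios):
        return "exceeded"  # at least one limit has been reached
    if any(r >= b.alert_threshold for r in ratios):
        return "warning"   # at least one limit crossed the alert threshold
    return "active"
```

With a $500 cost limit and the default 80% threshold, this sketch reports warning once spend reaches $400 and exceeded at $500, or earlier if a request or token limit is reached first.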

Automatic Enforcement

When a budget enters the exceeded state, the gateway returns a 429 Too Many Requests response for any further requests from that scope. The response includes a Retry-After header indicating when the budget period resets.
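
A client can use the Retry-After header to decide how long to pause before retrying. The HTTP specification allows this header to carry either a number of seconds or an HTTP date, and this page does not say which form the gateway sends, so the sketch below accepts both. The function name `retry_delay_seconds` is an assumption for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after: str, now: datetime) -> float:
    """Convert a Retry-After header value into a wait time in seconds.

    `now` must be timezone-aware. Raises ValueError (or TypeError on
    older Python versions) if the value is neither an integer nor a
    parseable HTTP date.
    """
    value = retry_after.strip()
    if value.isdigit():
        # delta-seconds form, e.g. "120"
        return float(value)

    # HTTP-date form, e.g. "Wed, 15 May 2024 14:00:00 GMT"
    reset_at = parsedate_to_datetime(value)
    if reset_at.tzinfo is None:
        # Treat a date without zone information as UTC.
        reset_at = reset_at.replace(tzinfo=timezone.utc)
    # Never return a negative delay if the reset time has already passed.
    return max(0.0, (reset_at - now).total_seconds())
```

Because a monthly budget may not reset for weeks, a caller would typically surface the delay to an operator or queue the work rather than sleep for the full duration.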

Enforcement is hierarchical: if a tenant budget is exceeded, every organization, department, and user within that tenant is blocked, regardless of the status of their own budgets.
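
One way to picture hierarchical enforcement is as a walk from the requesting scope up through its ancestors, blocking if any of them is exceeded. The sketch below illustrates that idea only. The scope names, the `PARENT` mapping, and `blocking_scope` are hypothetical, and API keys are left out because this page does not state where they sit in the hierarchy.

```python
from typing import Dict, Optional

# Hypothetical hierarchy: each scope maps to its parent (None at the top).
PARENT: Dict[str, Optional[str]] = {
    "user/alice": "department/research",
    "department/research": "organization/emea",
    "organization/emea": "tenant/acme",
    "tenant/acme": None,
}


def blocking_scope(
    scope: str,
    parents: Dict[str, Optional[str]],
    statuses: Dict[str, str],
) -> Optional[str]:
    """Return the nearest scope (self or ancestor) whose budget is exceeded.

    Returns None when the request may proceed. Scopes without an entry
    in `statuses` are treated as having no budget.
    """
    current: Optional[str] = scope
    while current is not None:
        if statuses.get(current) == "exceeded":
            return current
        current = parents.get(current)
    return None
```

In this model, `blocking_scope("user/alice", PARENT, {"tenant/acme": "exceeded", "user/alice": "active"})` returns `"tenant/acme"`: the user is blocked even though the user's own budget is still active.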

Editing and Deleting Budgets

  • Click Edit to modify limits, period, scope, or alert threshold.
  • Click Delete to remove a budget. A confirmation dialog warns that usage tracking for that scope will stop. Deletion does not affect historical usage data.

Best Practices

  • Set tenant-level budgets as overall spending caps, then use department or user budgets for finer-grained control.
  • Use the hourly period for rate-sensitive workloads that could spike unexpectedly.
  • Keep alert thresholds at or near the 80% default to give teams time to react before hitting hard limits.
  • Combine cost limits with token limits for defense in depth — a low-cost model can still consume excessive tokens.