Chat Completions
Create a chat completion using any configured provider through a single OpenAI-compatible endpoint.
POST /v1/chat/completions

Required capability: chat
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier (e.g. gpt-4, claude-3-opus, gemini-pro). The gateway routes to the appropriate provider. |
| messages | array | Yes | Array of message objects (minimum 1). See Message Format. |
| temperature | number | No | Sampling temperature between 0 and 2. Higher values produce more random output. |
| top_p | number | No | Nucleus sampling parameter between 0 and 1. |
| max_tokens | integer | No | Maximum number of tokens to generate. |
| max_completion_tokens | integer | No | Alternative to max_tokens. Maximum number of completion tokens to generate. |
| stream | boolean | No | If true, responses are streamed as server-sent events. Defaults to false. |
| stop | string \| string[] | No | Sequences at which the model stops generating. |
| n | integer | No | Number of completions to generate (1-10). |
| tools | array | No | List of tool/function definitions the model may call. |
| tool_choice | string \| object | No | Controls tool calling: "auto", "none", "required", or a specific function. |
| response_format | object | No | Output format: {"type": "text"}, {"type": "json_object"}, or {"type": "json_schema", "json_schema": {...}}. |
| seed | integer | No | Seed for deterministic sampling (best-effort). |
| user | string | No | End-user identifier for abuse tracking. |
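Only model and messages are required; the rest can be set as needed. A minimal client-side sketch of assembling a request body from these parameters (the helper name is illustrative, not part of the API):

```python
import json

def build_chat_request(model, messages, **options):
    """Assemble a chat completion request body.

    `model` and `messages` are required; any optional parameter from
    the table above (temperature, max_tokens, stream, tools, ...) can
    be passed as a keyword argument.
    """
    if not messages:
        raise ValueError("messages must contain at least one entry")
    body = {"model": model, "messages": messages}
    # Drop unset options so the gateway and provider defaults apply.
    body.update({k: v for k, v in options.items() if v is not None})
    return json.dumps(body)

payload = build_chat_request(
    "gpt-4",
    [{"role": "user", "content": "Hello"}],
    temperature=0.7,
    max_tokens=100,
)
```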
Message Format
Each message object has the following structure:
| Field | Type | Required | Description |
|---|---|---|---|
| role | string | Yes | One of system, user, assistant, or tool. |
| content | string \| array \| null | Yes | The message content. Can be a string, an array of content parts (text or image_url), or null for assistant messages with tool calls. |
| name | string | No | Name of the message author. |
| tool_calls | array | No | Tool calls made by the assistant. Each has id, type: "function", and function: {name, arguments}. |
| tool_call_id | string | No | For tool role messages, the ID of the tool call being responded to. |
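A full tool-call round trip exercises all four roles and the tool_calls/tool_call_id fields together. A sketch of such a message sequence (the function name, call ID, and payloads are hypothetical):

```python
# A conversation in which the assistant called a tool and the client
# returns the tool's result as a `tool` role message.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the weather in London?"},
    {
        "role": "assistant",
        "content": None,  # null content when the assistant only calls tools
        "tool_calls": [
            {
                "id": "call_abc123",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    # arguments arrive as a JSON-encoded string
                    "arguments": '{"location": "London"}',
                },
            }
        ],
    },
    {
        "role": "tool",
        "tool_call_id": "call_abc123",  # must match the assistant's call id
        "content": '{"temp_c": 14, "conditions": "cloudy"}',
    },
]
```

Sending this array back to the endpoint lets the model produce a final answer grounded in the tool result.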
Multimodal Content
The content field can be an array of content parts for vision requests:
```json
[
  { "type": "text", "text": "What is in this image?" },
  {
    "type": "image_url",
    "image_url": { "url": "https://example.com/image.png", "detail": "auto" }
  }
]
```

The detail field accepts "auto", "low", or "high".
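Local images are commonly sent as base64 data URLs in the same image_url field. A helper sketch that builds such a content array (whether every downstream provider accepts data URLs is an assumption to verify against your configuration):

```python
import base64

def image_content(text, image_bytes, mime="image/png", detail="auto"):
    """Build a multimodal `content` array: one text part, one image part.

    Embeds the image as a base64 data URL; support for data URLs on the
    target provider is assumed, not guaranteed by this document.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"type": "text", "text": text},
        {
            "type": "image_url",
            "image_url": {"url": f"data:{mime};base64,{b64}", "detail": detail},
        },
    ]
```

The returned list can be used directly as the content value of a user message.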
Response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
```

Streaming
Set "stream": true to receive responses as server-sent events (SSE). The response uses Content-Type: text/event-stream.
Each event is a JSON chunk prefixed with `data: `:

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: [DONE]
```

The stream terminates with `data: [DONE]`.
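Client code has to strip the `data: ` prefix, stop at the `[DONE]` sentinel, and concatenate the delta.content fragments. A minimal parser sketch over already-received SSE lines:

```python
import json

def assemble_stream(lines):
    """Reassemble the assistant's text from SSE lines like those above."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank separator / keep-alive lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # terminator sentinel, not JSON -- never json.loads it
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        # delta may omit `content` (e.g. the initial role-only chunk)
        text.append(delta.get("content") or "")
    return "".join(text)
```

Applied to the two chunks shown above, this yields the string "Hello!". In a real client the lines would come from an HTTP response body read incrementally.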
Headers
The response includes:
| Header | Description |
|---|---|
| X-Request-ID | Unique identifier for the request, useful for debugging and audit trails. |
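HTTP header names are case-insensitive, and client libraries expose them with varying casing, so it is safest to look the header up case-insensitively when recording it for audit trails. A small sketch:

```python
def request_id(headers):
    """Fetch X-Request-ID from a response-header mapping, any casing."""
    for name, value in headers.items():
        if name.lower() == "x-request-id":
            return value
    return None  # header absent
```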
Example
```bash
curl https://your-gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in one sentence."}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'
```

Streaming Example
```bash
curl https://your-gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Write a haiku about APIs."}],
    "stream": true
  }'
```

Tool Calling Example
```bash
curl https://your-gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is the weather in London?"}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string"}
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
```

Gateway Features
The chat completions endpoint passes through the full gateway middleware pipeline:
- Request normalization — requests are translated to a unified internal format, enabling routing to any provider.
- RAG injection — if RAG is configured for the tenant, relevant documents are injected into the conversation context.
- Prompt guards — configurable content filters for injection detection, PII, and toxicity.
- Budget checks — requests are blocked if the tenant or API key budget has been exceeded.
- Semantic cache — identical or semantically similar requests may be served from cache.
- Usage tracking — token usage is recorded for billing and analytics.
- Audit logging — requests are logged for compliance.