
Chat Completions

Create a chat completion using any configured provider through a single OpenAI-compatible endpoint.

POST /v1/chat/completions

Required capability: chat

Request Body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model identifier (e.g. gpt-4, claude-3-opus, gemini-pro). The gateway routes to the appropriate provider. |
| messages | array | Yes | Array of message objects (minimum 1). See Message Format. |
| temperature | number | No | Sampling temperature between 0 and 2. Higher values produce more random output. |
| top_p | number | No | Nucleus sampling parameter between 0 and 1. |
| max_tokens | integer | No | Maximum number of tokens to generate. |
| max_completion_tokens | integer | No | Alternative to max_tokens. Maximum completion tokens to generate. |
| stream | boolean | No | If true, responses are streamed as server-sent events. Defaults to false. |
| stop | string \| string[] | No | Sequences where the model stops generating. |
| n | integer | No | Number of completions to generate (1-10). |
| tools | array | No | List of tool/function definitions the model may call. |
| tool_choice | string \| object | No | Controls tool calling: "auto", "none", "required", or a specific function. |
| response_format | object | No | Output format: {"type": "text"}, {"type": "json_object"}, or {"type": "json_schema", "json_schema": {...}}. |
| seed | integer | No | Seed for deterministic sampling (best-effort). |
| user | string | No | End-user identifier for abuse tracking. |
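For clients not using an SDK, the request body is plain JSON. A minimal sketch in Python (the helper name build_chat_request is illustrative, not part of the gateway):

```python
import json

def build_chat_request(model, messages, **options):
    """Assemble a chat completions request body.

    model and messages are required; everything else (temperature,
    max_tokens, stream, ...) is passed through as an optional field.
    """
    if not messages:
        raise ValueError("messages must contain at least one message")
    body = {"model": model, "messages": messages}
    body.update(options)
    return body

req = build_chat_request(
    "gpt-4",
    [{"role": "user", "content": "Hello"}],
    temperature=0.7,
    max_tokens=100,
)
payload = json.dumps(req)  # send as the POST body
```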

Message Format

Each message object has the following structure:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| role | string | Yes | One of system, user, assistant, or tool. |
| content | string \| array \| null | Yes | The message content. Can be a string, an array of content parts (text or image_url), or null for assistant messages with tool calls. |
| name | string | No | Name of the message author. |
| tool_calls | array | No | Tool calls made by the assistant. Each has id, type: "function", and function: {name, arguments}. |
| tool_call_id | string | No | For tool role messages, the ID of the tool call being responded to. |
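To make the four roles concrete, here is a sketch of one conversation covering all of them, including an assistant turn whose content is null because it only issues a tool call (all values are illustrative):

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the weather in London?"},
    {
        # Assistant turn that calls a tool: content is null (None in Python)
        # and tool_calls carries the function invocation.
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather",
                         "arguments": "{\"location\": \"London\"}"},
        }],
    },
    {
        # Tool result, linked back to the call via tool_call_id.
        "role": "tool",
        "tool_call_id": "call_1",
        "content": "{\"temp_c\": 14}",
    },
]
```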

Multimodal Content

The content field can be an array of content parts for vision requests:

```json
[
  { "type": "text", "text": "What is in this image?" },
  { "type": "image_url", "image_url": { "url": "https://example.com/image.png", "detail": "auto" } }
]
```

The detail field accepts "auto", "low", or "high".
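Many OpenAI-compatible providers also accept images inline as base64 data URLs rather than a public URL; whether a given upstream supports this is provider-dependent. A sketch, where image_part is an illustrative helper:

```python
import base64

def image_part(raw_bytes, mime="image/png", detail="auto"):
    """Encode raw image bytes as a data-URL image_url content part."""
    b64 = base64.b64encode(raw_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{b64}", "detail": detail},
    }

content = [
    {"type": "text", "text": "What is in this image?"},
    image_part(b"\x89PNG...", detail="low"),  # placeholder bytes for illustration
]
```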

Response

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21
  }
}
```
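The fields a client typically needs are the first choice's message and the usage block. Parsing the response above with only the standard library:

```python
import json

raw = '''{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1709000000,
  "model": "gpt-4",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Hello! How can I help you today?"},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21}
}'''

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]
finish = resp["choices"][0]["finish_reason"]   # "stop", "length", "tool_calls", ...
total = resp["usage"]["total_tokens"]
```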

Streaming

Set "stream": true to receive responses as server-sent events (SSE). The response uses Content-Type: text/event-stream.

Each event line carries the prefix data: followed by a JSON chunk object:

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: [DONE]
```

The stream terminates with data: [DONE].
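A client reassembles the reply by concatenating the delta.content fragments from each chunk until the [DONE] sentinel. A minimal line-oriented parser (accumulate_sse is an illustrative name; a real client would read the lines from the HTTP response body):

```python
import json

def accumulate_sse(lines):
    """Concatenate delta.content from data: lines until [DONE]."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)

stream = [
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: [DONE]',
]
text = accumulate_sse(stream)
```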

Headers

The response includes:

| Header | Description |
| --- | --- |
| X-Request-ID | Unique identifier for the request, useful for debugging and audit trails. |

Example

```sh
curl https://your-gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in one sentence."}
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'
```

Streaming Example

```sh
curl https://your-gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Write a haiku about APIs."}],
    "stream": true
  }'
```

Tool Calling Example

```sh
curl https://your-gateway.example.com/v1/chat/completions \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is the weather in London?"}],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get the current weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string"}
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
```
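When the model decides to call the tool, the assistant message in the response carries tool_calls instead of text. The client runs the named function, appends a tool message with the result, and sends the extended conversation back. A sketch of that round-trip (handle_tool_calls and the weather handler are illustrative, not gateway APIs):

```python
import json

def handle_tool_calls(messages, assistant_message, handlers):
    """Append the assistant's tool calls and their results to the
    conversation, ready for the follow-up request."""
    messages = messages + [assistant_message]
    for call in assistant_message.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])     # arguments arrive as a JSON string
        result = handlers[fn["name"]](**args)  # dispatch to the local function
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return messages

# Illustrative assistant reply as it would appear in choices[0].message:
assistant = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": "{\"location\": \"London\"}"},
    }],
}
history = [{"role": "user", "content": "What is the weather in London?"}]
followup = handle_tool_calls(history, assistant,
                             {"get_weather": lambda location: {"temp_c": 14}})
```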

Gateway Features

The chat completions endpoint passes through the full gateway middleware pipeline:

  • Request normalization — requests are translated to a unified internal format, enabling routing to any provider.
  • RAG injection — if RAG is configured for the tenant, relevant documents are injected into the conversation context.
  • Prompt guards — configurable content filters for injection detection, PII, and toxicity.
  • Budget checks — requests are blocked if the tenant or API key budget has been exceeded.
  • Semantic cache — identical or semantically similar requests may be served from cache.
  • Usage tracking — token usage is recorded for billing and analytics.
  • Audit logging — requests are logged for compliance.