Audio (Transcription, Translation & Speech)

The gateway provides three audio endpoints: transcription (speech-to-text), translation (speech-to-English), and speech synthesis (text-to-speech).

Transcriptions

Transcribe audio into text in the original language.

POST /v1/audio/transcriptions

Required capability: audio

Request Body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Transcription model (e.g. whisper-1). |
| file | string | Conditional | Base64-encoded audio data. Either file or file_url is required. |
| file_url | string | Conditional | URL to the audio file. Either file or file_url is required. |
| language | string | No | ISO 639-1 language code (e.g. en, fr, de). Improves accuracy when specified. |
| prompt | string | No | Optional text to guide the transcription style or provide context. |
| response_format | string | No | Output format: "json" (default), "text", "srt", "verbose_json", or "vtt". |
| temperature | number | No | Sampling temperature between 0 and 1. |

Request size limit: 10 MB
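
When sending inline audio rather than a URL, the file has to be Base64-encoded first, and Base64 grows the data by roughly one third, so a raw file well under 10 MB can still exceed the request limit. The following is a minimal sketch of building the request body under those rules. The function name `build_transcription_body`, the decimal reading of "10 MB", and the choice to reject requests that supply both `file` and `file_url` are assumptions of this sketch, not documented gateway behavior.

```python
import base64
import json

# Assumption: "10 MB" is read as 10,000,000 bytes (the conservative interpretation).
MAX_REQUEST_BYTES = 10_000_000


def build_transcription_body(model, audio_bytes=None, file_url=None, **options):
    """Return the JSON request body as bytes (hypothetical helper).

    Exactly one of audio_bytes or file_url must be given. The docs only say
    that one of them is required; rejecting both at once is a cautious choice.
    """
    if (audio_bytes is None) == (file_url is None):
        raise ValueError("provide exactly one of audio_bytes or file_url")

    payload = {"model": model}
    if audio_bytes is not None:
        # The "file" parameter carries the audio as a Base64 string.
        payload["file"] = base64.b64encode(audio_bytes).decode("ascii")
    else:
        payload["file_url"] = file_url

    # Optional parameters (language, prompt, response_format, temperature).
    payload.update({k: v for k, v in options.items() if v is not None})

    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_REQUEST_BYTES:
        raise ValueError(
            f"request body is {len(body)} bytes, above the {MAX_REQUEST_BYTES}-byte limit"
        )
    return body
```

For larger recordings, passing `file_url` avoids the Base64 overhead entirely, since only the URL travels in the request body.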

Response

{
  "text": "The quick brown fox jumps over the lazy dog."
}

When response_format is "verbose_json", the response includes additional fields, such as word-level timestamps.
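
Because the response shape depends on response_format, client code usually needs to branch on it. The sketch below shows one way to do that. It assumes that "json" and "verbose_json" both return a JSON object with a `text` field, and that "text", "srt", and "vtt" return the content directly as a UTF-8 body; the second assumption is not stated on this page, and `extract_transcript` is a hypothetical helper name.

```python
import json

# Formats assumed to return a JSON object containing a "text" field.
JSON_FORMATS = {"json", "verbose_json"}
# Formats assumed to return the transcript (or subtitles) as a plain UTF-8 body.
PLAIN_FORMATS = {"text", "srt", "vtt"}


def extract_transcript(raw_body: bytes, response_format: str = "json") -> str:
    """Return the transcript text from a raw response body (hypothetical helper)."""
    if response_format in JSON_FORMATS:
        return json.loads(raw_body)["text"]
    if response_format in PLAIN_FORMATS:
        return raw_body.decode("utf-8")
    raise ValueError(f"unsupported response_format: {response_format!r}")
```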

Example

curl https://your-gateway.example.com/v1/audio/transcriptions \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "whisper-1",
    "file_url": "https://example.com/audio/recording.mp3",
    "language": "en",
    "response_format": "json"
  }'

Translations

Translate audio into English text.

POST /v1/audio/translations

Required capability: audio

Request Body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Translation model (e.g. whisper-1). |
| file | string | Conditional | Base64-encoded audio data. Either file or file_url is required. |
| file_url | string | Conditional | URL to the audio file. Either file or file_url is required. |
| prompt | string | No | Optional text to guide the translation. |
| response_format | string | No | Output format: "json" (default), "text", "srt", "verbose_json", or "vtt". |
| temperature | number | No | Sampling temperature between 0 and 1. |

Request size limit: 10 MB

Response

{
  "text": "The translated text in English."
}

Example

curl https://your-gateway.example.com/v1/audio/translations \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "whisper-1",
    "file_url": "https://example.com/audio/french-speech.mp3",
    "response_format": "json"
  }'

Speech (Text-to-Speech)

Synthesize speech from text input.

POST /v1/audio/speech

Required capability: tts

Request Body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | TTS model (e.g. tts-1, tts-1-hd). |
| input | string | Yes | The text to synthesize (1 to 4096 characters). |
| voice | string | Yes | Voice to use (e.g. alloy, echo, fable, onyx, nova, shimmer). |
| response_format | string | No | Audio format: "mp3" (default), "opus", "aac", "flac", "wav", or "pcm". |
| speed | number | No | Playback speed from 0.25 to 4.0. Defaults to 1.0. |

Request size limit: 1 MB
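
The documented constraints on input length, response_format, and speed can be checked on the client before a request is sent. The following is a minimal sketch of such a check; `build_speech_body` is a hypothetical helper, and it counts characters with Python's `len()` (Unicode code points), which may differ from how the gateway counts them.

```python
import json

# Audio formats listed in the parameter table.
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_speech_body(model, text, voice, response_format="mp3", speed=1.0):
    """Validate speech parameters and return the JSON request body as bytes."""
    # Assumption: len() approximates the gateway's own character count.
    if not 1 <= len(text) <= 4096:
        raise ValueError("input must be between 1 and 4096 characters")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")

    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    return json.dumps(payload).encode("utf-8")
```

The voice is passed through unchecked because the table lists its voices only as examples, so the full set the gateway accepts is not known here.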

Response

The response body is the raw audio data with the appropriate Content-Type header (e.g. audio/mpeg for MP3). This is a binary response, not JSON.
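
Since the response is binary, a client should write the bytes to a file without decoding them as text or JSON. The sketch below uses only the Python standard library. The helper names `build_speech_request` and `save_speech` are hypothetical, and the base URL and API key are placeholders for your own values.

```python
import shutil
import urllib.request


def build_speech_request(base_url, api_key, body):
    """Build (but do not send) a POST request for /v1/audio/speech.

    body is the JSON request body, already encoded as bytes.
    """
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(request, path):
    """Send the request, stream the audio bytes to path, return the Content-Type."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        # Copy in chunks; the body is raw audio, so it is never decoded.
        shutil.copyfileobj(response, out)
        return response.headers.get("Content-Type")
```

A call such as `save_speech(build_speech_request("https://your-gateway.example.com", "aigw_sk_your_api_key", body), "speech.mp3")` would then write the audio to disk. `urlopen` raises `urllib.error.HTTPError` on 4xx and 5xx responses, so a failed request does not produce a bogus audio file.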

Example

curl https://your-gateway.example.com/v1/audio/speech \
  -H "Authorization: Bearer aigw_sk_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello, welcome to the AI Gateway.",
    "voice": "alloy",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3