AssemblyAI Universal-3 Pro

Automatic Speech Recognition | AssemblyAI | Proxied

AssemblyAI's Universal-3 Pro speech recognition model for high-accuracy transcription.

Model Info
Terms and License: link
More information: link
Pricing: View pricing in the Cloudflare dashboard

Usage

TypeScript
const response = await env.AI.run(
  'assemblyai/universal-3-pro',
  {
    audio_url: 'https://cdn.openai.com/API/docs/audio/alloy.wav',
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)

The sun rises in the east and sets in the west. This simple fact has been observed by humans for thousands of years.
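The call above can be wrapped in a small helper so the model name and gateway options live in one place. This is a minimal sketch, not part of the official API: `AiRunner` and `transcribeUrl` are hypothetical names, the interface only mirrors the call shape shown on this page, and the http(s) check is an assumption about what counts as a publicly accessible URL.

```typescript
// Hypothetical structural type: only the call shape used on this page.
interface AiRunner {
  run(
    model: string,
    input: Record<string, unknown>,
    options?: { gateway?: { id: string } },
  ): Promise<unknown>;
}

const MODEL = 'assemblyai/universal-3-pro';

// Hypothetical helper: rejects obviously unusable inputs, then calls the model.
async function transcribeUrl(
  ai: AiRunner,
  audioUrl: string,
  gatewayId = 'default',
): Promise<unknown> {
  // Assumption: accept http(s) URLs or audio data URIs only.
  const usable =
    /^https?:\/\//.test(audioUrl) || audioUrl.indexOf('data:audio/') === 0;
  if (!usable) {
    throw new Error('audio_url must be an http(s) URL or a data:audio/... URI');
  }
  return ai.run(MODEL, { audio_url: audioUrl }, { gateway: { id: gatewayId } });
}
```

Inside a Worker this would be called as `await transcribeUrl(env.AI, url)`, assuming `env.AI` is the AI binding used in the example above.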

Examples

With Language Code: Transcribe with an explicit language code
TypeScript
const response = await env.AI.run(
  'assemblyai/universal-3-pro',
  {
    audio_url: 'https://cdn.openai.com/API/docs/audio/echo.wav',
    language_code: 'en',
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)

In the heart of the city, there is a large park where people go to relax and enjoy nature. The park has a beautiful pond with ducks and swans.

With Key Terms: Improve accuracy for domain-specific vocabulary
TypeScript
const response = await env.AI.run(
  'assemblyai/universal-3-pro',
  {
    audio_url: 'https://cdn.openai.com/API/docs/audio/nova.wav',
    keyterms_prompt: [
      'Kubernetes',
      'microservices',
      'containerization',
      'load balancer',
    ],
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)

In the kitchen, the aroma of freshly baked bread filled the air. The loaves were golden brown and crusty on the outside and soft and warm on the inside.

Speaker Diarization: Identify different speakers in the audio
TypeScript
const response = await env.AI.run(
  'assemblyai/universal-3-pro',
  {
    audio_url: 'https://cdn.openai.com/API/docs/audio/onyx.wav',
    speaker_labels: true,
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)

The train chugged along the tracks, carrying passengers to their destinations. The rhythmic sound of the wheels on the rails was soothing.
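The examples above all pass a hosted file, but `audio_url` is also documented as accepting a data URI of the form `data:audio/...;base64,...`. The sketch below shows one way to build such a URI from raw bytes. `toAudioDataUri` is a hypothetical helper name, the `audio/wav` default is an assumption, and `btoa` is assumed to be available as a global, as it is in Workers and recent Node.js versions.

```typescript
// Hypothetical helper: encode raw audio bytes as a base64 data URI.
function toAudioDataUri(bytes: Uint8Array, mimeType = 'audio/wav'): string {
  // The documented form is data:audio/...;base64,... so require an audio type.
  if (mimeType.indexOf('audio/') !== 0) {
    throw new Error('mimeType must start with "audio/"');
  }
  // Build a binary string one byte at a time, then base64-encode it.
  let binary = '';
  bytes.forEach((b) => {
    binary += String.fromCharCode(b);
  });
  return `data:${mimeType};base64,${btoa(binary)}`;
}
```

The result can be passed directly as `audio_url`. Building the string byte by byte keeps the sketch simple; for large files a hosted URL is likely the more practical choice.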

Parameters

audio_url
string (required): The URL of the audio file to transcribe. Can be a publicly accessible URL or a data URI (data:audio/...;base64,...). For data URIs, the audio will be uploaded to AssemblyAI automatically.
language_code
string: The language code for the audio file (e.g., "en", "es", "fr"). Defaults to automatic language detection.
language_detection
boolean: Enable automatic language detection. When enabled with speech_models, the system will automatically select the best model for the detected language.
prompt
string: A custom prompt to guide transcription style, formatting, and output characteristics. Maximum 1,500 words.
temperature
number (minimum: 0, maximum: 1): Controls randomness in model output (0.0-1.0). Lower values make output more deterministic. Default is 0.0.
speaker_labels
boolean: Enable speaker diarization to identify different speakers in the audio.
speakers_expected
integer (minimum: 1, maximum: 9007199254740991): Expected number of speakers for speaker diarization.
auto_chapters
boolean: Enable automatic chapter detection.
entity_detection
boolean: Enable detection of entities like names, organizations, and locations.
sentiment_analysis
boolean: Enable sentiment analysis for each sentence.
auto_highlights
boolean: Enable automatic extraction of key phrases and highlights.
content_safety
boolean: Enable content safety detection for sensitive content.
iab_categories
boolean: Enable IAB (Interactive Advertising Bureau) content taxonomy classification.
disfluencies
boolean: Include filler words like "um", "uh", etc. in the transcript.
multichannel
boolean: Process each audio channel separately for multi-channel audio files.
dual_channel
boolean: Process audio as dual-channel (stereo) for better accuracy.
webhook_url
string (format: uri): URL to receive webhook notifications when transcription is complete.
audio_start_from
integer (minimum: 0, maximum: 9007199254740991): Timestamp (in milliseconds) to start transcription from.
audio_end_at
integer (minimum: 0, maximum: 9007199254740991): Timestamp (in milliseconds) to end transcription at.
boost_param
string (enum: low, default, high): How much to boost the words in word_boost.
filter_profanity
boolean: Filter profanity from the transcription.
redact_pii
boolean: Redact personally identifiable information.
redact_pii_audio
boolean: Generate a redacted audio file with PII removed.
redact_pii_sub
string (enum: entity_name, hash): Strategy for substituting redacted PII.
speech_threshold
number (minimum: 0, maximum: 1): Confidence threshold for speech detection.
domain
string (enum: medical-v1): Domain-specific transcription mode. "medical-v1" enables medical terminology optimization.
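Several parameters above carry numeric ranges or enum values, which can be checked locally before a request is sent. The following is a minimal sketch under that reading of the list. `ConstrainedParams` and `validateParams` are hypothetical names, only the constrained fields are covered, and the service may well apply further rules not listed here.

```typescript
// Subset of the documented parameters that carry a range or enum constraint.
interface ConstrainedParams {
  temperature?: number;
  speech_threshold?: number;
  speakers_expected?: number;
  audio_start_from?: number;
  audio_end_at?: number;
  boost_param?: string;
  redact_pii_sub?: string;
  domain?: string;
}

type EnumKey = 'boost_param' | 'redact_pii_sub' | 'domain';

// Allowed values as listed in the parameter descriptions.
const ENUMS: Record<EnumKey, readonly string[]> = {
  boost_param: ['low', 'default', 'high'],
  redact_pii_sub: ['entity_name', 'hash'],
  domain: ['medical-v1'],
};

// Hypothetical helper: returns human-readable problems (empty when none found).
function validateParams(p: ConstrainedParams): string[] {
  const errors: string[] = [];

  // Numbers documented with minimum 0 and maximum 1.
  for (const key of ['temperature', 'speech_threshold'] as const) {
    const v = p[key];
    if (v !== undefined && !(v >= 0 && v <= 1)) {
      errors.push(`${key} must be between 0 and 1`);
    }
  }

  // Integers with a documented minimum; the documented maximum is the
  // largest safe integer, which Number.isSafeInteger already enforces.
  const minimums = { speakers_expected: 1, audio_start_from: 0, audio_end_at: 0 };
  for (const key of ['speakers_expected', 'audio_start_from', 'audio_end_at'] as const) {
    const v = p[key];
    if (v !== undefined && !(Number.isSafeInteger(v) && v >= minimums[key])) {
      errors.push(`${key} must be an integer >= ${minimums[key]}`);
    }
  }

  // String enums.
  for (const key of ['boost_param', 'redact_pii_sub', 'domain'] as const) {
    const v = p[key];
    if (v !== undefined && ENUMS[key].indexOf(v) === -1) {
      errors.push(`${key} must be one of: ${ENUMS[key].join(', ')}`);
    }
  }

  return errors;
}
```

Returning a list of messages rather than throwing on the first problem lets a caller report every invalid field at once.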
