MiniMax Speech 2.8 HD

Text-to-Speech · MiniMax · Proxied

MiniMax Speech 2.8 HD generates studio-grade audio, with emotion control, support for 40+ languages, and voice cloning.

Model Info
Terms and License
More information
Pricing: View pricing in the Cloudflare dashboard

Usage

TypeScript
const response = await env.AI.run(
  'minimax/speech-2.8-hd',
  {
    text: 'Hello! Welcome to Cloudflare AI Gateway. Let me show you what we can do.',
    voice_id: 'English_expressive_narrator',
    speed: 1,
    volume: 1,
    pitch: 0,
    format: 'mp3',
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)

Examples

Custom Voice: Use a specific voice and adjust speed
TypeScript
const response = await env.AI.run(
  'minimax/speech-2.8-hd',
  {
    text: 'The weather today is sunny with a high of 72 degrees. Perfect for a walk in the park.',
    voice_id: 'English_expressive_narrator',
    speed: 0.9,
    volume: 1,
    pitch: 0,
    format: 'mp3',
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)

With Emotion: Apply emotional tone to speech
TypeScript
const response = await env.AI.run(
  'minimax/speech-2.8-hd',
  {
    text: "Congratulations! You've just won the grand prize! This is absolutely incredible news!",
    voice_id: 'English_expressive_narrator',
    speed: 1,
    volume: 1,
    pitch: 0,
    emotion: 'happy',
    format: 'mp3',
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)

High Sample Rate: Studio quality at a 44.1 kHz sample rate
TypeScript
const response = await env.AI.run(
  'minimax/speech-2.8-hd',
  {
    text: 'This recording is generated at studio quality sample rate for the highest possible audio fidelity.',
    voice_id: 'English_expressive_narrator',
    speed: 1,
    volume: 1,
    pitch: 0,
    format: 'mp3',
    sample_rate: 44100,
  },
  {
    gateway: { id: 'default' },
  }
)
console.log(response)
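The examples above repeat the same speed, volume, pitch, and format values on every call. One way to reduce that repetition is a small helper that starts from the defaults listed under Parameters and applies per-call overrides. This is a minimal sketch: `buildSpeechInput`, `SpeechInput`, and `SPEECH_DEFAULTS` are hypothetical names introduced here for illustration and are not part of the Workers AI API. `sample_rate` is included only because it appears in the High Sample Rate example; it is not described in the parameter list.

```typescript
// Hypothetical helper types; these names are illustrative, not part of the API.
type SpeechFormat = 'mp3' | 'flac' | 'wav';

interface SpeechInput {
  text: string;
  voice_id: string;
  speed: number;
  volume: number;
  pitch: number;
  format: SpeechFormat;
  emotion?: string;
  // Appears in the High Sample Rate example but not in the parameter list.
  sample_rate?: number;
}

// Default values copied from the Parameters section of this page.
const SPEECH_DEFAULTS = {
  voice_id: 'English_expressive_narrator',
  speed: 1,
  volume: 1,
  pitch: 0,
  format: 'mp3' as SpeechFormat,
};

// Merge caller overrides onto the documented defaults.
// `text` is applied last so an override can never replace it.
function buildSpeechInput(
  text: string,
  overrides: Partial<Omit<SpeechInput, 'text'>> = {},
): SpeechInput {
  return { ...SPEECH_DEFAULTS, ...overrides, text };
}
```

With this helper, the With Emotion example could be written as `env.AI.run('minimax/speech-2.8-hd', buildSpeechInput("Congratulations! ...", { emotion: 'happy' }), { gateway: { id: 'default' } })`, assuming the same `env.AI` binding as in the examples above.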

Parameters

text
  string, required, maxLength: 10000
  The text to convert to speech. Maximum 10,000 characters.
voice_id
  string, required, default: English_expressive_narrator
  The voice ID to use for synthesis.
speed
  number, required, default: 1, minimum: 0.5, maximum: 2
  Speech speed (0.5 to 2).
volume
  number, required, default: 1, minimum: 0, maximum: 10
  Speech volume (0 to 10).
pitch
  integer, required, default: 0, minimum: -12, maximum: 12
  Pitch adjustment (-12 to 12).
emotion
  string, enum: happy, sad, angry, fearful, disgusted, surprised, calm, fluent
  Emotion control for synthesized speech.
format
  string, required, default: mp3, enum: mp3, flac, wav
  Output audio format.
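
The ranges and enums above can be checked on the client before a request is sent. The following is a minimal sketch of such a check, not the service's own validation: `validateSpeechParams` and `SpeechParams` are hypothetical names, and the length check counts UTF-16 code units, which may differ from how the service counts characters.

```typescript
// Allowed values copied from the parameter list on this page.
const EMOTIONS = ['happy', 'sad', 'angry', 'fearful', 'disgusted', 'surprised', 'calm', 'fluent'] as const;
const FORMATS = ['mp3', 'flac', 'wav'] as const;

// Hypothetical input shape for illustration; mirrors the documented parameters.
interface SpeechParams {
  text: string;
  voice_id: string;
  speed: number;
  volume: number;
  pitch: number;
  format: string;
  emotion?: string;
}

// Returns a list of human-readable problems; an empty list means the
// input is within the documented limits.
function validateSpeechParams(p: SpeechParams): string[] {
  const errors: string[] = [];
  // Assumption: JavaScript string length (UTF-16 code units) approximates
  // the documented 10,000-character limit.
  if (p.text.length > 10000) errors.push('text exceeds 10,000 characters');
  if (p.voice_id.length === 0) errors.push('voice_id must not be empty');
  // Negated range checks so that NaN is also rejected.
  if (!(p.speed >= 0.5 && p.speed <= 2)) errors.push('speed must be between 0.5 and 2');
  if (!(p.volume >= 0 && p.volume <= 10)) errors.push('volume must be between 0 and 10');
  if (!Number.isInteger(p.pitch) || p.pitch < -12 || p.pitch > 12) {
    errors.push('pitch must be an integer between -12 and 12');
  }
  if ((FORMATS as readonly string[]).indexOf(p.format) === -1) {
    errors.push('format must be one of mp3, flac, wav');
  }
  if (p.emotion !== undefined && (EMOTIONS as readonly string[]).indexOf(p.emotion) === -1) {
    errors.push('emotion is not one of the documented values');
  }
  return errors;
}
```

Returning a list rather than throwing on the first problem lets a caller report every out-of-range field at once.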
