Inworld TTS 1.5 Mini

Text-to-Speech • Inworld

Ultra-fast, cost-efficient text-to-speech with approximately 120 ms latency and support for 15 languages.

Model Info
Terms and License	link ↗
More information	link ↗
Pricing	View pricing in the Cloudflare dashboard ↗

Usage

TypeScript
cURL

const response = await env.AI.run(
  'inworld/tts-1.5-mini',
  {
    output_format: 'mp3',
    temperature: 1,
    text: 'Hello! Welcome to Cloudflare AI Gateway. Let me show you what we can do.',
    timestamp_type: 'none',
    voice_id: 'Dennis',
  },
)
console.log(response)

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "model": "inworld/tts-1.5-mini",
  "input": {
    "output_format": "mp3",
    "temperature": 1,
    "text": "Hello! Welcome to Cloudflare AI Gateway. Let me show you what we can do.",
    "timestamp_type": "none",
    "voice_id": "Dennis"
  }
}'

Output
Raw response

{
  "gatewayMetadata": {
    "keySource": "Unified"
  },
  "result": {
    "audio": "https://pub-04a6d208d361438ea01b797e6973bd19.r2.dev/catalog/inworld__tts-1.5-mini/simple-speech.mp3"
  },
  "state": "Completed"
}
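
The raw response above carries the audio URL in `result.audio` next to a `state` field. The following is a minimal sketch of reading that URL defensively. The `TtsResponse` type and the `extractAudioUrl` helper are hypothetical names used for illustration, and the shape is inferred only from the sample response on this page, so other `state` values or fields may exist.

```typescript
// Shape inferred from the sample response above; not an official type.
interface TtsResponse {
  gatewayMetadata?: { keySource?: string };
  result?: { audio?: string };
  state?: string;
}

// Hypothetical helper: returns the audio URL, or throws if the response
// does not look like the completed sample shown above.
function extractAudioUrl(response: TtsResponse): string {
  if (response.state !== 'Completed') {
    throw new Error(`Unexpected state: ${String(response.state)}`);
  }
  const url = response.result?.audio;
  if (typeof url !== 'string' || url.length === 0) {
    throw new Error('Response has no result.audio URL');
  }
  return url;
}
```

In a Worker, the value returned by `env.AI.run(...)` could be passed to this helper before the URL is handed to a client or fetched.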

Examples

Fast Speech — Speed up speech for quick playback

TypeScript
cURL

const response = await env.AI.run(
  'inworld/tts-1.5-mini',
  {
    output_format: 'mp3',
    speaking_rate: 1.4,
    temperature: 1,
    text: 'This is a fast-paced summary of the key findings from the quarterly report. Revenue is up fifteen percent and user growth exceeded expectations.',
    timestamp_type: 'none',
    voice_id: 'Dennis',
  },
)
console.log(response)

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "model": "inworld/tts-1.5-mini",
  "input": {
    "output_format": "mp3",
    "speaking_rate": 1.4,
    "temperature": 1,
    "text": "This is a fast-paced summary of the key findings from the quarterly report. Revenue is up fifteen percent and user growth exceeded expectations.",
    "timestamp_type": "none",
    "voice_id": "Dennis"
  }
}'

Output
Raw response

{
  "gatewayMetadata": {
    "keySource": "Unified"
  },
  "result": {
    "audio": "https://pub-04a6d208d361438ea01b797e6973bd19.r2.dev/catalog/inworld__tts-1.5-mini/fast-speech.mp3"
  },
  "state": "Completed"
}

Low Latency — Minimize latency by disabling text normalization

TypeScript
cURL

const response = await env.AI.run(
  'inworld/tts-1.5-mini',
  {
    apply_text_normalization: false,
    output_format: 'mp3',
    temperature: 1,
    text: 'Quick response needed. The server is ready.',
    timestamp_type: 'none',
    voice_id: 'Dennis',
  },
)
console.log(response)

curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
  "model": "inworld/tts-1.5-mini",
  "input": {
    "apply_text_normalization": false,
    "output_format": "mp3",
    "temperature": 1,
    "text": "Quick response needed. The server is ready.",
    "timestamp_type": "none",
    "voice_id": "Dennis"
  }
}'

Output
Raw response

{
  "gatewayMetadata": {
    "keySource": "Unified"
  },
  "result": {
    "audio": "https://pub-04a6d208d361438ea01b797e6973bd19.r2.dev/catalog/inworld__tts-1.5-mini/low-latency.mp3"
  },
  "state": "Completed"
}

apply_text_normalization

boolean
When enabled, text normalization expands numbers, dates, times, and abbreviations before converting to speech. Turning this off may reduce latency.

bit_rate

integer; maximum: 9007199254740991; minimum: -9007199254740991
Bits per second of the audio. Only for compressed audio formats (mp3, opus). The default is 128,000.
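
Because `bit_rate` applies only to the compressed formats, one option is to attach it conditionally when building the request input. This is a sketch under assumptions: `buildAudioOptions` is a hypothetical helper, the format names are taken from the `output_format` enum on this page, and how the API treats a `bit_rate` sent with `wav` or `flac` is not stated here.

```typescript
type OutputFormat = 'mp3' | 'opus' | 'wav' | 'flac';

// Formats the bit_rate description lists as compressed.
const COMPRESSED_FORMATS: ReadonlySet<OutputFormat> = new Set<OutputFormat>(['mp3', 'opus']);

// Hypothetical helper: include bit_rate only where the parameter applies.
function buildAudioOptions(
  outputFormat: OutputFormat,
  bitRate?: number,
): { output_format: OutputFormat; bit_rate?: number } {
  if (bitRate !== undefined && COMPRESSED_FORMATS.has(outputFormat)) {
    return { output_format: outputFormat, bit_rate: bitRate };
  }
  // Uncompressed format, or no bit rate requested: omit the field.
  return { output_format: outputFormat };
}
```

The result can be spread into the input object alongside `text`, `voice_id`, and the other fields.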

output_format

string; required; default: mp3; enum: mp3, opus, wav, flac
The output format for the audio. Supported formats are mp3, opus, wav, and flac. Defaults to mp3.

sample_rate

integer; maximum: 9007199254740991; minimum: -9007199254740991
The synthesis sample rate in hertz. Accepts: 8000, 16000, 22050, 24000, 32000, 44100, 48000. The default is 48,000.
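
The schema bounds for `sample_rate` are much wider than the values the description says are accepted, so a client-side check against that list can catch mistakes before a request is sent. The sketch below assumes the list in the description is exhaustive; `isAcceptedSampleRate` is a hypothetical helper name.

```typescript
// Accepted values as listed in the sample_rate description.
const ACCEPTED_SAMPLE_RATES = [8000, 16000, 22050, 24000, 32000, 44100, 48000] as const;
type SampleRate = (typeof ACCEPTED_SAMPLE_RATES)[number];

// Hypothetical type guard: true only for rates in the documented list.
function isAcceptedSampleRate(value: number): value is SampleRate {
  return (ACCEPTED_SAMPLE_RATES as readonly number[]).indexOf(value) !== -1;
}
```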

speaking_rate

number; maximum: 1.5; minimum: 0.5
Speaking rate/speed, in the range [0.5, 1.5]. The default is 1.0. We recommend using values above 0.8 to ensure high quality.
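
A caller that takes the speaking rate from user input may want to clamp it to the documented range and flag values under the recommended floor. This is a sketch, not documented API behavior: `normalizeSpeakingRate` is a hypothetical helper, whether the API itself clamps or rejects out-of-range values is not stated here, and treating exactly 0.8 as acceptable is an assumption.

```typescript
const MIN_RATE = 0.5;
const MAX_RATE = 1.5;
const RECOMMENDED_MIN_RATE = 0.8;

// Hypothetical helper: clamp to [0.5, 1.5] and report whether the result
// falls below the recommended 0.8 (assumption: 0.8 itself is acceptable).
function normalizeSpeakingRate(requested: number): {
  speaking_rate: number;
  belowRecommended: boolean;
} {
  if (!Number.isFinite(requested)) {
    throw new RangeError('speaking_rate must be a finite number');
  }
  const speaking_rate = Math.min(MAX_RATE, Math.max(MIN_RATE, requested));
  return { speaking_rate, belowRecommended: speaking_rate < RECOMMENDED_MIN_RATE };
}
```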

temperature

number; required; default: 1; maximum: 2; minimum: 0.01
Determines the degree of randomness when sampling audio tokens. Defaults to 1.0. Accepts values between 0 (exclusive) and 2 (inclusive). Higher values are more expressive; lower values are more deterministic.

text

string; required; maxLength: 2000
The text to be synthesized into speech. Maximum input of 2,000 characters.
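
Input longer than 2,000 characters has to be split across several requests. The sketch below shows one way to do that, breaking after sentence-ending punctuation where possible. `splitForTts` is a hypothetical helper, and it measures length with JavaScript's `string.length` (UTF-16 code units), which may not match how the API counts characters.

```typescript
const MAX_TEXT_LENGTH = 2000;

// Hypothetical helper: split text into chunks of at most maxLength,
// preferring a break after ". ", "! " or "? ", then at a space,
// and falling back to a hard split when no break point exists.
function splitForTts(text: string, maxLength: number = MAX_TEXT_LENGTH): string[] {
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > maxLength) {
    const window = rest.slice(0, maxLength);
    let cut = Math.max(
      window.lastIndexOf('. '),
      window.lastIndexOf('! '),
      window.lastIndexOf('? '),
    );
    if (cut !== -1) {
      cut += 1; // keep the punctuation mark in this chunk
    } else {
      cut = window.lastIndexOf(' ');
    }
    if (cut <= 0) {
      cut = maxLength; // no usable break point: hard split
    }
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) {
    chunks.push(rest);
  }
  return chunks;
}
```

Each chunk can then be sent as the `text` of its own request, with the resulting audio files played or concatenated in order.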

timestamp_type

string; required; default: none; enum: none, word, character
Controls timestamp metadata returned with the audio. "word" returns word-level timing, and "character" returns character-level timing. Requesting timestamps adds latency. Defaults to none.

voice_id

string; required; default: Dennis; enum: Loretta, Darlene, Marlene, Hank, Evelyn, Celeste, Pippa, Tessa, Liam, Callum, Hamish, Abby, Graham, Rupert, Mortimer, Snik, Anjali, Saanvi, Arjun, Claire, Oliver, Simon, Elliot, James, Serena, Gareth, Vinny, Lauren, Jessica, Ethan, Tyler, Jason, Chloe, Veronica, Victoria, Miranda, Sebastian, Victor, Malcolm, Nate, Brian, Amina, Kelsey, Derek, Evan, Kayla, Jake, Grant, Tristan, Nadia, Selene, Marcus, Riley, Damon, Cedric, Mia, Naomi, Jonah, Levi, Avery, Brandon, Conrad, Bianca, Lucian, Trevor, Alex, Ashley, Craig, Deborah, Dennis, Edward, Elizabeth, Hades, Julia, Pixie, Mark, Olivia, Priya, Ronald, Sarah, Shaun, Theodore, Timothy, Wendy, Dominus, Hana, Clive, Carter, Blake, Luna, Reed, Duncan, Felix, Eleanor, Sophie
The ID of the voice to use for synthesizing speech. Defaults to Dennis.

audio

string
URL to the generated audio file.
