Eleven v3: It Doesn't Just Speak, It Acts.

The most expressive and realistic AI speech model ever. Eleven v3 understands emotion, interruption, and nuance, bringing human-like performance to over 70 languages.



Revolutionary AI Voice Capabilities

70+ Languages

From English and Chinese to Sinhala and Kyrgyz, reach a global audience with native-sounding voices.

Dynamic Dialogues

Create natural, multi-speaker conversations with interruptions and overlapping speech.

Expressive Control

Use audio tags like [laughs] or [whispers] to direct the AI's performance with precision.

Text to Dialogue

Automatically generate coherent, multi-role dialogues from a single block of plain text.

Eleven v3: Major Upgrade Highlights

1. Unmatched Emotional Realism

Inject lifelike emotion, tone, and sound effects. With simple audio tags such as [whispers] or [laughs], you can direct the voice to move from a whisper to a shout, break into laughter, or add a thoughtful sigh. Create truly immersive audio experiences.


"She entered the room and [whispers] 'I have a secret.' Then, unable to contain her excitement, she [laughs] and says, 'We won!'"

Speaker A: "Did you hear about the launch?"
Speaker B: "The v3 launch? Of course! I was just about to--"
Speaker A: "They said it's the most realistic model yet!"
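The audio tags and speaker labels in the examples above are ordinary inline text. As a minimal local sketch, assuming a script written one turn per line in the form Speaker A: "..." (the layout used above), the Python below splits a script into turns and lists the bracketed tags in each. It is not ElevenLabs' API or the model's own parser: the Turn class, parse_script, extract_tags, and both regular expressions are hypothetical helpers for checking a script before sending it anywhere, and the tag pattern accepts any lowercase bracketed word rather than the official list of supported tags.

```python
import re
from dataclasses import dataclass, field

# Assumption: an audio tag is a lowercase word or phrase in square brackets,
# e.g. [laughs] or [whispers]. This does NOT check the official tag list.
TAG_PATTERN = re.compile(r"\[([a-z][a-z ]*)\]")

# Assumption: one turn per line, written as  Speaker A: "line of dialogue"
TURN_PATTERN = re.compile(r'^(?P<speaker>[^:]+):\s*"(?P<text>.*)"\s*$')


@dataclass
class Turn:
    speaker: str
    text: str
    tags: list[str] = field(default_factory=list)


def extract_tags(text: str) -> list[str]:
    """Return the audio tags found in text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def parse_script(script: str) -> list[Turn]:
    """Split a speaker-labelled script into turns and record each turn's tags."""
    turns: list[Turn] = []
    for raw_line in script.strip().splitlines():
        match = TURN_PATTERN.match(raw_line.strip())
        if match is None:
            continue  # skip blank lines and anything not in speaker-label form
        text = match.group("text")
        turns.append(Turn(match.group("speaker").strip(), text, extract_tags(text)))
    return turns


if __name__ == "__main__":
    script = '''
    Speaker A: "Did you hear about the launch?"
    Speaker B: "The v3 launch? Of course! I was just about to--"
    Speaker A: "They said it's the most realistic model yet!"
    '''
    for turn in parse_script(script):
        print(f"{turn.speaker}: {turn.text} (tags: {turn.tags})")
```

Running extract_tags on the narration line above returns ["whispers", "laughs"], while the three-turn exchange parses into three turns with no tags. The trailing "--" in Speaker B's line is kept as plain text; this sketch does not interpret it as an interruption cue.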

2. True-to-Life Conversations

Forget stilted, turn-based AI speech. Eleven v3 produces fluid dialogues in which speakers can interrupt, talk over each other, and react in real time, closely mimicking the natural flow of human conversation.

3. The 'Text to Dialogue' Revolution

This powerful new mode automatically detects different roles and tones within a single block of text and weaves them into a seamless dialogue, with no complex tagging or markup required. It is well suited to audio dramas, game characters, and dynamic ad reads.

Just paste your script, and let the AI cast the characters, direct the scene, and produce a fully-voiced dialogue.

v3 vs v2: A Leap Forward

Feature	Eleven v3 (Alpha)	Eleven Multilingual v2
Primary Focus	Dramatic delivery & performance, emotional range	Lifelike, stable, and consistent quality
Languages	70+	29
Expressive Control	Full range of emotions via inline audio tags (e.g., [laughs])	Basic control (e.g., pauses)
Dialogue Generation	Native multi-speaker dialogue and a dedicated Text to Dialogue mode	Possible, but less natural and without a dedicated mode
Best For	Audiobooks, character voices, and highly creative content	Long-form narration, corporate videos, and multilingual projects

Coming Soon for Creators & Developers

An API for Eleven v3 is on the way, unlocking programmatic access to the world's most advanced speech model for your applications, tools, and creative projects.
