Eleven v3: It Doesn't Just Speak, It Acts.
The most expressive and realistic AI speech model ever. Eleven v3 understands emotion, interruption, and nuance, bringing human-like performance to over 70 languages.
Revolutionary AI Voice Capabilities
70+ Languages
From English and Chinese to Sinhala and Kyrgyz, reach a global audience with native-sounding voices.
Dynamic Dialogues
Create natural, multi-speaker conversations with interruptions and overlapping speech.
Expressive Control
Use audio tags like [laughs] or [whispers] to direct the AI's performance with precision.
Text to Dialogue
Automatically generate coherent, multi-role dialogues from a single block of plain text.
Eleven v3 Major Upgrade Highlights
1. Unmatched Emotional Realism
Inject lifelike emotion, tone, and sound effects. With simple audio tags, you can direct the voice to transition from a [whisper] to a [shout], add [laughter], or even a thoughtful [sigh]. Create truly immersive audio experiences.
"She entered the room and [whispers] 'I have a secret.' Then, unable to contain her excitement, she [laughs] and says, 'We won!'"
Speaker A: "Did you hear about the launch?"
Speaker B: "The v3 launch? Of course! I was just about to--"
Speaker A: "They said it's the most realistic model yet!"
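To check which directions a tagged script contains before sending it anywhere, the bracketed tags can be pulled out with a regular expression. The following is a minimal sketch that assumes tags are plain words in square brackets, as in the tagged sample sentence above. The helper names `extract_audio_tags` and `strip_audio_tags` are hypothetical and not part of any Eleven v3 tooling.

```python
import re

# Assumption: an audio tag is one or more words inside square brackets,
# e.g. [whispers] or [laughs]. This is not an official tag grammar.
TAG_PATTERN = re.compile(r"\[([a-z][a-z ]*)\]", re.IGNORECASE)


def extract_audio_tags(script: str) -> list[str]:
    """Return the audio tags in order of appearance, without brackets."""
    return [m.group(1).lower() for m in TAG_PATTERN.finditer(script)]


def strip_audio_tags(script: str) -> str:
    """Return the script with tags removed and doubled spaces collapsed."""
    without_tags = TAG_PATTERN.sub("", script)
    return re.sub(r"\s{2,}", " ", without_tags).strip()


line = (
    "She entered the room and [whispers] 'I have a secret.' "
    "Then she [laughs] and says, 'We won!'"
)
print(extract_audio_tags(line))
print(strip_audio_tags(line))
```

A check like this is useful for catching typos in tags before the script is voiced. It does not validate that a given tag is one the model recognizes.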
2. True-to-Life Conversations
Forget stilted, turn-based AI speech. Eleven v3 produces fluid dialogues in which speakers can interrupt, talk over each other, and react in real time, closely mimicking the natural flow of human conversation.
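One way to picture an interruptible dialogue is as a list of speaker turns, each flagged when it is cut off. The sketch below parses lines written like the Speaker A / Speaker B sample and treats a trailing `--` as an interruption marker. That convention is an assumption taken from the sample, not documented Eleven v3 syntax, and the `Turn` class and `parse_dialogue` function are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    text: str
    interrupted: bool  # True when the line is cut off by the next speaker


# Matches lines of the form: Speaker A: "some text"
LINE_PATTERN = re.compile(r'^(?P<speaker>[^:]+):\s*"(?P<text>.*)"\s*$')


def parse_dialogue(script: str) -> list[Turn]:
    """Split a script into turns, one per well-formed speaker line."""
    turns = []
    for raw in script.strip().splitlines():
        match = LINE_PATTERN.match(raw.strip())
        if not match:
            continue  # skip lines that are not speaker turns
        text = match.group("text")
        # Assumption: a trailing "--" means the speaker was interrupted.
        interrupted = text.endswith("--")
        spoken = text.removesuffix("--").rstrip()
        turns.append(Turn(match.group("speaker").strip(), spoken, interrupted))
    return turns


script = '''
Speaker A: "Did you hear about the launch?"
Speaker B: "The v3 launch? Of course! I was just about to--"
Speaker A: "They said it's the most realistic model yet!"
'''

for turn in parse_dialogue(script):
    print(turn)
```

Keeping the interruption as a flag rather than leaving `--` in the text makes it easy to decide later how an overlap should be rendered.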
3. The 'Text to Dialogue' Revolution
This powerful new mode automatically detects the different roles and tones within a single text block and weaves them into a seamless dialogue. No complex tagging or scripting is needed. Perfect for audio dramas, game characters, and dynamic ad reads.
Just paste your script, and let the AI cast the characters, direct the scene, and produce a fully-voiced dialogue.
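For intuition only, the "casting" step can be pictured as assigning one voice to each distinct role. The sketch below does this by hand with a round-robin assignment. It is not how Eleven v3 casts characters, and the function `cast_speakers` and the voice names are hypothetical placeholders.

```python
from itertools import cycle


def cast_speakers(speakers: list[str], voices: list[str]) -> dict[str, str]:
    """Assign one voice per distinct speaker, in order of first appearance.

    Voices are reused in rotation if there are more speakers than voices.
    """
    if not voices:
        raise ValueError("at least one voice is required")
    voice_cycle = cycle(voices)
    cast: dict[str, str] = {}
    for speaker in speakers:
        if speaker not in cast:
            cast[speaker] = next(voice_cycle)
    return cast


# Speaker labels in the order their lines appear in a script.
speakers_in_order = ["Speaker A", "Speaker B", "Speaker A"]

# Hypothetical voice names, used purely for illustration.
print(cast_speakers(speakers_in_order, ["voice_narrator", "voice_friend"]))
```

A returning speaker keeps the voice it was first given, which is the minimum a listener needs to follow who is talking.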
v3 vs v2: A Leap Forward
Feature | Eleven v3 (Alpha) | Eleven Multilingual v2 |
---|---|---|
Primary Focus | Dramatic delivery & performance, emotional range | Lifelike, stable, and consistent quality |
Languages | 70+ | 29 |
Expressive Control | Full range of emotions via Audio Tags (e.g., [laughs]) | Basic control (e.g., pauses) |
Dialogue Generation | Native multi-speaker & Text to Dialogue API | Possible, but less natural and without a dedicated mode |
Best For | Audiobooks, character voices, and highly creative content | Long-form narration, corporate videos, and multilingual projects |
Coming Soon for Creators & Developers
An API for Eleven v3 is on the way, unlocking programmatic access to the world's most advanced speech model for your applications, tools, and creative projects.