ElevenLabs Unveils ‘v3’, Its Most Expressive Text-to-Speech Model Yet

ElevenLabs has launched the alpha version of its new flagship text-to-speech model, Eleven v3, which the company claims is its most expressive model to date. The release brings inline audio controls, dialogue generation, and support for over 70 languages, targeting creators in film, gaming, audiobooks, and accessibility.

The model introduces inline audio tags such as [whispers], [excited], and [laughs] that control emotion and delivery from within the script itself, and supports multi-speaker dialogue through a new Text to Dialogue API. It can generate dynamic, overlapping speech turns with natural interruptions and emotional shifts, which the company presents as a significant leap over previous versions.
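To make the idea of inline tags concrete, here is a minimal Python sketch of how a tagged, multi-speaker script might be assembled as plain text before being sent to any speech service. It makes no API calls. The `Turn` class, the helper names, and the speaker labels are hypothetical illustrations, not part of ElevenLabs' API, and only the three tags named in the article are assumed to exist.

```python
import re
from dataclasses import dataclass

# Only the tags named in the article; the model's full tag set is not assumed.
KNOWN_TAGS = {"whispers", "excited", "laughs"}

# Matches bracketed lowercase tags such as "[whispers]".
TAG_PATTERN = re.compile(r"\[([a-z ]+)\]")


@dataclass
class Turn:
    """One speaker turn in a script (hypothetical structure)."""
    speaker: str
    text: str


def tag(name: str, line: str) -> str:
    """Prefix a line with an inline audio tag, e.g. '[whispers] Hello'."""
    if name not in KNOWN_TAGS:
        raise ValueError(f"unknown tag: {name}")
    return f"[{name}] {line}"


def tags_used(text: str) -> list[str]:
    """Return the tag names that appear in a piece of text, in order."""
    return TAG_PATTERN.findall(text)


def render_script(turns: list[Turn]) -> str:
    """Render turns as 'Speaker: text' lines for human review."""
    return "\n".join(f"{t.speaker}: {t.text}" for t in turns)


script = [
    Turn("Ana", tag("whispers", "Did you hear that?")),
    Turn("Ben", tag("excited", "It works!") + " [laughs]"),
]

print(render_script(script))
```

Keeping the tags in the text itself, rather than in separate metadata, mirrors how the article describes them: the emotional cue travels with the words it applies to.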

Addressing the release, Mati Staniszewski, co-founder and CEO of ElevenLabs, said, “This release is the result of the vision and leadership of my co-founder Piotr [Piotr Dabkowski] and the incredible research team he’s built. Creating a good product is hard—creating an entirely new paradigm is almost impossible.”

“I, and all of us at ElevenLabs, feel lucky to witness the magic this team brings to life—and with this release, we’re excited to push the frontier once again.”

The company noted that while previous models had sufficient audio quality, they lacked expressive nuance, a limitation now addressed by v3.

The tool’s deeper text comprehension also enhances cadence, stress, and emotion across different languages, while new scripting flexibility enables complex audio storytelling. However, ElevenLabs cautioned that v3’s latency and prompt engineering demands make it less suited for real-time or conversational use, recommending v2.5 Turbo or Flash for those scenarios.

Users can access the model through the ElevenLabs website. Until the end of June, a promotional 80% discount on UI-based usage is available. Public API access is expected soon, and early access is available via sales enquiry.

A real-time version of v3 is reportedly under development. For now, creators looking to inject nuance into dialogue-heavy media may find the alpha version a compelling upgrade.

The post ElevenLabs Unveils ‘v3’, Its Most Expressive Text-to-Speech Model Yet appeared first on Analytics India Magazine.
