Resemble AI Open-Sources Its Voice-Cloning Model, Chatterbox

US-based voice-cloning platform Resemble AI has open-sourced Chatterbox, its model offering both text-to-speech and voice conversion capabilities, the company announced on X.

A recent test conducted through Podonos compared how naturally, and at what quality, Resemble AI's Chatterbox and ElevenLabs generate speech. Both systems generated audio samples ranging from 7 to 20 seconds in length from the same text inputs, zero-shot, with no prompt engineering and no audio processing.

Participants listened to audio samples from both models, and 63.75% of them preferred Chatterbox over ElevenLabs. The results support Chatterbox's position as a competitive open-source model that offers features such as emotion control and rapid voice cloning.

Resemble AI claims Chatterbox is the first open-source model with emotion exaggeration control, which adjusts speaking intensity from monotone to dramatically expressive using a single parameter.

In February this year, Resemble AI launched Rapid Voice Clone 2.0, a tool that lets users create high-quality voice content from just 20 seconds of audio. The tool supports voice generation, editing, and localisation, so users can make instant modifications, such as swapping words, fine-tuning tone, or adjusting delivery, without re-recording.

Open-source AI voice cloning allows users to mimic voices with remarkable precision. A prime example is OpenVoice, developed through a collaboration between researchers from MIT, Tsinghua University, and the Canadian startup MyShell, according to the project's website.

Similarly, another AI startup, Zyphra, launched its open-source text-to-speech models in February. These models can clone a voice from only five seconds of sample audio and produce realistic results from less than 30 seconds of recorded speech.

According to reports, the models, each with 1.6 billion parameters, were trained on more than 200,000 hours of speech data, including both neutral-toned speech, such as audiobook narration, and highly expressive speech.

The post Resemble AI Open-Sources Its Voice-Cloning Model, Chatterbox appeared first on Analytics India Magazine.
