Amazon unveils the largest text-to-speech model ever created
Researchers at Amazon have introduced the largest text-to-speech model to date, with enhanced capabilities that allow it to better articulate complex sentences.
The model, BASE TTS (Big Adaptive Streamable TTS with Emergent capabilities), could lay the foundation for more human-like interactions.
According to the research, scaling up the training of TTS models appears to improve their reliability and versatility, much as it does for the large language models (LLMs) behind today's AI chatbots.
Amazon’s BASE TTS impresses researchers
The model was trained on 100,000 hours of public domain speech data, which the researchers say gives it “state-of-the-art naturalness.” The data is mostly English, with some German, Dutch and Spanish.
The researchers also found that training a TTS model on as little as 10,000 hours of speech can noticeably improve its ability to articulate complex sentences naturally.
With 980 million parameters, BASE-large is the largest text-to-speech model created to date. To compare results, the team also trained two smaller models: one with 400 million parameters trained on 10,000 hours of speech, and one with 150 million parameters trained on 1,000 hours.
The Amazon team describes BASE TTS as a “high-fidelity model that can mimic speaker characteristics with just a few seconds of reference audio,” acknowledging that more research is needed while highlighting its potential.
The key areas the researchers focused on included compound nouns, emotions, foreign words, paralinguistics, punctuation, questions and syntactic complexity; examples can be found on a dedicated web page.
With artificial intelligence having taken center stage for much of 2023, text-to-speech breakthroughs like this could continue to put increasingly futuristic technology into the hands of the public in 2024. However, the research team’s cautious approach highlights the need for good regulation amid growing safety, security and privacy concerns.