Meet Dialog 1.0 – The most expressive voice model we’ve ever built

February 3, 2025

PlayAI’s Dialog Text-to-Speech model is now generally available, bringing multilingual capabilities and exceptional performance to applications that require emotive, human-like speech. In recent third-party benchmark tests, listeners preferred Dialog 10:1 over ElevenLabs v2.5 Turbo, and by more than 3:1 over ElevenLabs Multilingual v2.0.

Play the video below to hear what it sounds like, or visit our AI Voiceover Studio to try it for yourself.

PlayDialog has the most human-sounding voices for business and narration

Many voice AI applications depend on low latency, which is why we tested Dialog against ElevenLabs’ v2.5 Turbo model. Both products have similar Time-to-First-Audio (TTFA) and are suitable for low-latency applications like voice agents, contact centers, gaming and entertainment. Dialog’s fluid and emotionally coherent speech led people to prefer it 10:1 over v2.5 Turbo, indicating that frontier voice AI models are solving the problem of balancing output quality with output speed.

To compare Dialog with ElevenLabs’ Multilingual v2.0 (which has higher latency and is better suited to applications like dubbing), we tested 60 male and female voice generations using identical text with a panel of 100 respondents. In these tests, Dialog was preferred 76% of the time, or over 3 to 1 vs. ElevenLabs.
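If you want to check how a preference share maps to an "N to 1" ratio, here is a minimal sketch of the arithmetic. It assumes a simple two-way choice with no ties, which is our simplification for illustration and not a description of the exact test protocol; the function names are made up for this example.

```python
def preference_ratio(preferred_share: float) -> float:
    """Convert a preference share (between 0 and 1) into an 'N to 1' ratio.

    Assumes every respondent picked one of exactly two options (no ties).
    """
    if not 0 < preferred_share < 1:
        raise ValueError("share must be strictly between 0 and 1")
    return preferred_share / (1 - preferred_share)


def share_from_ratio(ratio: float) -> float:
    """Inverse of preference_ratio: turn 'N to 1' back into a share."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ratio / (ratio + 1)


# 76% preferred means 76 picks for every 24 picks of the alternative.
print(f"{preference_ratio(0.76):.2f} to 1")   # 3.17 to 1
# Under the same assumption, a 10:1 preference is roughly a 91% share.
print(f"{share_from_ratio(10):.1%}")          # 90.9%
```

So a 76% preference works out to a little over 3 to 1, which matches the figure quoted above.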

In both benchmarking analyses, respondents highlighted accurate expressiveness and pacing as key reasons for their preference.

Customers love it too: “NextKast built a fully automated AI DJ for our radio station customers using PlayAI Dialog voices. We love how expressive, emotional, and natural the voices sound, and didn’t find anything else close in the market. In radio, keeping your audience engaged is the whole game, and Play’s voices do that” – Winston Potgieter, Founder, Axis Entertainment

Figure 1: Human preference comparison between PlayDialog and ElevenLabs Multilingual v2 across 60 samples.

We’re releasing the raw test data publicly for anyone who wants to learn more. For each sample, you can see the text prompt and hear the raw audio:

PlayAI Dialog vs. ElevenLabs Multilingual v2.0 – link
PlayAI Dialog vs. ElevenLabs v2.5 Turbo – link

Many thanks to our partners at Podonos, who conducted the independent testing. Podonos is a third-party AI model evaluation service that uses human evaluation to assess the quality of AI models, including voice models.

PlayDialog is fast, too

Not only do PlayAI’s voice models sound more human, they are also efficient, with lower TTFA latency than most other models on the market today. That opens up use cases like voice agents, call center software solutions, and in-game audio, where low latency is essential.
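If you would like to measure TTFA in your own application, a rough sketch looks like the following. It times how long a streaming source takes to yield its first non-empty audio chunk. The `fake_stream` generator is a stand-in we invented so the example runs on its own; it is not the PlayAI API, and in practice you would pass in the chunk iterator from whichever streaming client you use.

```python
import time
from typing import Iterable, Iterator


def time_to_first_audio(chunks: Iterable[bytes]) -> float:
    """Return the seconds elapsed until the first non-empty chunk arrives."""
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks, if any
            return time.perf_counter() - start
    raise RuntimeError("stream ended without producing audio")


def fake_stream(delay_s: float) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response."""
    time.sleep(delay_s)      # simulated network + model latency
    yield b"\x00" * 320      # first audio chunk (silence)
    yield b"\x00" * 320      # later chunks do not affect TTFA


ttfa = time_to_first_audio(fake_stream(0.05))
print(f"TTFA: {ttfa * 1000:.0f} ms")
```

For a fair comparison between providers, start the timer at the moment the request is sent, and run the measurement several times from the same network location.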

PlayDialog is now multilingual

PlayDialog now goes beyond English. We’ve added support for Chinese, French, German, Hindi, Japanese, Korean, Portuguese and Urdu.

An additional 23 languages are experimental: Afrikaans, Arabic, Bengali, Bulgarian, Croatian, Czech, Danish, Dutch, Greek, Hebrew, Hungarian, Indonesian, Italian, Malay, Polish, Russian, Serbian, Swedish, Tagalog, Thai, Turkish, Ukrainian, and Xhosa.

All of these languages are available through our API and in our AI Voiceover Studio. We’d love your feedback on which nuances of your preferred language we could improve.

Building accurate, human-sounding voice AI models is not trivial. The benchmarks above show how far we’ve come, but don’t take our word for it: try it in our AI Voiceover Studio, or sign up for a free API key and experiment with our low-latency API for yourself.

PlayAI
