Introducing PlayDialog – A voice model built for fluid, emotive conversation

November 11, 2024

Today, we are launching PlayDialog beta, our most ambitious and powerful end-to-end AI speech model. PlayDialog uses a conversation’s historical context to control prosody, intonation, emotion, and pacing to deliver more natural-sounding speech, setting new standards for matching how humans speak in real-life situations. PlayDialog is an excellent fit for authentic conversational experiences such as narration, voice dubbing, and synthetic podcasts, and for supporting immersive, engaging 1:1 voice experiences with customers in business contexts.

We’re also releasing PlayNote, a tool that lets users create conversational experiences from files like PDFs, text, videos, and other media. Users can create podcasts, briefings, narrations, and even children’s stories in minutes, and can experience PlayDialog’s fluid and natural-sounding speech quality. Uniquely, PlayNote is also accessible through an API, making it easy to create audio content programmatically without using a UI.

To experience PlayNote, visit our PlayNote app here. You can also use our playground to test our models with different text prompts here.

It sounds just like a human

PlayDialog beta was trained on hundreds of millions of conversations that represent real-world examples, and is approximately ten times larger than Play 3.0 mini. It closely matches human speech on prosody (intonation and pacing), making it far harder to tell that the speech comes from an AI model. PlayDialog beta also supports streaming from LLMs using WebSockets, allowing fast responses in LLM-powered applications.
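When streaming LLM output into a speech model, a common pattern is to buffer the incoming text fragments and forward them one complete sentence at a time, so the speech model always has a full phrase to shape. The sketch below illustrates only that buffering step. It is a minimal example under stated assumptions, not PlayDialog's client code: the function name `sentence_chunks` and the simple punctuation rule are illustrative, and the WebSocket connection itself is left out.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: ".", "!" or "?" followed by whitespace ends a sentence.
# This is a simplification; abbreviations such as "e.g. " would also split here.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Group a stream of LLM text fragments into whole sentences."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be growing, so keep it in the buffer.
        buffer = parts[-1]
    # Flush whatever is left once the LLM stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    llm_stream = ["Hel", "lo the", "re. How", " are you", "? Fine"]
    for sentence in sentence_chunks(llm_stream):
        # In a real application each sentence would be sent over the socket here.
        print(sentence)
```

Each yielded sentence would then be sent to the speech service as soon as it is complete, rather than waiting for the full LLM response.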

In blind testing, PlayDialog beta was preferred over the leading competing models in the market by 2:1 (n=600), with expressiveness scoring highest as a factor in that preference.

PlayDialog is complementary to Play 3.0 mini, which is optimized for low latency and speech accuracy and today supports over 30 languages.

It uses the whole conversation as context

Unlike previous generations of speech models, PlayDialog beta understands the entire conversational context and how each sentence, or speaker, influences speech generation. We built a novel architecture that we call an “Adaptive Speech Contextualizer” (ASC) that allows the model to use the full context and history of a conversation, meaning that every response isn't just a standalone output; it's enriched with appropriate prosody, tone, and emotion that reflect the flow of the conversation.

By capturing these details with ASC, generated speech is far more natural and human-sounding: synthetic podcasts now sound like the speakers are in the same room and feeding off each other, narration can sound exciting and engaging, and more.

Whether matching the excitement in a lively discussion or the empathy needed in sensitive topics, PlayDialog adapts seamlessly, making interactions feel more natural and human-like.

See it in action on PlayNote, our narration and podcast creation tool

Along with PlayDialog beta, we are launching PlayNote, a narration and podcast creation tool that lets anyone create powerful, natural-sounding narrations, podcasts, briefings, and more in just minutes from multiple media types, like text files, spreadsheets, and images. Users can generate a podcast between two user-selectable voices, create summaries of large documents for online or offline listening, or even create children’s stories using some of the many voices included. PlayNote is also powered by PlayDialog beta, meaning that the generated speech is fluid, engaging, emotional, and human-sounding.

PlayNote is also available through an API, allowing developers to programmatically generate engaging content at large scale.

Try PlayNote here today, or try some of the examples below. Our PlayNote API documentation and guides are available here. We also encourage you to follow us on X as we showcase more examples in the coming weeks.

Use PlayDialog and PlayNote through our API

PlayDialog and PlayNote are available via API today on Play.ai. We also created developer guides on “Creating multi-turn scripted conversations with TTS” and “Generating conversations from PDFs with PlayNote.”
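To give a feel for what a multi-turn scripted conversation might look like before it is sent to a dialogue speech model, here is a minimal sketch that turns a list of speaker turns into one speaker-labelled script. Everything in it is an assumption made for illustration: the `Turn` class, the `build_script` helper, and the "Speaker: text" line format are hypothetical, and the actual request format is the one described in the developer guides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    speaker: str  # hypothetical speaker label, e.g. "Host 1"
    text: str     # what that speaker says in this turn


def build_script(turns: List[Turn]) -> str:
    """Join turns into one script so the whole conversation travels together."""
    lines = []
    for turn in turns:
        # Collapse stray whitespace and skip empty turns.
        text = " ".join(turn.text.split())
        if not text:
            continue
        lines.append(f"{turn.speaker}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    conversation = [
        Turn("Host 1", "Welcome  back to the show!"),
        Turn("Host 2", "Glad to be here."),
    ]
    print(build_script(conversation))
```

Keeping the full exchange in a single script, rather than sending each line separately, is what would let a context-aware model shape each turn in light of the ones before it.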
