AWS Machine Learning Blog
Breaking news: Amazon Polly’s Newscaster voice and more authentic speech, launching today
For a long time, it was only in science fiction that machines verbalized emotions. As of today, Amazon Polly is one step closer to changing that.
As we work on Amazon Polly, we’re constantly seeking to improve the voices. We hope you’ll agree that today’s announcement of not only Neural Text-to-Speech (NTTS) but also the Newscaster style is, well, newsworthy.
Hear the news from Polly: [audio clip voiced by Amazon Polly]
Synthesizing the newsperson style is innovative and unprecedented, and it has generated great excitement in the media world and beyond.
Our earliest users include media giants like Gannett (whose USA Today is the most widely read US newspaper) and The Globe and Mail (the biggest newspaper in Canada), publishing leaders (whose customers, in turn, are news outlets) such as BlueToad and Trinity Audio, as well as organizations in education, healthcare, and gaming.
“We strive to innovate and bring our audiences news and content wherever they are. With more than 100 newsrooms across the country, it’s important for Gannett | USA TODAY NETWORK to produce audio content efficiently. Services like Amazon Polly and features like its Newscaster voice help us deliver breaking news and original reporting with increased speed and fidelity worthy of our brands,” says Gannett’s Scott Stein, Vice President of Content Ventures.
Greg Doufas, Chief Technical and Digital Officer at The Globe and Mail, concurs that the newest offerings with Amazon Polly are on the cutting edge. “Amazon Polly Newscaster enables us to provide our readers with more features to further their experience with our newspaper. This text-to-voice feature from AWS is miles ahead of anything we’ve heard to date.”
The early days of Amazon Polly are showing that readers enjoy engaging with Polly’s Newscaster voice. Paul DeHart, CEO of BlueToad, comments, “We focus on providing a robust and technologically advanced suite of digital solutions for our customers. When Amazon Polly’s new NTTS and Newscaster offerings came along, we immediately jumped on them, and we’ve already seen excitement among our own customer base. SUCCESS Magazine is particularly enthused about the new offerings.”
Stuart Johnson, Owner and CEO of SUCCESS, elaborates, “The trends increasingly show that consumers are gravitating towards audio content. With the exceedingly high quality speech that Polly now offers, we’re even better equipped to deliver these exceptional listening experiences to our audience.”
The team at Trinity Audio touts itself as “an audio content solution, providing publishers new ways to engage audiences,” and is very animated about the announcements. “Who doesn’t want to listen to the news by an articulate reader who never says ‘um’?” asks Ron Jaworski, CEO of Trinity Audio.
Publishers such as Minute Media, a sports article and video provider, are also enthusiastic about the new AWS offerings, which they use through their work with Trinity Audio. Rich Routman, President & CRO of Minute Media, explains, “At Minute Media, we seek to partner with best-in-breed technology solutions, [and with AWS and Trinity], we have the technology to transition our scale in the written word to audio at scale and across multiple platforms, aligning ourselves further with this emerging platform for media consumption.”
News companies’ excitement about Amazon Polly’s latest advance is shared by organizations outside the news business as well. “We make voice-controlled games at Volley – games where players get to converse with other characters. We are constantly asking, ‘What new experiences can be possible with voice as an input?’ We can’t wait to start developing a game leveraging the Newscaster style, where our players get to engage with a brand new character in a fun and educational new way,” says James Wilsterman, Volley’s Founder and CTO.
Echoing that excitement is Encyclopedia Britannica. The widely read encyclopedia switched to online-only content in 2012, and its hundreds of thousands of articles can be read or listened to via its “Read to Me” feature voiced by Amazon Polly. Vice President Matt Dube comments, “When we think about our next steps and innovations, this high-caliber voice technology has been one of the missing pieces for us. We’re excited to use it as we continue innovating.” The team has several new efforts underway that utilize the rich spoken content to help their users deepen their knowledge.
And for CommonLit, a nonprofit ed-tech organization dedicated to ensuring that all students graduate high school with the reading and writing skills necessary to succeed in college and beyond, Polly’s solution is transformative. Each of the thousands of texts in CommonLit’s content library features a “Read Aloud” button, and the organization is importing new texts with Amazon Polly NTTS as the reading voice.
CommonLit CTO Geoff Harcourt says, “With the latest for Polly, we’re able to offer learners an experience that passes the Turing Test; our users would be hard-pressed to realize that the voice reading to them is not human.” The CommonLit team appreciates the support that this tool provides to struggling readers and English-language-learner (ELL) students, as “this helps students learn pronunciation, and provides a crucial scaffold for students with learning difficulties,” Harcourt adds.
Listen to learn about the Turing Test: [audio clip voiced by Amazon Polly]
The technologies behind Amazon Polly are now starting to mimic the workings of the human brain, by leveraging a scientific advance called machine learning to build Neural Text-to-Speech systems (NTTS). Similar to the way human children learn to speak, these systems generate sounds, then improve their speech by listening to recorded natural speech and copying it. To build Polly’s NTTS system, Amazon researchers first taught the neural network the basics of how to speak by exposing it to a vast quantity of natural speech (the “training data” in technical terms). Over time, it learned how to reproduce those example utterances and, eventually, to generalize from them to produce new utterances. Because the network learned how to speak by example, the generated sounds are more lifelike than before. Now, Polly’s NTTS system enables it to easily learn the differences between speaking styles and rapidly adapt to new styles.
You can take Amazon Polly for a spin today by visiting https://thinkwithwp.com/polly/features/, or have a listen on The Globe and Mail or Trinity Audio’s blog.
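If you would rather try the Newscaster style from code, here is a minimal sketch in Python. It assumes the boto3 SDK is installed and AWS credentials are configured. The voice (Matthew), the region, and the helper function names are illustrative assumptions, not details from this announcement. The sketch wraps the text in the SSML `amazon:domain` tag with `name="news"` and requests the neural engine.

```python
from xml.sax.saxutils import escape


def build_newscaster_ssml(text: str) -> str:
    """Wrap plain text in SSML that requests the news speaking style."""
    # escape() protects characters such as & and < that would break the SSML.
    return f'<speak><amazon:domain name="news">{escape(text)}</amazon:domain></speak>'


def build_request(text: str, voice_id: str = "Matthew", output_format: str = "mp3") -> dict:
    """Build the keyword arguments for a Polly synthesize_speech call.

    The default voice is an assumption; check which voices support the
    Newscaster style in your region before relying on it.
    """
    return {
        "Engine": "neural",          # the Newscaster style needs the neural engine
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "TextType": "ssml",          # tells Polly that Text contains SSML markup
        "Text": build_newscaster_ssml(text),
    }


def synthesize_to_file(text: str, path: str, region_name: str = "us-east-1") -> None:
    """Call Amazon Polly and write the returned audio stream to a file."""
    import boto3  # imported here so the helpers above work without the SDK

    polly = boto3.client("polly", region_name=region_name)
    response = polly.synthesize_speech(**build_request(text))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


if __name__ == "__main__":
    # Inspect the request without calling AWS.
    print(build_request("Amazon Polly launches its Newscaster voice today."))
```

Calling `synthesize_to_file("Your headline here.", "news.mp3")` would then write an MP3 file, provided the chosen voice and region support the neural engine.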
About the Author
Robin Dautricourt is a Principal Product Manager for Amazon Text-to-Speech, and he leads product management for Amazon Polly. He enjoys innovating on behalf of customers, to launch features that will benefit their business needs and end users. He enjoys spending his free time with his wife and kids.