Web Summit 2019: Rohit Prasad on “Evolution from keyword searches to AI-enabled conversations”
2014 was a formidable year for the tech industry. Not only was it the height of the smartphone and app explosion, but it was also the year that Amazon Echo, the first Alexa-enabled device, launched.
“Alexa revolutionized daily convenience as we know it,” explained Rohit Prasad, VP & Head Scientist for Amazon Alexa, Nov. 5 at the Web Summit conference in Lisbon, Portugal. “The cognitive load shifted from customers to AI. You talk, Alexa answers back.”
In a talk entitled “Evolution from keyword searches to AI-enabled conversations,” Prasad shared with the audience his perspective on how far Alexa has come in the five years since launch, as well as how advancements in and democratization of conversational AI will continue to deliver new experiences and seamlessly connect customers to services.
Taking a step back, Prasad began his talk by explaining the four AI tasks Alexa had to be great at in order to launch (a toy sketch of the pipeline follows the list):
- Wake word detection. Alexa has to know that when you say ‘Alexa’, you’re talking to it.
- Automatic speech recognition. Alexa has to convert audio into words.
- Natural language understanding. Now that Alexa knows the words, what do they mean? This is the hardest task of all.
- Text-to-speech synthesis. Alexa needs to convert the text of the answer back into audio.
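To make the flow concrete, here is a toy sketch of that four-stage pipeline in Python. Everything in it (the function names, the keyword matching, treating audio as plain strings) is an illustrative stand-in, not Amazon’s actual models or APIs.

```python
from typing import Optional

def detect_wake_word(audio: str) -> bool:
    # Stage 1, wake word detection: decide whether the request is addressed
    # to the assistant. Real systems run a small on-device model over raw
    # audio; here "audio" is already text purely to keep the sketch runnable.
    return audio.lower().startswith("alexa")

def speech_to_text(audio: str) -> str:
    # Stage 2, automatic speech recognition: turn audio into words.
    return audio  # stand-in: pretend the audio is already transcribed

def understand(text: str) -> dict:
    # Stage 3, natural language understanding: map words to an intent.
    if "weather" in text.lower():
        return {"intent": "GetWeather"}
    return {"intent": "Unknown"}

def text_to_speech(text: str) -> bytes:
    # Stage 4, text to speech: render the text answer as audio.
    return text.encode("utf-8")  # stand-in for synthesized audio

def handle(audio: str) -> Optional[bytes]:
    # Run the four stages end to end.
    if not detect_wake_word(audio):
        return None  # not addressed to the assistant
    text = speech_to_text(audio)
    intent = understand(text)
    answer = "It's sunny today." if intent["intent"] == "GetWeather" else "Sorry, I don't know."
    return text_to_speech(answer)

print(handle("Alexa, what's the weather?"))
```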
Since launch, Prasad continued, each of these AI tasks has gotten three to four times better. And how exactly did these improvements happen? Through deep learning and a suite of related techniques, he said. Specifically, the set of concepts Alexa can understand has expanded exponentially, and interactions have become smoother and more natural. Amazon also introduced two capabilities toward the democratization of conversational AI. The first was the Alexa Voice Service, which made it easy for developers to integrate Alexa into their own devices, said Prasad. The second was the Alexa Skills Kit, which lets developers build ‘skills’ for Alexa. Now, said Prasad, a developer can make a skill for any time, any occasion, or anyone.
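For a sense of what building with the Alexa Skills Kit looks like, here is a minimal skill using the ASK SDK for Python (the ask-sdk-core package). The intent name and the spoken responses are assumptions: in a real skill they would be defined in its interaction model.

```python
from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_request_type, is_intent_name


class LaunchRequestHandler(AbstractRequestHandler):
    """Handles the customer opening the skill ('Alexa, open hello world')."""

    def can_handle(self, handler_input):
        return is_request_type("LaunchRequest")(handler_input)

    def handle(self, handler_input):
        return handler_input.response_builder.speak(
            "Welcome! Ask me to say hello."
        ).response


class HelloWorldIntentHandler(AbstractRequestHandler):
    """Handles a hypothetical HelloWorldIntent from the interaction model."""

    def can_handle(self, handler_input):
        return is_intent_name("HelloWorldIntent")(handler_input)

    def handle(self, handler_input):
        return handler_input.response_builder.speak("Hello, world!").response


sb = SkillBuilder()
sb.add_request_handler(LaunchRequestHandler())
sb.add_request_handler(HelloWorldIntentHandler())

# Entry point for AWS Lambda, where Alexa skills are typically hosted.
lambda_handler = sb.lambda_handler()
```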
So now that Alexa handles billions of interactions a week and is available in more than 80 countries and 15 languages, what’s next for AI? Prasad broke down the key AI pillars for customer experience:
- More transparency and control. AI needs to be trusted by customers, said Prasad.
- Learning directly from customers. For example, said Prasad, if Alexa isn’t able to sing the “ABC song” on request but can sing the “alphabet song,” it needs to learn that the two requests are the same (semantically equivalent). A toy sketch of this idea follows the list.
- More knowledge. Alexa needs to add billions of answers, and customers can contribute answers through Alexa Answers.
- More proactive. Alexa needs to be context aware, going beyond the words themselves to understand the customer’s situation. For example, Alexa can tell you it has a “hunch” that you left the garage light on, and you can turn it off.
- More natural (a key tenet for AI interaction). With so many skills, how do you get to the right skill with the right action (for example, “Alexa, have Roomba clean my room” vs. “Alexa, start cleaning”)?
- More fun. For example, Alexa is getting celebrities, such as Samuel L. Jackson, to record responses so Alexa can play back answers in their voices.
- More conversational. Alexa needs to anticipate a customer’s goals and next moves, and accomplish what you want. For example, if you want to go to the movies, it can recommend nearby restaurants and call you an Uber.
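Returning to the “ABC song” example above, here is a toy, hypothetical sketch of one way learning directly from customers could work: when an utterance fails and the customer immediately rephrases it into one that succeeds, record the two as equivalent so the original phrasing works next time. All names here are illustrative, not Amazon’s actual system.

```python
# Maps a phrasing that failed to a known-good phrasing.
equivalences: dict = {}

KNOWN_REQUESTS = {"play the alphabet song"}

def try_request(utterance: str) -> bool:
    """Return True if the assistant can satisfy the utterance."""
    canonical = equivalences.get(utterance, utterance)
    return canonical in KNOWN_REQUESTS

def learn_from_rephrase(failed: str, rephrased: str) -> None:
    """The customer rephrased a failed request into one that worked: link them."""
    if try_request(rephrased):
        equivalences[failed] = rephrased

# First attempt fails, the customer rephrases, and the equivalence is learned.
assert not try_request("play the ABC song")
learn_from_rephrase("play the ABC song", "play the alphabet song")
assert try_request("play the ABC song")  # now semantically resolved
```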
As for the journey ahead? Prasad said he wants Alexa to be everywhere and available anytime, including on the go with products such as Echo Frames and Echo Loop, as well as available to any developer or student in the world. With all this progress, we certainly won’t have to wait long.