LAS VEGAS — A short video of a child asking Alexa to have his grandma read the Wizard of Oz is generating more than its share of buzz after Amazon explained that the voice was actually a simulation.
To make it happen, Amazon “had to learn to produce a high-quality voice with less than a minute of recording versus hours of recording in a studio,” said Rohit Prasad, Alexa AI senior vice president and head scientist, during a keynote address Wednesday morning at the company’s re:MARS conference in Las Vegas.
Amazon isn’t saying when the capability could be released as an Alexa feature, or what types of precautions might accompany a rollout. AI-synthesized voices have raised concerns about abuse, including the possibility that they could be used to trick people into thinking others had said something they hadn’t.
A set of “Responsible AI” principles released this week by Microsoft outlines a series of safeguards against potential misuse of synthetic voices.
Prasad hinted at the underlying technical approach used by Amazon for the capability, saying that the key was “framing the problem as a voice-conversion task, and not a speech-generation task.”
Introducing the demo, he described the technology as a way to preserve the memories of loved ones, saying it’s part of a growing recognition of the role that Alexa can play as a companion.
“In this companionship role, human attributes of empathy and affect are key for building trust,” he said. “These attributes have become even more important in these times of the ongoing pandemic, when so many of us have lost someone we love. While AI can’t eliminate that pain of loss, it can definitely make their memories last.”
Prasad called it “one of the new capabilities we’re working on, which enables lasting personal relationships.”