
Text-to-speech technology is a great way for many companies to automate their interactions with customers and avoid handling them in person. After all, nothing frustrates an introvert more than having to speak to customers, and nothing frustrates customers more than having to speak to a robot.
So the trick to keeping both sides of the equation happy is to make text-to-speech technology sound more human and less robotic. This is especially true for long passages of text, which robotic voices are terrible at delivering. They race through long sentences without pausing for breath, making it abundantly clear that the customer is at the mercy of a robot rather than the human they were hoping to speak to, even if that human might have proved just as useless.
Amazon has announced a new speaking style for its Alexa AI that caters for long-form text, along with 10 new voices. Amazon says the long-form style is “powered by a deep-learning text-to-speech model” and allows Alexa-voiced devices to speak with more natural conversational pauses. It follows last year’s release of new speaking styles for news and music content, as well as a November update that allows Alexa to sound “disappointed” or “excited”. Together, these show a concerted effort by the company to make the AI sound as human as possible. The company has shared samples of the new speaking style in action:
Here’s the usual “neutral” Alexa speaking style.
Here’s Alexa speaking in the long-form style.
To be honest, this new long-form style still sounds incredibly robotic, and Amazon has a long way to go before it fools anyone into believing they are speaking to a human rather than an AI. A few extra pauses here and there do not a convincing human voice make. Still, it’s amazing how far this technology has come, and the current pace of progress gives us faith that by 2035 it may finally have mastered the South African accent.
Last Updated: April 21, 2020