If you’re playing around with Google’s new Home Max or Mini smart speakers, or if you’re just using an Android phone such as the new Pixel 2, you may be familiar with the Google Assistant virtual helper. And if you’ve used the Assistant in the last couple of days, you may have noticed that its voice sounds more realistic than before.
That’s because Alphabet’s Google has started using a cutting-edge piece of technology called WaveNet—developed by its DeepMind “artificial intelligence” division—in Google Assistant.
Synthesized speech is traditionally created by gluing together bits of recorded speech, in a technique known as “concatenative text-to-speech.” The result does not sound natural, although some versions of the technique are better than others.
WaveNet represents a different approach: it uses recordings of real speech to train a neural network, a computer model loosely inspired by the brain. Then, drawing on the statistical patterns it has learned, the system generates entirely new waveforms, one audio sample at a time, rather than pumping out a pastiche.
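To make that sample-by-sample idea concrete, here is a minimal sketch of an autoregressive generation loop. The `toy_model` below is a hypothetical hand-written stand-in for WaveNet’s actual neural network (which uses stacks of dilated causal convolutions trained on recorded speech); only the shape of the loop, where each new sample is drawn from a distribution conditioned on the samples before it, reflects the real technique.

```python
import random

# Quantization levels for the audio signal. The real WaveNet quantizes
# raw audio into 256 mu-law levels; we use 8 to keep the toy readable.
QUANT_LEVELS = 8

def toy_model(context):
    """Hypothetical stand-in for the trained network: given previous
    samples, return a probability distribution over the next quantized
    sample. This toy rule just favors values near the last sample, so
    the output varies smoothly."""
    last = context[-1]
    weights = [1.0 / (1 + abs(v - last)) for v in range(QUANT_LEVELS)]
    total = sum(weights)
    return [w / total for w in weights]

def generate(model, seed, n_samples, rng):
    """Autoregressive loop: draw each new sample from the model's
    distribution given everything generated so far, then feed it back
    in as context for the next step."""
    waveform = list(seed)
    for _ in range(n_samples):
        probs = model(waveform)
        value = rng.choices(range(QUANT_LEVELS), weights=probs)[0]
        waveform.append(value)
    return waveform

rng = random.Random(0)
audio = generate(toy_model, seed=[QUANT_LEVELS // 2], n_samples=100, rng=rng)
print(len(audio))  # 101: the one-sample seed plus 100 generated samples
```

The sequential structure of this loop is also why speed matters: every sample depends on the ones before it, which is what made the original WaveNet too slow for production until the model was reworked.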
DeepMind revealed its WaveNet technology just over a year ago, but at the time it was just a clever research prototype. Now it’s ready for consumer applications such as Google Assistant—according to a Wednesday blog post, this is largely because WaveNet’s model has been revamped to allow it to generate waveforms faster, while using less computing power.
To hear the difference between WaveNet-generated “voices” and their concatenative predecessors, listen to the samples embedded in that blog post. The difference is quite stark. For now, however, Assistant uses the new technology only for its U.S. English and Japanese voices.
Meanwhile, DeepMind also said this week that it was launching a new “Ethics & Society” division to work alongside its AI activities. It said the new unit would work on making artificial intelligence “beneficial and responsible”—a topic that’s worrying many people inside and outside the tech community these days.