Google improves voice search (again) for its mobile apps


Google (GOOG) is claiming better voice search on its Android and iOS mobile apps, thanks to a new approach to the artificial intelligence technique the company uses to power that capability. A blog post published on Thursday, authored by a handful of Google researchers, explains in technical detail how they pulled off the improvements, which include faster, more-accurate transcriptions and better voice recognition in noisy places.

The boiled-down version is that Google switched its voice search system from one type of deep learning technique to another. In the old model, the system would analyze 10-millisecond snippets of audio and make predictions of words based on the sounds it recognized, regardless of the order in which they were uttered. The new model has a better memory, meaning it can consume larger snippets of audio and concern itself with the order in which particular sounds were spoken.
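The difference between the two approaches can be sketched in a few lines of Python. This is purely illustrative, with made-up dimensions and random, untrained weights rather than anything resembling Google's actual model: the point is that the old-style predictor looks at each 10-millisecond frame in isolation, while the recurrent version carries a hidden state forward so each prediction depends on the sounds that came before.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: each 10 ms frame is a 40-dim feature vector,
# the hidden state has 64 units, and there are 30 phoneme classes.
FRAME_DIM, HIDDEN_DIM, NUM_PHONES = 40, 64, 30

W_in = rng.standard_normal((HIDDEN_DIM, FRAME_DIM)) * 0.1
W_rec = rng.standard_normal((HIDDEN_DIM, HIDDEN_DIM)) * 0.1
W_out = rng.standard_normal((NUM_PHONES, HIDDEN_DIM)) * 0.1

def stateless_predict(frame):
    """Old-style approach: each frame is classified on its own,
    with no memory of what was spoken before it."""
    return int(np.argmax(W_out @ np.tanh(W_in @ frame)))

def recurrent_predict(frames):
    """RNN-style approach: a hidden state h is fed back in at every
    step, so each frame's prediction depends on the frames before it."""
    h = np.zeros(HIDDEN_DIM)
    preds = []
    for frame in frames:
        h = np.tanh(W_in @ frame + W_rec @ h)  # the feedback loop
        preds.append(int(np.argmax(W_out @ h)))
    return preds

audio = rng.standard_normal((50, FRAME_DIM))  # ~0.5 s of fake audio
frame_wise = [stateless_predict(f) for f in audio]
with_memory = recurrent_predict(audio)
```

Reorder the frames and `stateless_predict` produces the same multiset of labels either way; `recurrent_predict` generally does not, which is the "better memory" the new model buys.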

Here’s a more-technical, but illustrative explanation from the Google post:

If the user speaks the word “museum” for example—/m j u z i @ m/ in phonetic notation—it may be hard to tell where the /j/ sound ends and where the /u/ starts, but in truth the recognizer doesn’t care where exactly that transition happens: All it cares about is that these sounds were spoken.

Our improved acoustic models rely on Recurrent Neural Networks (RNN). RNNs have feedback loops in their topology, allowing them to model temporal dependencies: when the user speaks /u/ in the previous example, their articulatory apparatus is coming from a /j/ sound and from an /m/ sound before. Try saying it out loud – “museum” – it flows very naturally in one breath, and RNNs can capture that.
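That indifference to exactly where one sound ends and the next begins is typically achieved by collapsing the network's frame-by-frame outputs: repeated labels are merged and a special "blank" symbol is dropped. This is how connectionist temporal classification (CTC) decodes, which is a standard technique for this; the article doesn't name the exact method, so treat this toy collapsing rule as an illustrative assumption rather than Google's confirmed pipeline.

```python
def collapse(frame_labels, blank="-"):
    """Collapse a per-frame label string: merge adjacent repeats and
    drop blanks. The recognizer thus doesn't care exactly where the
    /j/ ends and the /u/ begins, only that both sounds were spoken."""
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)

# Per-frame output for "museum" (one character per 10 ms frame):
print(collapse("mmjj-uuzzii@@mm"))  # -> "mjuzi@m"
```

However many frames each phone spans, the collapsed result is the same sequence of sounds, which is all the recognizer cares about.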

Google’s voice recognition team also added ambient noise and reverb to the data it used to train its new system, meaning it does a better job understanding users trying to talk to their phones in noisy places.
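A common recipe for that kind of augmentation is to mix recorded background noise into clean training utterances at a chosen signal-to-noise ratio. The article doesn't give Google's exact procedure, so the sketch below is a generic version with stand-in signals:

```python
import numpy as np

rng = np.random.default_rng(42)

def add_noise(clean, noise, snr_db=10.0):
    """Mix `noise` into `clean` at the target signal-to-noise ratio
    (in dB), producing a 'noisy cafe' copy of a clean utterance."""
    noise = noise[: len(clean)]
    sig_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale noise so 10*log10(sig_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise

clean = np.sin(np.linspace(0, 100, 16000))  # stand-in for 1 s of speech
noise = rng.standard_normal(16000) * 0.5    # stand-in for ambient noise
noisy = add_noise(clean, noise, snr_db=10.0)
```

Training on both the clean and the noisy copies teaches the acoustic model to tolerate the background conditions it will hear in the real world.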

It’s all very complicated stuff from a computer science perspective, but it is increasingly important to our everyday lives as we expect everything from our phones to our cars to be more intelligent. The techniques Google uses to power voice search in Android are related to what Apple (AAPL) is doing with Siri, what Microsoft (MSFT) is doing with its Cortana digital assistant, and what Amazon (AMZN) is doing with its various voice-controlled devices. They’re also related to techniques that allow software to recognize objects, faces and even our body movements.

If you want to learn more about how deep learning, the umbrella term for this collection of techniques, works, read Fortune’s recent interview with Andrew Ng, the chief scientist at Chinese search engine giant Baidu (BIDU) and a renowned expert in the space.
