Speech recognition has been clunky and perennially around-the-corner for years. Now, Siri is poised to vault the technology into mainstream use -- with a wide variety of applications.
Speech recognition is nothing new. Consumer electronics, cars and automated call centers have been “listening” to commands for years. Google has been transcribing voicemail messages since 2009, and Microsoft baked similar technology into Windows Vista three years before that. So what’s the big deal about Apple’s new virtual personal assistant named Siri?
She gets you.
In other words, Siri isn’t just voice recognition technology but voice comprehension, and that’s changing the way users interact with their mobile devices. Now, many predict Siri could provide a major boost to a perennially around-the-corner technology, much the way touch-based iPhone controls from Apple (AAPL) vaulted that technology into mainstream use. That could clear the way for a wide range of innovative applications. The voice recognition industry was worth some $2.7 billion this year, according to Opus Research, which predicts a post-Siri boom in 2012.
What makes Siri so different? Accuracy, according to Tim Bajarin, president of strategy firm Creative Strategies. “What Siri has really introduced is the next man-to-machine interface, and it’s making a significant impact on the market of speech comprehension and accuracy,” Bajarin says.
Siri’s not perfect, of course. The technology still has a hard time understanding some accents, and Apple has scrambled to fix early glitches. But for a piece of software, Siri still does pretty well. The key to that, according to Siri’s original creators, Menlo Park, California-based research lab SRI International, is natural language processing. Essentially, Siri takes speech signals, translates them directly into the text users see on their screens and maps those terms to one of its pre-programmed commands such as place a call or compose a text message.
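The last step described above, mapping transcribed text to a pre-programmed command, can be illustrated with a minimal sketch. This is not how Siri is actually built; the command names and trigger words below are hypothetical, and simple keyword matching stands in for real natural language processing.

```python
import re

# Hypothetical command table: each command is paired with trigger patterns.
# These names and keywords are illustrative assumptions, not Siri's own.
COMMANDS = {
    "place_call": [r"\bcall\b", r"\bdial\b", r"\bphone\b"],
    "compose_text": [r"\btext\b", r"\bmessage\b", r"\bsms\b"],
}


def map_to_command(transcript):
    """Return the first command whose trigger pattern appears in the
    transcribed text, or None if nothing matches."""
    lowered = transcript.lower()
    for command, patterns in COMMANDS.items():
        if any(re.search(pattern, lowered) for pattern in patterns):
            return command
    return None


if __name__ == "__main__":
    for phrase in ["Call my mom", "Send a message to Joe", "What's the weather?"]:
        print(phrase, "->", map_to_command(phrase))
```

A keyword table like this breaks down quickly on ambiguous requests, which is the gap that language comprehension is meant to close.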
That technology has potential outside of tablets and smartphones. Nuance (NUAN), the creator of Dragon speech recognition software, has been working in health care for a decade. Nuance’s latest program runs on a physician’s desktop, recording speech using a clip-on microphone. The program updates patients’ electronic health records as appointments are going on. “One second the patient could be talking about the medical history of their mom, and then the next they’re talking about their dad. And the application understands that,” says Joe Petro, senior vice president of research and development at Nuance Communications’ health care division.
How? Much like Siri, Nuance’s application, which is being used by some 450,000 physicians across the country, extracts meaning from the words it recognizes, referencing a database of medical information and comparing that with the patient’s history. It then uses statistical inference to establish connections between the pieces of information it discovers, even making suggestions about treatment. Petro says the technology is more than 90% accurate and improves over time. It’s certainly worked for the bottom line, so much so that Nuance decided to raise its fourth-quarter revenue projections by about $10 million.
Researchers have even bigger hopes for the future. Skip Rizzo, associate director of the University of Southern California’s Institute for Creative Technologies, is working on an interactive simulation technology designed to help military veterans seek counseling for post-traumatic stress disorder. Dubbed SimCoach, the program will eventually attempt to read the emotion behind spoken words. “It’s a big, big challenge. Because what you’re doing is having to capture vocal patterns, then you’re having to analyze them like a brain does,” says Rizzo. While humans may be able to tell when something is wrong with a close friend or family member because their speech pattern is slower or has less emphasis, a computer can have a hard time picking up these signals, Rizzo says.
Some research could bring results sooner rather than later. Last spring, Rizzo’s research partner, MIT Professor Alex Pentland, experimented with a similar voice inference technology at a Bank of America (BAC) call center, analyzing how employee communication affected the success of the business. Pentland had employees wear small electronic badges around their necks for six weeks that tracked their physical location as well as body language and voice. The data showed who a person interacted with, how close they were standing to them and the tone of their conversation. “We found that the most productive people were the people that not only talked to lots of people but they talked to co-workers that similarly talked to a lot of people,” Pentland says. Simply by changing employees’ coffee break schedules to better coincide with one another, he says, the call center would be able to save $15 million a year.
The attention consumers are paying to Siri is likely to benefit such research and push adoption further. “Voice recognition is really the holy grail to technology,” Rizzo says. “We’re 90% there, but that last 10% is a lot further to handle. And when the tipping point is reached, it’s going to be a giant market.” It looks like Siri may very well be that tipping point.