Over the last five years, speech recognition has progressed from an interesting-but-glitchy technology to an extremely useful tool. (Thanks, Siri!) Now, building on advances in cloud computing, machine learning, and neural networks, there is software that not only recognizes what we say but also translates it into other languages.
Yup, human translators may be the next set of workers to be automated out of jobs. You can already use Microsoft’s Skype Translator to initiate a call in English and get it translated, pretty much in real time, into Chinese, French, German, Italian, or Spanish. The product, which has been available in a trial version since October 1, also understands and translates instant messages in 50 languages.
For many people (well, those who aren’t translators anyway) this is good news, and for long-time Microsoft researcher Fil Alleva it’s the culmination of a dream.
Back when he was starting out, “what we all had in the back of our minds, whether we say it or not, was C-3PO,” said Alleva, group engineering manager for Microsoft (MSFT) Technology & Research, in a new blog post.
In case you’ve been under a rock for the last three decades, C-3PO is the helpful-to-a-fault robot from the Star Wars franchise. One of his big jobs was translating information to and from a seemingly infinite number of human and machine languages. What better role model for speech translation technology?
Here’s why it’s now possible to do what has long seemed like the stuff of science fiction.
First, there’s the broad availability of affordable and massive computing resources residing in public clouds like Microsoft Azure, Amazon (AMZN) Web Services, and Google (GOOG) Cloud Platform. That gives researchers near-infinite compute capacity and storage to suck up data and crunch it. That data includes sounds, words, text, what-have-you.
Then there are advances in machine learning, a.k.a. artificial intelligence, which enable computers to teach themselves capabilities from the data they’re exposed to, improving as they parse more of it over time. One reason Skype Translator is in preview is to give it more fodder to work with: the more it’s used, and the more voices and words it’s exposed to, the more accurate it gets.
Microsoft emphasizes its use of “neural network” technology that mimics the human brain’s ability to take in data, process it, compare it to other information over time, and learn from it. Yes, massive banks of computers can now teach themselves stuff, including how context can affect the meaning of a word.
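To make that “teaching themselves” idea concrete, here is a toy sketch of the basic mechanism. This is not Microsoft’s actual technology, just a hypothetical, minimal example: a single artificial neuron in Python that learns the logical AND function by repeatedly nudging its weights to shrink its error on example data.

```python
import math

def sigmoid(x):
    """Squash any number into the range (0, 1) -- the neuron's output."""
    return 1.0 / (1.0 + math.exp(-x))

# Training data: input pairs and the label to learn (logical AND).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w1, w2, b = 0.0, 0.0, 0.0   # the neuron's adjustable parameters
rate = 5.0                   # how big each corrective nudge is

def total_error():
    return sum((sigmoid(w1 * x1 + w2 * x2 + b) - y) ** 2
               for (x1, x2), y in data)

before = total_error()

# Each pass over the data adjusts the weights to reduce the error --
# the "learning from exposure to data" step described above.
for _ in range(1000):
    for (x1, x2), y in data:
        out = sigmoid(w1 * x1 + w2 * x2 + b)
        grad = 2 * (out - y) * out * (1 - out)  # slope of the squared error
        w1 -= rate * grad * x1
        w2 -= rate * grad * x2
        b -= rate * grad

after = total_error()
print(before > after)  # the total error shrinks as it learns: True
```

Real systems differ mainly in scale: instead of three parameters learning AND, they tune millions of parameters across many layers, over enormous amounts of speech and text.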
This is all extremely complex stuff happening, hopefully under the covers, so that humans don’t have to think about it much; they can just use it. As Harry Shum, executive vice president of Microsoft’s Technology and Research Group, put it in the post: “When machine learning works at its best, you really don’t see the effort. It’s just so natural. You see the result.”
Microsoft is certainly not alone in this race (which is starting to look more like a sprint than the marathon it’s been to date) to make computers that see, hear, and understand input, whether it’s spoken words, text, facial images, or whatever.
Google on Wednesday offered a new tool to let developers build image- and facial-recognition smarts into software and devices; IBM (IBM) recently made its SystemML machine learning technology broadly available. Microsoft’s Project Oxford is an effort to build technologies to help software see, hear, and understand the world (and the people in it) better. And then there are Apple’s (AAPL) Siri and Amazon’s Echo on the speech recognition front as well.
But, with all due respect to Skype Translator, it is nowhere near as cute as C-3PO.
This story was updated at 11:45 a.m. with more information on the Skype Translator preview.