For the last five years, the goal of deep learning has been to teach computers to think more like humans: to learn to recognize speech and images on their own. But now tech giants and startups are turning to new tools, believing that deep learning has essentially solved its recognition problem.
In short, computers have figured out how to recognize faces and what you’re saying. They will continue to get better over time, but the underlying algorithms and basic research challenges have mostly been solved. Microsoft (MSFT) acknowledged as much with regard to natural language recognition in a blog post on Thursday. The new challenge? Helping computers learn to achieve a goal once they have figured out what they are looking at.
This requires two skill sets: memory and mimicry. Google (GOOG) and Facebook (FB) have both made strides here over the last year or so, revealing new products that showcase some of their breakthroughs. Facebook disclosed this fall that its Memory Nets are helping computers learn the way babies do, while Google has come up with a different way to give computers a sense of time (or memory) about a situation so they can solve problems. Google recently used these new tools to improve its translation products and create new services, such as its Smart Reply tool, which suggests replies to your mobile emails. Google also bought a company called DeepMind that was at the forefront of these efforts.
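To make the “memory” idea concrete, here is a minimal Python sketch of a recurrent cell that carries a hidden state from one input to the next. It is a generic illustration, not Facebook’s Memory Nets or Google’s actual architecture; the sizes, weights, and inputs are invented.

```python
import numpy as np

# A minimal sketch of "adding memory" to a model: a plain recurrent cell
# that carries a hidden state (its memory) across time steps. This is a
# generic illustration only; all sizes and weights here are made up.

rng = np.random.default_rng(0)
input_size, hidden_size = 8, 16

W_in = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_rec = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

def step(x, h):
    """One time step: the new hidden state mixes the current input with
    the previous hidden state, so earlier inputs influence later ones."""
    return np.tanh(W_in @ x + W_rec @ h + b)

h = np.zeros(hidden_size)                    # empty memory at the start
for x in rng.normal(size=(5, input_size)):   # a sequence of 5 inputs
    h = step(x, h)                           # h summarizes everything seen so far
```

Without the recurrent term, each input would be processed in isolation; the carried-over state is what gives the model its sense of time.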
Meanwhile, on Wednesday a new machine learning startup called Osaro launched with $3.3 million in funding from Peter Thiel, Scott Banister, and Yahoo’s Jerry Yang. The company’s plan is to take what’s called deep reinforcement learning out of research labs and into production.
The idea here is that once a computer can look at a mass of pixels and determine, without human intervention, that the mass is, for example, a cat, what does it do with that information? Or, in the case of your emails, once it knows that a certain jumble of words is asking for an appointment next Thursday, what sorts of replies can it offer?
As Osaro cofounder Derik Pridmore puts it, computers need a policy to figure out what comes next. For now, engineers craft that policy, or goal, by hand, but that just isn’t scalable. This is where deep reinforcement learning comes in.
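As a rough illustration of what a hand-crafted policy looks like, consider the hypothetical Python sketch below. A policy is just a mapping from what the system observes to what it should do next; the observation labels and canned replies here are invented.

```python
# A policy maps an observation to an action. Hand-crafting one means an
# engineer writes every rule by hand; all labels and replies below are
# hypothetical examples, not any real product's logic.

def handcrafted_policy(observation: str) -> str:
    if observation == "meeting_request":
        return "Sure, Thursday works for me."
    if observation == "thank_you_note":
        return "You're welcome!"
    return "Could you tell me more?"

# Every new situation needs a new rule written by a human, which is why
# this approach doesn't scale. Reinforcement learning instead learns the
# observation-to-action mapping from experience.
```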
DeepMind, the startup Google bought, is the most famous example of deep reinforcement learning in action. The company made waves earlier in the year after teaching its artificial intelligence to play video games. Teaching the software to recognize what was on a video game screen and then take actions based on what it encountered required tens of thousands of hours of work: the computer took random actions, analyzed its score, and then “learned” which actions made the score go up so it could repeat them over and over again.
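That act-score-reinforce loop can be sketched in miniature. DeepMind’s actual system used deep neural networks over raw screen pixels; the toy tabular Q-learning example below, on an invented five-cell “game,” only illustrates the cycle of exploring randomly, reading the score, and reinforcing whatever raised it.

```python
import random

# Toy Q-learning sketch of the loop described above: act (sometimes at
# random), observe the score, and reinforce actions that made it go up.
# The "game" is invented: walk right along 5 cells to reach the goal at 4.

N_STATES, ACTIONS = 5, (-1, +1)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration

def greedy(s):
    """Pick the best-known action, breaking ties at random."""
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # Explore randomly sometimes; otherwise exploit what was learned.
        a = random.choice(ACTIONS) if random.random() < epsilon else greedy(s)
        s_next = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s_next == N_STATES - 1 else 0.0   # the "score"
        # Reinforce: nudge the value of (state, action) toward what it earned.
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next
```

After a few episodes the table steers the agent straight to the goal, which is the same principle, at trivial scale, as learning which joystick moves raise a game score.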
Building that type of computer knowledge has been the domain of R&D, and most companies leave it there. But Pridmore doesn’t plan to do that with his company. His plan is to build, within the year, a production system that teaches robots to recognize situations and then act on what they see. “Right now, you might spend $100,000 on a robot that can do one thing, and when you want it to do something else you have to retool it,” says Pridmore. “In the future, you will buy your robot and an unskilled or a skilled technician can show it what to do—or maybe even another robot—and it gets to work.”
That would make Osaro a provider of more than just algorithms: something akin to an operating system for industrial robotics. Any robot running the software could theoretically be shown how to perform a task and “learn” from it. Better still, the software could use that learned knowledge to adapt to changes in its environment, within certain learned parameters. The key is that those parameters are learned, not programmed, which means the robot could adapt to different situations. That’s significant, especially in a factory setting where conditions can change in unexpected ways.
But to make that happen, Osaro needed one more element to fall into place. DeepMind learned to play video games by randomly taking any action it could. That may be fine for video games, but in the real world it could be expensive, time-consuming, and even deadly for a robot to learn by trying every possible action to see what generated a “reward” and what didn’t. So Osaro also wrote algorithms that help computers learn by mimicking humans.
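Learning by mimicry is commonly done by treating recorded human demonstrations as supervised training data, an approach often called behavioral cloning. Osaro hasn’t published its algorithms, so the Python sketch below, with invented data, only illustrates that general idea.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# A minimal sketch of learning by mimicry (behavioral cloning): instead of
# random trial and error, fit a supervised model to (observation, action)
# pairs recorded from a human demonstrator. The data here is invented for
# illustration and does not reflect Osaro's actual methods.

rng = np.random.default_rng(0)
observations = rng.normal(size=(200, 2))                    # what the robot saw
actions = (observations[:, 0] > observations[:, 1]).astype(int)  # what the human did

policy = LogisticRegression().fit(observations, actions)

# The learned policy imitates the demonstrator in new situations,
# skipping the costly (or dangerous) random-exploration phase.
new_situation = np.array([[0.5, -0.2]])
print(policy.predict(new_situation))
```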
Osaro isn’t the only group interested in this. DeepMind/Google is, obviously, but so is the University of Washington, which this week released research showing robots learning the way children do. Researchers jump-started this type of learning by teaching robots to identify a goal and then rewarding them for following humans so they could meet it.
This is called reinforcement learning, and it, combined with the memory needed for that style of deep learning, will be the next generation of artificial intelligence, according to Sumit Sanyal, CEO of Minds.ai, a startup building hardware for deep learning. “… reinforcement learning is the bleeding edge of this stuff,” he says. “This is the path from weak AI to strong AI.”