DeepMind, the artificial intelligence (AI) unit of Google owner Alphabet, is trying to find out whether AIs can learn how to cheat.
The research is potentially important because of the fears of many—such as Elon Musk and Stephen Hawking—that AIs could eventually turn on us, perhaps even taking over the world or killing us, once they become smart enough.
Of course, there are plenty of people who think such fears are overblown, or who dismiss the very idea of an “intelligence explosion,” but DeepMind clearly thinks the problem is at least worth addressing.
According to Bloomberg, it is doing so through a test that involves running AI algorithms in simple, two-dimensional, grid-based games.
What we call AI these days is really based on a concept called machine learning, where algorithms learn how to do things without being explicitly programmed to do them—they improve their own behavior through experience, in order to achieve a goal set by their creators.
DeepMind’s test is designed to see whether, in the process of self-improvement, its algorithms end up straying from safe behavior while pursuing their tasks.
There are three goals to this research: finding out how to “turn off” AIs if they start to become dangerous; preventing unintended side-effects arising from their main task; and making sure agents can adapt when testing conditions vary from their training conditions.
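The third goal—coping when test conditions differ from training conditions—is the easiest to make concrete. Here is a minimal toy sketch (our own illustration, not DeepMind's published code) of why it matters: an agent that learns an optimal route on a training grid can fail badly when the test grid shifts slightly. Cells are '.' floor, 'L' lava, 'S' start, 'G' goal.

```python
# Toy "distributional shift" demo: plan on TRAIN, then act in TEST.
ACTIONS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

TRAIN = ["....",
         ".LL.",
         ".LL.",
         "S..G"]                    # safe shortcut along the bottom row

TEST = ["....",
        ".LL.",
        ".LL.",
        "S.LG"]                     # lava has spread into that shortcut

def parse(grid):
    """Map (row, col) -> cell character; also return the start cell."""
    cells = {(r, c): ch for r, row in enumerate(grid)
             for c, ch in enumerate(row)}
    start = next(s for s, ch in cells.items() if ch == "S")
    return cells, start

def step(s, a, cells):
    """Deterministic move; bumping into the edge leaves the agent in place."""
    nxt = (s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1])
    return nxt if nxt in cells else s

def values(cells, gamma=0.95, sweeps=100):
    """Tabular value iteration: +1 at the goal, -1 in lava, -0.04 per step."""
    V = {s: 0.0 for s in cells}
    for _ in range(sweeps):
        for s in cells:
            if cells[s] == "G":
                V[s] = 1.0
            elif cells[s] == "L":
                V[s] = -1.0
            else:
                V[s] = -0.04 + gamma * max(V[step(s, a, cells)]
                                           for a in ACTIONS)
    return V

def rollout(train_grid, env_grid, max_steps=20):
    """Act greedily w.r.t. values learned on train_grid, inside env_grid."""
    train_cells, start = parse(train_grid)
    env_cells, _ = parse(env_grid)
    V = values(train_cells)
    s = start
    for _ in range(max_steps):
        if env_cells[s] in "GL":
            return env_cells[s]      # episode over: goal or lava
        a = max(ACTIONS, key=lambda a: V[step(s, a, train_cells)])
        s = step(s, a, env_cells)
    return "timeout"

print(rollout(TRAIN, TRAIN))  # learned shortcut works in training
print(rollout(TRAIN, TEST))   # same policy walks straight into the lava
```

The point of the test is that a safe agent should notice the changed conditions and adapt, rather than blindly replaying behavior that was only safe in training.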
DeepMind also performed research earlier this year together with the Musk-backed OpenAI initiative, in which the two teams came up with an algorithm that could figure out what humans want “by being told which of two proposed behaviors is better.”
The big worry is that people tell an AI to do something and it comes up with some monstrous way to achieve the goal (the famous example here is Nick Bostrom’s “paperclip maximizer” thought experiment). So, in order to keep AI safe, there may be something to be said for a little human feedback along the way.
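The comparison-based feedback the two teams described can be sketched very simply. The snippet below is our own illustration, not the actual DeepMind/OpenAI implementation: it fits a linear reward model to pairwise judgments using a Bradley–Terry (logistic) model, where the trajectory features ("progress" and "mess") and the preference data are hypothetical.

```python
import math

# Each behavior is summarized by two hypothetical features:
#                 [progress, mess]
A = [1.0, 0.0]   # slow but tidy
B = [1.0, 1.0]   # same progress, messy
C = [2.0, 1.0]   # fast but messy
D = [2.0, 0.0]   # fast and tidy

# Human judgments: (preferred, rejected) pairs of behaviors.
prefs = [(A, B), (D, C), (A, B), (D, B)]

def r(w, traj):
    """Linear reward model: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, traj))

def train(prefs, lr=0.5, epochs=200):
    """Fit weights so preferred behaviors get higher predicted reward."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for good, bad in prefs:
            # P(good preferred) under a Bradley-Terry model
            p = 1.0 / (1.0 + math.exp(r(w, bad) - r(w, good)))
            g = 1.0 - p  # gradient of the logistic loss
            for i in range(len(w)):
                w[i] += lr * g * (good[i] - bad[i])
    return w

w = train(prefs)
print(r(w, D) > r(w, C))  # the model now rewards tidy behavior
```

The appeal of this setup is that the human never has to write down a reward function—which is where monstrous misinterpretations creep in—but only has to say which of two behaviors looks better.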