DeepMind, the London-based A.I. research company owned by Google parent Alphabet, has created an artificial intelligence algorithm that can perform a wide range of language tasks, from reading comprehension to answering questions on a broad range of subjects, better than any existing similar software. In a few areas, such as a high school reading comprehension test, the software approaches human-level performance. But in others, including common sense reasoning and mathematical reasoning, the system falls well short of human abilities.
In announcing the new language model Wednesday, DeepMind signaled its intent to play a larger role in advancing natural language processing. The company is best known for creating an A.I. system that could beat the world’s top human player in the strategy game Go, a major milestone in computer science, and it recently achieved a breakthrough in using A.I. to predict the structure of proteins. But DeepMind has done far less work on natural language processing (NLP) than rival labs, such as OpenAI, the San Francisco-based A.I. research company, and the A.I. research arms of Facebook, Microsoft, Alibaba, Baidu, and even its sister company Google.
Each of these other companies has built extremely large language A.I. systems. Based on neural networks—a kind of machine learning software loosely modeled on the human brain—these language A.I. algorithms can ingest and tinker with hundreds of millions, and sometimes hundreds of billions, of variables. Known among A.I. researchers as “ultra-large language models,” these A.I. systems are trained on vast repositories of books and text culled from the Internet. The advantage of such ultra-large language models is that, once trained, they can perform many different language skills, such as translation, question answering, and text composition, with little, or sometimes no, specific training in that domain. What’s more, they frequently outperform smaller NLP A.I. software that has been trained specifically to be expert at just one task.
DeepMind’s language model, which it calls Gopher, was significantly more accurate than these existing ultra-large language models on many tasks, particularly answering questions about specialized subjects like science and the humanities, and equal or nearly equal to them in others, such as logical reasoning and mathematics, according to the data DeepMind published.
This was the case despite the fact that Gopher is smaller than some ultra-large language software. Gopher has some 280 billion different parameters, or variables that it can tune. That makes it larger than OpenAI’s GPT-3, which has 175 billion. But it is smaller than a system that Microsoft and Nvidia collaborated on earlier this year, called Megatron, which has 530 billion, as well as ones constructed by Google, with 1.6 trillion parameters, and Alibaba, with 10 trillion.
Ultra-large language models have big implications for business: they have already led to more fluent chatbots and digital assistants, more accurate translation software, better search engines, and programs that can summarize complex documents. DeepMind, however, said it had no plans to commercialize Gopher. “That’s not the focus right now,” said Koray Kavukcuoglu, DeepMind’s vice president of research.
Some researchers, including several at OpenAI, have posited that because most human knowledge is encoded in language, by building larger and larger language models, scientists will eventually achieve “artificial general intelligence.” That’s the term computer scientists use for A.I. whose intelligence is as adaptable as a human’s—able to compose a symphony and win Jeopardy! and streamline a factory’s operations—and at least as capable, if not more so.
Along with OpenAI, DeepMind is one of the few companies to have artificial general intelligence as its explicit mission. But its researchers said they did not necessarily buy the idea that ever-larger language models will yield human-like intelligence. “Even though we don’t think that language or scale is the only way to go, we see it as part of a broader portfolio of approaches that we’re studying at DeepMind towards achieving our mission of solving intelligence to advance science and benefit humanity,” said Oriol Vinyals, a DeepMind researcher who worked on the company’s new language A.I. system.
As ultra-large language models are rapidly being commercialized, A.I. researchers and social scientists have raised ethical concerns about them. Chief among them are concerns that the models often learn racial, ethnic and gender stereotypes from the texts on which they are trained and that the models are so complex it is impossible to discover and trace these biases prior to deploying the system. In one example, GPT-3, the language A.I. that OpenAI built, often associates Muslims with violent narratives and regurgitates professional gender stereotypes.
Another major concern about such software is that the algorithms consume outsized amounts of electric power to train and run, potentially exacerbating global warming. In light of these harms, and the fact that, despite their size, the A.I. systems still don’t achieve human-level language understanding, some linguists and A.I. researchers have called on tech companies to stop building them.
In conjunction with releasing its Gopher research, DeepMind tried to inoculate itself against this criticism by also publishing a research paper in which its own A.I. ethics team delved into these concerns and tried to suggest ways to address them. “Our belief is that language model research does have the potential to unlock a range of beneficial uses, if done in a responsible way,” said Iason Gabriel, a DeepMind A.I. ethics researcher who worked on the paper. “We take this idea of responsible innovation seriously.”
The researchers broke the 21 ethical concerns they identified into six broad categories, ranging from malicious uses of the software to its potential ability to spread misinformation to environmental harms. But the DeepMind ethics team said there was no silver bullet for fixing many of the issues ultra-large language models pose. “One key takeaway we’ve had is that it really does take collaboration of different kinds of expertise to unearth this whole landscape [of potential ethical problems],” said Laura Weidinger, another member of the DeepMind A.I. ethics team.
To show that it was trying to make progress on addressing some of the ethical harms that ultra-large language models pose, DeepMind also published separate research on a technique that could make the creation of large language models more energy efficient, and potentially make it easier for researchers to detect bias and toxic language as well as verify information sources. The system, which the company calls a Retrieval-Enhanced Transformer, or Retro for short, has access to a 2-trillion-word database, which the software uses as a kind of memory.
When asked to generate text to complete a human-written prompt, the system finds the passage in its database that is closest to that initial prompt, then the next-closest text block, and uses those two passages to inform its response.
Having access to this kind of memory reduces the amount of information that the language model has to process at any one time. That, in turn, means that the model can be smaller, taking less energy to train and run. DeepMind said that with a 7-billion-parameter Retro model it could equal the performance of OpenAI’s GPT-3, even though GPT-3 is 25 times larger. Also, because researchers can see exactly which sections of text the Retro software is using to produce its output, it could be easier to detect bias or misinformation, the DeepMind researchers said.
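For readers curious what “retrieve, then generate” looks like in practice, the core step can be sketched in a few lines of Python. This is a toy illustration, not DeepMind’s implementation: Retro matches fixed-size text chunks using neural embeddings and feeds the retrieved passages to the model through a dedicated attention mechanism, whereas the sketch below ranks passages by simple word-overlap similarity, and the `embed`, `cosine`, and `retrieve` helpers are invented names.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy stand-in for a real embedding: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(prompt, database, k=2):
    # Return the k passages most similar to the prompt, best match first.
    query = embed(prompt)
    ranked = sorted(database, key=lambda p: cosine(query, embed(p)), reverse=True)
    return ranked[:k]

database = [
    "The Amazon river flows through South America.",
    "Go is an ancient strategy board game.",
    "Protein structure prediction was a breakthrough for A.I.",
]

# The retrieved passages would then be handed to the language model as
# extra context alongside the prompt, rather than answered from the
# model's parameters alone.
neighbors = retrieve("How did A.I. predict protein structure?", database)
```

Because the passages that informed each response are explicit, a researcher can inspect them directly, which is the transparency property the DeepMind team highlights.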
Correction, Dec. 8: An earlier version of this story misspelled the last name of DeepMind’s vice president for research, Koray Kavukcuoglu. This story has also been updated to clarify how the Retro A.I. system uses the text it retrieves from its training set.