In a major scientific breakthrough, A.I. predicts the exact shape of proteins
Researchers have made a major breakthrough using artificial intelligence that could revolutionize the hunt for new medicines.
The scientists have created A.I. software that takes the amino acid sequence encoded in a protein's gene and predicts the protein's three-dimensional structure to within about an atom's width.
The achievement, which solves a 50-year-old challenge in molecular biology, was accomplished by a team from DeepMind, the London-based artificial intelligence company that is part of Google parent Alphabet. Until now, DeepMind was best known for creating A.I. that could beat the best human players at the strategy game Go, a major milestone in computer science.
DeepMind achieved the protein shape breakthrough in a biennial competition for algorithms that predict protein structures. The competition gives participants a protein's amino acid sequence and asks them to use it to determine the protein's three-dimensional shape.
Across more than 100 proteins, DeepMind's A.I. software, which it calls AlphaFold 2, predicted the structure to within about an atom's width in two-thirds of cases and was highly accurate in most of the remaining third, according to John Moult, a molecular biologist at the University of Maryland who directs the competition, called the Critical Assessment of Structure Prediction, or CASP. It was far better than any other method in the competition, he said.
Demis Hassabis, DeepMind’s cofounder and chief executive officer, said the company wants “to make the maximal positive societal impact with these technologies.” But he said DeepMind had not yet determined how it would provide academic researchers with access to the protein structure prediction software or whether it would seek commercial collaborations with pharmaceutical and biotechnology firms. He said the company would announce “further details on how we’re going to be able to give access to the system in a scalable way” sometime next year.
“This computational work represents a stunning advance on the protein-folding problem,” Venki Ramakrishnan, a Nobel Prize–winning structural biologist who is also the outgoing president of the Royal Society, Britain’s most prestigious scientific body, said of AlphaFold 2.
Janet Thornton, an expert in protein structure and former director of the European Molecular Biology Laboratory’s European Bioinformatics Institute, said that DeepMind’s breakthrough opened up the way to mapping the entire “human proteome”—the set of all proteins found within the human body. Currently, only about a quarter of human proteins have been used as targets for medicines, she said. Now, many more proteins could be targeted, creating a huge opportunity to invent new medicines.
Thornton also said that DeepMind's A.I. system would have profound implications for scientists who design synthetic proteins, which could themselves have wide-ranging uses, from genetically modified crop strains that are far more nutritious to new enzymes that could help clean up the environment by digesting plastics.
Proteins are the molecular machinery that carries out most biological processes. They are formed from long chains of amino acids, coded for in DNA, but once a cell manufactures them, they fold spontaneously into complex shapes that often resemble a tangle of cord, with ribbons and curlicue-like appendages. The exact structure of a protein is essential to its function. It is also critical for designing small molecules that might bind to the protein and alter that function, which is how new medicines are created.
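The step from DNA to amino acid chain can be illustrated with a short Python sketch. It assumes the standard genetic code and a clean coding sequence (no introns, no alternative codon tables), and the function name `translate` is a hypothetical illustration, not any lab's actual tool. Each three-letter DNA "codon" maps to one amino acid, and the resulting string of one-letter codes is the kind of sequence that structure-prediction software such as AlphaFold 2 takes as input.

```python
# Standard genetic code, listed in TCAG order for the first, second and third
# codon positions (assumption: no alternative codes, e.g. mitochondrial).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build all 64 codon -> amino acid entries ("*" marks a stop codon).
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Hypothetical helper: translate a coding DNA sequence into one-letter
    amino acid codes, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    # Step through the sequence three bases at a time, ignoring any leftover bases.
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # a stop codon ends the chain
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR
```

The example prints MAIVMGR: the first seven codons are translated, and TGA, a stop codon, ends the chain.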
Until now, the primary way to obtain a high-resolution model of a protein’s structure was through a method called X-ray crystallography. In this technique, a solution of proteins is turned into a crystal, itself a difficult and time-consuming process, and then this crystal is bombarded with X-rays, often from a large circular particle accelerator called a synchrotron. The diffraction pattern of the X-rays allows researchers to build up a picture of the internal structure of the protein. It takes about a year and costs about $120,000 to obtain the structure of a single protein through X-ray crystallography, according to an estimate from the University of Toronto.
More recently, two other experimental methods—nuclear magnetic resonance and cryogenic electron microscopy—have also been used. They can be faster and less expensive but tend to produce models that are less precise than X-ray crystallography.
It takes AlphaFold 2 “a matter of days” to calculate each protein structure using what John Jumper, the researcher who leads the protein-folding team at DeepMind, characterized as “modest” computing resources. Training the system required 128 specialized A.I. computing units on 16 chips created by Google, called tensor processing units, running continuously for “roughly a few weeks,” Jumper said. He noted that this is much less computing power than has been required for many other recent A.I. breakthroughs, including DeepMind’s previous work on Go.
In 1972, Nobel Prize–winning chemist Christian Anfinsen postulated that a protein's amino acid sequence, which is dictated by its DNA, should alone fully determine the final structure the protein takes. That supposition set off a decades-long quest for a mathematical model that could do what Anfinsen proposed. The problem, however, was that even though the laws of physics govern how a protein folds, the number of possible configurations is so vast that biologist Cyrus Levinthal famously estimated it would take longer than the age of the known universe to find a single protein's structure by random trial and error.
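Levinthal's point can be reproduced with back-of-the-envelope arithmetic. The Python sketch below assumes, purely for illustration, a 100-residue protein, three possible conformations per residue, and a search that tries 10^13 conformations per second; none of these figures come from the article, and the function name is hypothetical.

```python
# Back-of-the-envelope version of Levinthal's argument.
# All inputs are illustrative assumptions, not measured values.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9


def exhaustive_search_years(residues: int, states_per_residue: int,
                            samples_per_second: float) -> float:
    """Hypothetical helper: years needed to visit every conformation once
    by blind enumeration, one conformation at a time."""
    conformations = states_per_residue ** residues  # grows exponentially with length
    return conformations / samples_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = exhaustive_search_years(residues=100, states_per_residue=3,
                                    samples_per_second=1e13)
    ratio = years / AGE_OF_UNIVERSE_YEARS
    print(f"{years:.1e} years, about {ratio:.0e} times the age of the universe")
```

Under those assumptions the search takes on the order of 10^27 years, roughly 10^17 times the age of the universe. Real proteins fold far faster than that, so they cannot be searching at random, and that gap is what made predicting the final shape such a hard problem.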
But DeepMind’s AlphaFold 2 has now essentially done what Anfinsen suggested. AlphaFold 2 is “on par” with X-ray crystallography across more than two-thirds of the proteins in the CASP competition, Moult said. Now the hope is that researchers will be able to use AlphaFold 2, or at least the same method, to go directly from a protein’s DNA sequence, which has become relatively easy and inexpensive to obtain, to knowing its 3D shape, without having to use X-ray crystallography or other physical experiments at all.
Andrei Lupas, director of the department of protein evolution at the Max Planck Institute for Developmental Biology in Tübingen, Germany, who served as one of the assessors for this year’s CASP competition, called DeepMind’s results “astonishing.”
As part of CASP's efforts to verify the capabilities of DeepMind's system, Lupas used AlphaFold 2's predictions to see whether they could help him solve the final portion of a protein structure that he had been unable to complete with X-ray crystallography for more than a decade. With those predictions, Lupas said, he was able to determine the shape of the final protein segment in just half an hour.
AlphaFold 2 has also already been used to accurately predict the structure of ORF3a, a protein found in SARS-CoV-2, the virus that causes COVID-19. Scientists might be able to use the protein as a target for future treatments.
Lupas said he thought the A.I. software would "change the game entirely" for those who work on proteins. Currently, DNA sequences are known for about 200 million proteins, and tens of millions more are being discovered every year. But 3D structures have been mapped for fewer than 200,000 of them.
AlphaFold 2 was trained only to predict the structure of single proteins. But in nature, proteins often form complex arrangements with other proteins. Jumper said the next step was to develop an A.I. system that could predict the complicated dynamics between proteins, such as how two proteins will bind to one another or how proteins in close proximity alter one another's shapes.
DeepMind had entered and won the CASP competition two years ago. But at the time, using a differently configured A.I. system called AlphaFold, it achieved an average "global distance test total score" (GDT) of only 58 on the hardest class of proteins. GDT is a measure roughly equivalent to the percentage of each protein's structure that a prediction maps accurately.
Although this was about six points better than the next best team, the result was not competitive with experimental methods like X-ray crystallography. This year, even on these hardest proteins, DeepMind achieved a median GDT of 87, close to crystallography's accuracy and about 26 points better than its nearest competitor.
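The commonly used GDT_TS definition averages the percentage of a protein's alpha-carbon atoms that land within 1, 2, 4 and 8 angstroms of their experimentally determined positions. The Python sketch below is a simplified, hypothetical version under one major assumption: the predicted and experimental structures are already superimposed, whereas the official assessment software searches over many superpositions to maximize the count at each cutoff. The toy coordinates are invented for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in angstroms
CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)  # the four GDT_TS distance thresholds


def gdt_ts(model: Sequence[Point], reference: Sequence[Point]) -> float:
    """Simplified GDT_TS (0-100) for alpha-carbon coordinates that are ALREADY
    superimposed. Assumption: no superposition search is performed here."""
    if not model or len(model) != len(reference):
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(m, r) for m, r in zip(model, reference)]
    # For each cutoff, count residues whose predicted position is close enough.
    counts = [sum(d <= cutoff for d in distances) for cutoff in CUTOFFS_ANGSTROM]
    # Average the four per-cutoff percentages.
    return 100.0 * sum(counts) / (len(CUTOFFS_ANGSTROM) * len(distances))


if __name__ == "__main__":
    # Hypothetical toy data: five residues spaced 3.8 angstroms apart, each
    # predicted with a different error along the y axis.
    reference = [(3.8 * i, 0.0, 0.0) for i in range(5)]
    errors = [0.5, 1.5, 3.0, 6.0, 10.0]
    model = [(x, y + e, z) for (x, y, z), e in zip(reference, errors)]
    print(gdt_ts(model, reference))  # 50.0
```

The toy model scores 50: one residue falls within 1 angstrom, two within 2, three within 4 and four within 8, giving 20, 40, 60 and 80 percent, which average to 50. Averaging over several cutoffs lets a single number reward both near-atomic precision and getting the overall fold right.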