DeepMind’s new protein-folding A.I. is already helping in the fight against COVID-19
Artificial intelligence has just solved one of biology’s most vexing challenges: how to determine the three-dimensional shape of a protein. DeepMind, the London-based A.I. company that is owned by Google-parent Alphabet, has created an A.I. system it calls AlphaFold 2 that uses a protein’s amino acid sequence, which is encoded in DNA, to predict its folded shape, frequently to within an atom’s width of accuracy. Previously, this could be done only through time-consuming and expensive experiments. In the future, the breakthrough is likely to speed the development of new medicines for everything from malaria to cancer. But AlphaFold 2 is already having an impact on the fight against today’s most pressing global health scourge: the COVID-19 pandemic.
Almost as soon as fears arose that a novel coronavirus was circulating in Wuhan, China, with the potential to become a global pandemic, DeepMind, like many companies, wanted to do something to help. The potential to use AlphaFold 2, DeepMind’s still-under-wraps new A.I. system, to gain insights into the virus was clear. But the system, which was being prepared for its debut in a global competition on protein structure prediction that was still months away, had not yet been fully tested.
The genetic sequence of SARS-CoV-2, the virus that causes COVID-19, had been published by Chinese researchers as early as Jan. 11. This allowed researchers to begin to scrutinize proteins associated with the virus as possible targets for vaccines and treatments. Most of these efforts focused on the coronavirus’s distinctive spike protein, an obvious candidate for medicines since its structure and function were similar to those of other coronaviruses and well understood.
DeepMind’s protein-folding team quickly realized, though, that there were a number of proteins associated with the new virus whose structures were unknown to researchers. If DeepMind could turn AlphaFold 2 loose on these, it might be able to play a small but important part in helping to combat the virus.
John Jumper, the senior research scientist who leads DeepMind’s protein-folding team, describes the process of using AlphaFold 2 on the SARS-CoV-2 proteins as “terrifying.” “We had a system that we had tested internally, and we had numbers internally that said, ‘We think this is good, and we expect it is better than the field,’” he says. “But there was always the possibility that we could be wrong.” Especially worrisome, Jumper says, was that biology researchers at a number of academic labs had also tried to use algorithms to get structures for some of the same SARS-CoV-2 proteins, and their results looked very different from AlphaFold 2’s predictions.
To give themselves some assurance that AlphaFold 2 was on the right track, Jumper says, the team launched a crash program to equip the A.I. system with an internal confidence gauge: In other words, it would ask the A.I. software to produce an assessment of how good it thought its own predictions were for any given part of a protein. DeepMind had always planned to build this feature, Jumper says, since the metric would be crucial if AlphaFold 2 was ever going to be useful to medical researchers and biologists. But DeepMind hadn’t actually gotten around to it yet. “We had decided that we’d do confidence at the end, when we knew the final system. But suddenly it was, Okay, we need it now,” he says. “Very luckily the first thing we tried worked.”
Ultimately, AlphaFold 2 spat out high-confidence predictions for six previously unmapped SARS-CoV-2 proteins. Still, DeepMind remained apprehensive enough about the predictions that it decided to seek advice from virus experts and structural biologists at London’s Francis Crick Institute, one of Europe’s largest and most advanced biomedical research labs. The Crick scientists told DeepMind that AlphaFold 2’s structures seemed plausible and encouraged the company to publish them. It did so in early March.
Kathryn Tunyasuvunakool, the DeepMind research scientist responsible for preparing the training and test data for AlphaFold 2, remembers staring at the protein shapes the A.I. had predicted over and over again the night before they were to be released. “I think it was the most stressful thing I’ve ever done in my career,” she says.
One morning three months later, Tunyasuvunakool woke up to find a message from Jumper. Scientists at the University of California at Berkeley had just used an electron microscope to obtain the structure for one of the six proteins AlphaFold 2 had analyzed: ORF3a, a protein thought to help the virus, once it replicates, to break out of its host cell. The protein may also play a role in triggering the inflammatory reaction the human immune system mounts against the infection, a reaction that contributes to the physical damage inflicted on patients in severe COVID-19 cases. The structure had been added overnight to the Protein Data Bank (PDB), a public repository of protein information.
Tunyasuvunakool says she raced to her laptop and opened up the PDB website. She had spent so long staring at AlphaFold 2’s SARS-CoV-2 predictions when DeepMind was trying to decide whether to publish them that she didn’t even have to compare each amino acid position. The second she saw the protein, she recalls, she just knew: AlphaFold 2 had pretty much nailed it.
AlphaFold 2’s predictions have even improved since March. Experimental methods have shown it predicted the structure of another SARS-CoV-2 protein, called ORF8, with a high degree of accuracy as well. And DeepMind has released updated predictions for the four other SARS-CoV-2 proteins whose structures have still not been confirmed through empirical methods.
Researchers are now working to figure out whether ORF3a and the other proteins whose structures AlphaFold 2 predicted could be important for developing treatments or vaccines.
This story has been updated to include the fact that AlphaFold 2 also predicted the structure of the protein ORF8 with a high degree of accuracy.