Protein folding competition seeks next big breakthrough
“In one way or another, the problem is solved,” declared computational biologist John Moult in late 2020. The London-based company DeepMind had just dominated a biennial competition co-founded by Moult that tests teams’ abilities to predict protein structures – one of biology’s greatest challenges – with its revolutionary artificial intelligence (AI) tool AlphaFold.
Two years later, Moult’s competition, Critical Assessment of Structure Prediction (CASP), is still in AlphaFold’s long shadow. Results from this year’s edition (CASP15) – unveiled this weekend at a conference in Antalya, Turkey – show that the most successful approaches to predicting protein structures from their amino acid sequences incorporated AlphaFold, which relies on an AI approach called deep learning. “Everybody uses AlphaFold,” says Yang Zhang, a computational biologist at the University of Michigan in Ann Arbor.
Still, AlphaFold’s progress has opened the floodgates to new challenges in protein structure prediction – some included in this year’s CASP – that may require new approaches and more time to fully tackle. “The low-hanging fruit has been picked,” says Mohammed AlQuraishi, a computational biologist at Columbia University in New York City. “Some of the next issues are going to be more difficult.”
CASP began in 1994, aiming to bring rigor to the field of protein structure prediction, in which advances would accelerate efforts to understand the building blocks of cells and aid drug discovery. During the year of a competition, teams are tasked with using computational tools to predict protein structures that have already been determined experimentally, using methods such as X-ray crystallography and cryo-electron microscopy, but have not yet been published.
Entries are evaluated according to how well predictions for whole proteins, or independently folded subunits called domains, match the experimental structures. Some of AlphaFold’s predictions at CASP14 were more or less indistinguishable from the experimental models—the first time such accuracy had been achieved.
Since its unveiling at CASP14, AlphaFold has become ubiquitous in life science research. DeepMind released the software’s underlying code in 2021 so that anyone could run the program, and an AlphaFold database updated this year contains predicted structures – of varying quality – for nearly every protein from every organism represented in genome databases, totaling more than 200 million proteins.
AlphaFold’s success and newfound ubiquity presented a challenge for Moult, who is at the University of Maryland, Rockville, and his colleagues as they planned this year’s CASP. “People say, ‘Oh, we don’t need CASP anymore, the problem was solved.’ And I think it’s completely the wrong way,” he says.
At CASP15, the most successful teams were those that had adapted and built on AlphaFold in various ways, leading to modest gains in predicting the shape of individual proteins and domains. “Accuracy is already so high that it’s hard to get much better,” says Moult.
To make the competition more relevant in a post-AlphaFold world, Moult and his team added new challenges and tweaked some existing ones. New tests include determining how proteins interact with other molecules such as drugs and predicting the many shapes that some proteins can assume. In the past decade, CASP has included “complexes” of multiple interacting proteins, Moult says, but accurately predicting the structure of such molecules has gained extra weight this year.
“It’s the right thing to do,” Zhang says, because predicting the structures of single proteins or domains—the bread and butter of previous CASPs—has largely been solved by AlphaFold. Determining the shape of protein complexes in particular represents an important new challenge for the field, because there is much room for improvement, says Arne Elofsson, a protein bioinformatician at Stockholm University.
AlphaFold was originally designed to predict the shape of individual proteins. But within days of its public release, other researchers showed that the software could be “hacked” to model how multiple proteins interact. In the months since then, researchers have come up with countless approaches to improve AlphaFold’s ability to deal with complexes. DeepMind even released an update called AlphaFold-Multimer with that goal in mind.
Such efforts seem to have paid off: CASP15 saw a marked increase in the number of accurately predicted complexes compared with previous competitions, mainly owing to methods adapted from AlphaFold. “It’s a new game for us to be close to experimental accuracy with complexes,” says Moult. “We also have some faults.”
For example, teams made astonishingly accurate predictions of a viral molecule of unknown function consisting of two identical intertwined proteins. This kind of shape confused pre-AlphaFold tools, says Ezgi Karaca, a computational structural biologist at the Izmir Biomedicine and Genome Center in Turkey, who assessed the complex predictions. The standard version of AlphaFold failed to model the shape of a giant, 20-chain bacterial enzyme, but some teams predicted the protein’s structure using extra hacks on the network, Karaca adds.
Meanwhile, teams struggled to predict complexes involving immune molecules called antibodies – including several linked to a SARS-CoV-2 protein – and related molecules called nanobodies. But there were glimmers of success in some teams’ predictions, Karaca says, suggesting that hacks to AlphaFold will be useful in predicting the shapes of these medically important molecules.
This year’s CASP was also notable for the absence of DeepMind. The company did not give a reason for not participating, but released a brief statement during CASP15 congratulating the teams that did. (At the same time, it rolled out an update to AlphaFold to help researchers measure their progress against the network.)
Other researchers say the competition is a significant time commitment, which the company may have felt was better spent on other challenges. “It would have been nice for us if they had participated,” says Moult. But he adds that “because the methods are so good, they couldn’t make another big leap.”
Making major improvements to AlphaFold will take time, researchers say, and will probably require innovations in machine learning and protein structure prediction. One area under development is the use of “language models”, such as those used in predictive text tools, for the prediction of protein structures. But those methods – including one developed by social networking giant Meta – didn’t perform nearly as well at CASP15 as tools based on AlphaFold.
However, such tools can be useful for predicting how mutations change a protein’s structure, one of several key challenges in protein structure prediction that have emerged as a result of AlphaFold’s success. Thanks to that success, the field is no longer focused on a single goal, says AlQuraishi. “There’s a whole range of these issues.”