Applying and improving AlphaFold at CASP14

Jumper, John; Evans, Rhett; Pritzel, Alexander; Green, Tim; Figurnov, Michael; Ronneberger, Olaf; Tunyasuvunakool, Kathryn; Bates, Russ; Žídek, Augustin; Potapenko, Anna; Bridgland, Alex; Meyer, Clemens; Kohl, Simon A. A.; Ballard, Andrew J.; Cowie, Andrew; Romera-Paredes, Bernardino; Nikolov, Stanislav; Jain, Rishub; Adler, Jonas; Back, Trevor; Petersen, Stig; Reiman, David; Clancy, Ellen; Zieliński, Michał; Steinegger, Martin; Pacholska, Michalina; Berghammer, Tamas; Silver, David; Vinyals, Oriol; Senior, Andrew W.; Kavukcuoglu, Koray; Kohli, Pushmeet; Hassabis, Demis

doi:10.1002/prot.26257

Cited by 289 publications (203 citation statements)

References 24 publications (35 reference statements)

Citation statements: 201 mentioning

“…We hypothesized that the predicted IDRs with high pLDDT scores might manifest for one or a combination of reasons: (1) global amino-acid sequence differences in comparison to the predicted IDRs with low pLDDT scores, (2) relatively high positional sequence conservation (i.e., “high quality” multiple sequence alignments (MSA)), and (3) the enrichment of high-pLDDT IDR sequences in the PDB. The first possibility would reflect a differential “folding propensity” that is inherently encoded in the amino-acid sequences of high vs. low pLDDT-scoring IDRs, whereas the latter two possibilities would influence the AlphaFold2 prediction confidence due to the depth of the MSAs (2) or sequence similarity to the structures from the PDB used in training (3) (Jumper et al 2021a,b). Given the relatively poor coverage of IDRs in the PDB (Quaglia et al 2021) and the poor positional alignability for most IDRs (Colak et al 2013; Nguyen Ba et al 2012; Zarin et al 2019, 2021), it is plausible that some combination of all three of the aforementioned possibilities could contribute to high pLDDT scoring IDRs.…”

Section: Results (mentioning)

confidence: 99%
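The hypotheses above turn on "positional sequence conservation" and MSA quality. As a minimal sketch, assuming conservation is scored as the Shannon entropy of each alignment column (a generic measure, not necessarily the one used in the citing study), per-position conservation can be computed as follows. Lower entropy means a more conserved position; the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of one MSA column, ignoring gap characters."""
    residues = [c for c in column if c != gap]
    if not residues:
        return float("nan")  # all-gap column: conservation is undefined
    n = len(residues)
    counts = Counter(residues)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def positional_entropy(msa: list[str]) -> list[float]:
    """Per-position entropy for a list of equal-length aligned sequences."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(seq[i] for seq in msa)) for i in range(length)]
```

For example, `positional_entropy(["MKLE", "MKIE", "MRLD"])` gives 0.0 for the invariant first column and about 0.92 bits for each of the other three. Ignoring gaps is one of several reasonable conventions; MSA depth is simply the number of aligned sequences.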

“…The biennial Critical Assessment of Structure Prediction (CASP) competition (Moult et al 1995) has stimulated many developments in the field of protein structure prediction, including the successful implementation of co-evolutionary restraints derived from multiple sequence alignments (MSAs) and machine learning protocols in CASP12 (Moult et al 2018; Schaarschmidt et al 2018). CASP14 brought a revolutionary advancement: the AlphaFold2 team at DeepMind produced more models with atomic-level accuracy than ever before in the history of CASP (AlQuraishi 2021; Jumper et al 2021a,b). The second-best scoring prediction software in CASP14 led to the RoseTTAFold structure prediction platform, which was released in open-source format and contained a webserver for ease of access (Baek et al 2021).…”

Section: Introduction (mentioning)

confidence: 99%

“…Here, we show that thousands of IDRs are predicted by AlphaFold2 to be folded with high (70 ≤ x < 90) or very high (≥ 90) predicted local difference distance test (pLDDT) scores (Mariani et al 2013), which measure the confidence in the predicted structures (Jumper et al 2021b). We find that, compared to IDRs with low pLDDT scores, the amino-acid sequences of IDRs with high pLDDT scores are enriched in charged and hydrophobic residues, show more positional conservation, and have more alignment matches to sequences in the PDB.…”

Section: Introduction (mentioning)

confidence: 99%
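The quoted pLDDT bands can be written down directly: "high" is 70 ≤ pLDDT < 90 and "very high" is pLDDT ≥ 90. The sketch below is illustrative only. The rule for turning per-residue scores into a per-IDR call (here, the mean over the region compared against a threshold of 70) is a hypothetical choice, not the procedure stated in the citing study, and the helper names are invented.

```python
from statistics import mean


def plddt_band(score: float) -> str:
    """Classify one pLDDT value (0-100) using the cut-offs quoted above."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("pLDDT is expected to lie in [0, 100]")
    if score >= 90.0:
        return "very high"
    if score >= 70.0:
        return "high"  # 70 <= x < 90
    return "below high"  # finer bands below 70 are not defined in the quote


def is_confident_region(plddt: list[float], threshold: float = 70.0) -> bool:
    """Hypothetical rule: a region is confident if its mean pLDDT reaches the threshold."""
    if not plddt:
        raise ValueError("region has no residues")
    return mean(plddt) >= threshold


def fraction_confident(regions: list[list[float]], threshold: float = 70.0) -> float:
    """Share of regions (e.g. IDRs) that pass is_confident_region."""
    if not regions:
        raise ValueError("no regions supplied")
    return sum(is_confident_region(r, threshold) for r in regions) / len(regions)
```

With three regions whose mean scores are 85, 35 and 70, `fraction_confident` returns 2/3, because a mean of exactly 70 falls in the "high" band.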


Systematic identification of conditionally folded intrinsically disordered regions by AlphaFold2

Alderson, Pritišanac, Moses, et al. 2022 (preprint)


Deep learning-based approaches to protein structure prediction, such as AlphaFold2 and RoseTTAFold, can now define many protein structures with atomic-level accuracy. The AlphaFold Protein Structure Database (AFDB) contains a predicted structure for nearly every protein in the human proteome, including proteins that have intrinsically disordered regions (IDRs), which do not adopt a stable structure and rapidly interconvert between conformations. Although it is generally assumed that IDRs have very low AlphaFold2 confidence scores that reflect low-confidence structural predictions, we show here that AlphaFold2 assigns confident structures to nearly 15% of human IDRs. The amino-acid sequences of IDRs with high-confidence structures do not show significant similarity to the Protein Data Bank; instead, these IDR sequences exhibit a higher degree of positional amino-acid sequence conservation and are more enriched in charged and hydrophobic residues than IDRs with low-confidence structures. We compared the AlphaFold2 predictions to experimental NMR data for a subset of IDRs known to fold under specific conditions, finding that AlphaFold2 tends to capture the folded state structure. We note, however, that these AlphaFold2 predictions cannot detect functionally relevant structural plasticity within IDRs and cannot offer an ensemble representation of IDRs. Nevertheless, AlphaFold2 assigns high-confidence scores to about 60% of a set of 350 IDRs that have been reported to conditionally fold, suggesting that AlphaFold2 has learned to identify conditionally folded IDRs, which is unexpected, since IDRs were minimally represented in the training data. Leveraging this ability to discover IDRs that conditionally fold, we find that up to 80% of IDRs in archaea and bacteria are predicted to conditionally fold, compared with less than 20% of eukaryotic IDRs. Our results suggest that a large majority of IDRs in the proteomes of human and other eukaryotes would be expected to function in the absence of conditional folding.
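The abstract reports that confidently predicted IDRs are "more enriched in charged and hydrophobic residues." A minimal sketch of that kind of composition comparison follows. The residue classes are assumptions (His is left out of the charged set, and hydrophobicity definitions differ between studies), and the function names are hypothetical.

```python
from statistics import mean

# Assumed residue classes; other studies draw these lines differently.
CHARGED = frozenset("DEKR")
HYDROPHOBIC = frozenset("AILMFVW")


def composition(seq: str) -> dict[str, float]:
    """Fraction of charged and hydrophobic residues in one sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n = len(seq)
    return {
        "charged": sum(aa in CHARGED for aa in seq) / n,
        "hydrophobic": sum(aa in HYDROPHOBIC for aa in seq) / n,
    }


def mean_difference(group_a: list[str], group_b: list[str], feature: str) -> float:
    """Mean fraction of `feature` in group_a minus that in group_b (positive = enriched in a)."""
    return (mean(composition(s)[feature] for s in group_a)
            - mean(composition(s)[feature] for s in group_b))
```

For instance, `composition("DEKLAGSP")` gives a charged fraction of 0.375 and a hydrophobic fraction of 0.25 under these assumed classes.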



“…Moreover, only one of the five multimer DL models generated such a model by using unpaired MSAs. It appears that using unpaired MSAs may be the key for generating this good model, because runs with paired MSAs by following the AF-Multimer workflow return unphysical models with severe clashes, a phenomenon akin to “chain collapse” observed previously[3]. Paired MSAs may have contributed to the issue.…”

Section: Supplementary Information (mentioning)

confidence: 86%
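The statement above contrasts paired and unpaired MSAs for complex prediction. As a simplified illustration (not the AF-Multimer or AF2Complex code), a paired MSA concatenates homologs of the two chains that come from the same species into one row, whereas an unpaired MSA stacks each chain's homologs in a block-diagonal layout, padding the other chain with gaps. The sketch assumes homologs are already aligned to their query and, for pairing, at most one hit per species keyed by a species label; all names are hypothetical.

```python
def unpaired_msa(query_a: str, query_b: str,
                 homologs_a: list[str], homologs_b: list[str],
                 gap: str = "-") -> list[str]:
    """Block-diagonal MSA: each homolog covers its own chain, gaps cover the other."""
    aligned = all(len(h) == len(query_a) for h in homologs_a) and all(
        len(h) == len(query_b) for h in homologs_b)
    if not aligned:
        raise ValueError("homologs must already be aligned to their query")
    rows = [query_a + query_b]  # first row: the two target chains side by side
    rows += [h + gap * len(query_b) for h in homologs_a]
    rows += [gap * len(query_a) + h for h in homologs_b]
    return rows


def paired_msa(query_a: str, query_b: str,
               hits_a: dict[str, str], hits_b: dict[str, str]) -> list[str]:
    """Paired MSA: concatenate the A and B homologs that share a species key."""
    rows = [query_a + query_b]
    for species in sorted(hits_a.keys() & hits_b.keys()):
        rows.append(hits_a[species] + hits_b[species])
    return rows
```

Species that appear for only one chain contribute nothing to the paired MSA, which is one reason paired alignments can be shallow; the unpaired layout keeps every homolog, but no row spans both chains, so it carries no inter-chain covariation.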

Predicting direct physical interactions in multimeric proteins with deep learning

Gao, Parks, et al. 2021 (preprint)


Accurate descriptions of protein-protein interactions are essential for understanding biological systems. Very recently, AlphaFold2 has been shown to be remarkably accurate for predicting the atomic structures of individual proteins. Here, we demonstrate that the same neural network models developed for AlphaFold2 can be adapted to predict the structures of multimeric protein complexes without retraining. In contrast to common approaches that require paired multiple sequence alignments, our method, AF2Complex, works without using such paired alignments. It achieves higher accuracy than complex strategies that combine AlphaFold2 and protein-protein docking. New metrics are then introduced for predicting direct protein-protein interactions between arbitrary protein pairs. The approach is successfully validated on some challenging CASP14 multimeric targets, a small but appropriate benchmark set, and the E. coli proteome. Lastly, using the cytochrome c biogenesis system as an example, we present high-confidence models of three sought-after assemblies formed by eight members of this system.


“…In the 2020 Critical Assessment of Protein Structure Prediction (CASP14), the DeepMind AlphaFold2 (AF2) deep learning method (Jumper et al, 2021a; Jumper et al, 2021b) demonstrated outstanding performance in blind predictions of protein structure, delivering excellent structural matches to experimental models derived from X-ray crystallography, NMR and cryoEM data, over a wide range of target difficulty (Kryshtafovych et al, 2021). These AlphaFold2 model predictions had an unprecedented high accuracy, assessed by backbone atomic coordinate global distance test (GDT_TS) scores.…”

Section: Introduction (mentioning)

confidence: 99%
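The GDT_TS score mentioned above averages the percentage of Cα atoms lying within 1, 2, 4 and 8 Å of their experimental positions: GDT_TS = (P1 + P2 + P4 + P8) / 4. A minimal sketch follows, assuming the per-residue deviations come from a single, already chosen superposition. The standard LGA-based calculation instead searches superpositions to maximize each P_d, so this simplified version can underestimate the score. Names are hypothetical.

```python
from typing import Optional

CUTOFFS_ANGSTROM = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(ca_deviations: list[float], n_target_residues: Optional[int] = None) -> float:
    """GDT_TS (0-100) from per-residue CA deviations under one fixed superposition.

    n_target_residues lets residues missing from the model count as misses;
    by default the number of supplied deviations is used.
    """
    n = len(ca_deviations) if n_target_residues is None else n_target_residues
    if n <= 0:
        raise ValueError("need at least one residue")
    percentages = [100.0 * sum(d <= cutoff for d in ca_deviations) / n
                   for cutoff in CUTOFFS_ANGSTROM]
    return sum(percentages) / len(CUTOFFS_ANGSTROM)
```

For deviations of 0.5, 1.5, 3.0, 6.0 and 10.0 Å, the four percentages are 20, 40, 60 and 80, giving GDT_TS = 50.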

AlphaFold Models of Small Proteins Rival the Accuracy of Solution NMR Structures

Tejero, Huang, Ramelot, et al. 2022 (preprint)


Recent advances in molecular modeling using deep learning have the potential to revolutionize the field of structural biology. In particular, AlphaFold has been observed to provide models of protein structures with accuracy rivaling medium-resolution X-ray crystal structures, and with excellent atomic coordinate matches to experimental protein NMR and cryo-electron microscopy structures. Here we assess the hypothesis that AlphaFold models of small proteins have accuracies (based on comparison against experimental data) similar to experimental solution NMR structures. We selected six representative small proteins with structures determined by both NMR and X-ray crystallography, and modeled each of them using AlphaFold. Using several structure validation tools integrated under the Protein Structure Validation Software suite (PSVS), we then assessed how well these models fit experimental NMR data, including NOESY peak lists (RPF-DP scores), comparisons between predicted rigidity and chemical shift data (ANSURR scores), and 15N-1H residual dipolar coupling data (RDC Q scores). Remarkably, the fits to NMR data for the protein structure models predicted with AlphaFold are generally similar to, or better than, those for the corresponding experimental NMR or X-ray crystal structures. These results document the value of PSVS for model vs. data assessment of protein NMR structures, and the potential for using AlphaFold models for guiding analysis of experimental NMR data and more generally in structural biology.
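The RDC Q score mentioned in the abstract compares observed and back-calculated residual dipolar couplings. A minimal sketch of one common form, Q = rms(D_obs - D_calc) / rms(D_obs), follows. It assumes D_calc has already been obtained (for instance by fitting an alignment tensor to the model), and PSVS may normalize differently. The function name is hypothetical.

```python
import math


def rdc_q_factor(observed: list[float], calculated: list[float]) -> float:
    """Q = rms(D_obs - D_calc) / rms(D_obs); 0 means a perfect fit, lower is better."""
    if not observed or len(observed) != len(calculated):
        raise ValueError("need equal-length, non-empty coupling lists")
    n = len(observed)
    rms_diff = math.sqrt(sum((o - c) ** 2 for o, c in zip(observed, calculated)) / n)
    rms_obs = math.sqrt(sum(o * o for o in observed) / n)
    if rms_obs == 0.0:
        raise ValueError("observed couplings are all zero")
    return rms_diff / rms_obs
```

For observed couplings of 10 and -10 Hz back-calculated as 9 and -9 Hz, the rms difference is 1 Hz and the rms of the observed values is 10 Hz, so Q = 0.1.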

