Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks

Li, Yang; Zhang, Chengxin; Bell, Eric W.; Zheng, Wei; Zhou, Xiaogen; Yu, Dong‐Jun; Zhang, Yang

doi:10.1371/journal.pcbi.1008865

Cited by 67 publications

(62 citation statements)

References 47 publications

(86 reference statements)


“…Although I-TASSER considerably refined the template quality by multiple fragment assembly simulations, the global fold was still incorrect; TM = 0.461 and root-mean-square deviation (RMSD) = 11.9 Å. The six contact programs from C-I-TASSER (TripletRes, Li et al, 2021; ResTriplet, Li et al, 2019b; ResPre, Li et al, 2019a; ResPLM, Li et al, 2019b; Zheng et al, 2019a; and NeBconA and NeBconB, He et al, 2017) generated reasonable contact-map predictions, with a top-L precision of 92.5%, 93.2%, 93.2%, 91.9%, 79.5%, and 85.1%, respectively, which resulted in an overall contact precision of 96.9% for the top L-ranked contacts after combining the maps. With the aid of this combined contact map, C-I-TASSER constructed a significantly improved model with TM = 0.746 and RMSD = 3.23 Å.…”

Section: Results

mentioning

confidence: 99%
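The top-L precision quoted in the statement above can be illustrated with a minimal sketch. It assumes a predicted probability matrix and a boolean native contact map of the same shape; the function name `top_l_precision`, the `min_sep` sequence-separation cutoff, and the `factor` parameter are hypothetical choices for illustration and are not taken from the cited papers.

```python
import numpy as np

def top_l_precision(prob, native, min_sep=6, factor=1.0):
    """Fraction of true contacts among the top (factor * L) ranked residue pairs.

    prob    : (L, L) array of predicted contact probabilities
    native  : (L, L) boolean array of native contacts
    min_sep : minimum sequence separation |i - j| to consider (assumed value)
    factor  : 1.0 for top-L, 0.5 for top-L/2, and so on
    """
    prob = np.asarray(prob, dtype=float)
    native = np.asarray(native, dtype=bool)
    L = prob.shape[0]
    # Upper-triangle pairs (i < j) with j - i >= min_sep
    i, j = np.triu_indices(L, k=min_sep)
    if len(i) == 0:
        raise ValueError("no residue pairs satisfy the separation cutoff")
    # Rank candidate pairs by descending predicted probability
    order = np.argsort(-prob[i, j], kind="stable")
    n_top = min(max(1, int(factor * L)), len(order))
    top = order[:n_top]
    return float(native[i[top], j[top]].mean())
```

With `factor=1.0` the function scores the L highest-ranked pairs, which corresponds to the "top L" figures reported in the statement; how the six maps were combined is not described there and is not modeled here.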

“…It is noted that since this work was completed, the field has witnessed considerable progress in deep-learning-based interresidue distance and torsion angle predictions (Xu, 2019; Yang et al, 2020), as well as the most recent end-to-end model training (Jumper et al, 2020), which demonstrated significant usefulness for improving 3D structure modeling accuracy. Nevertheless, given the dominant role of contact predictions (Shrestha et al, 2019) and the fact that the most reliable distance predictions are for short distances (Li et al, 2021), we believe it is still of significant importance to examine separately the impact of contact maps on ab initio structure prediction, especially in conjunction with the most advanced structure folding simulations that can help explore the maximum potential of contact-map predictions. Our study showed that optimized coupling of deep-learning-based spatial information with efficient structure assembly simulations is the key to improving the capability of distantly homologous protein folding.…”

Section: Introduction

mentioning

confidence: 99%


Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations

Zheng, Zhang, et al. 2021. Cell Reports Methods

Self Cite


SUMMARY: Structure prediction for proteins lacking homologous templates in the Protein Data Bank (PDB) remains a significant unsolved problem. We developed a protocol, C-I-TASSER, to integrate interresidue contact maps from deep neural-network learning with the cutting-edge I-TASSER fragment assembly simulations. Large-scale benchmark tests showed that C-I-TASSER can fold more than twice as many non-homologous proteins as I-TASSER, which does not use contacts. When applied to a folding experiment on 8,266 unsolved Pfam families, C-I-TASSER successfully folded 4,162 domain families, including 504 folds that are not found in the PDB. Furthermore, it created correct folds for 85% of proteins in the SARS-CoV-2 genome, despite the high mutation rate of the virus and sparse sequence profiles. The results demonstrated the critical importance of coupling whole-genome and metagenome-based evolutionary information with optimal structure assembly simulations for solving the problem of non-homologous protein structure prediction.


“…This method predicts CMs at distance thresholds of 6, 7.5, 8, 8.5, and 10 Å, and then refines them to retain only the 8 Å CM with an improved prediction rate [77]. TripletRes starts with the collection of MSAs through whole-genome and metagenome sequence databases and then constructs three complementary co-evolutionary feature matrices (covariance matrix, precision matrix, and pseudolikelihood maximization) to create contact-map models through deep residual convolutional neural network training [78]. DeepContact is also a CNN-based approach that discovers co-evolutionary motifs and leverages these patterns to enable accurate inference of contact probabilities [79].…”

Section: Prediction of 1D and 2D Protein Structural Annotations

mentioning

confidence: 99%
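Two of the three co-evolutionary feature matrices named in the statement above (covariance and precision) can be sketched in a few lines of NumPy. This is an illustrative approximation only: it assumes an unweighted one-hot encoding of the MSA, a simple ridge term `ridge` to make the covariance invertible, and a per-pair Frobenius norm for readability. The helper names are hypothetical, and the pseudolikelihood-maximization matrix is omitted because it requires an iterative fit.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus gap (assumed encoding)

def one_hot_msa(seqs):
    """Encode aligned sequences as an (n_seqs, L * q) one-hot matrix."""
    idx = {a: k for k, a in enumerate(ALPHABET)}
    n, L, q = len(seqs), len(seqs[0]), len(ALPHABET)
    x = np.zeros((n, L * q))
    for s, seq in enumerate(seqs):
        for i, aa in enumerate(seq):
            # Unknown symbols are mapped to the gap state
            x[s, i * q + idx.get(aa, q - 1)] = 1.0
    return x

def covariance_and_precision(seqs, ridge=0.1):
    """Return the (L*q, L*q) covariance matrix and its regularized inverse."""
    x = one_hot_msa(seqs)
    cov = np.cov(x, rowvar=False, bias=True)
    # The ridge term keeps the matrix invertible; its value is an assumption
    prec = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    return cov, prec

def pair_scores(mat, L, q=len(ALPHABET)):
    """Collapse each q x q residue-pair block to its Frobenius norm, giving (L, L)."""
    blocks = mat.reshape(L, q, L, q)
    return np.sqrt((blocks ** 2).sum(axis=(1, 3)))
```

For example, `pair_scores(cov, L)` gives one coupling strength per residue pair; a network such as the one described would more plausibly consume the full per-pair blocks as input channels rather than these collapsed norms.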

“…C-I-TASSER (contact-guided iterative threading assembly refinement) is an extended method from the original I-TASSER for high-accuracy protein structure and function predictions [102]. It generates inter-residue CMs using multiple deep neural-network predictors (such as NeBcon, ResPRE, and TripletRes) and identifies reliable structural templates from the PDB database by a multiple-threading approach (LOMETS) [78, 103, 104, 105]. Then, the full-length atomic models are assembled by contact-map-guided replica-exchange Monte Carlo simulations.…”

Section: Prediction of Protein 3D Structures

mentioning

confidence: 99%
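The replica-exchange step mentioned in the statement above relies on a swap test between neighboring temperatures. The sketch below shows the standard Metropolis exchange criterion in reduced units (Boltzmann constant set to 1); the function name and arguments are hypothetical, and nothing here reflects the actual C-I-TASSER force field or temperature schedule.

```python
import math
import random

def swap_accepted(energy_i, energy_j, temp_i, temp_j, rng=random):
    """Decide whether replicas i and j exchange their configurations.

    Accepts with probability min(1, exp((1/T_i - 1/T_j) * (E_i - E_j))),
    the usual replica-exchange criterion, with k_B = 1 (assumed units).
    """
    delta = (1.0 / temp_i - 1.0 / temp_j) * (energy_i - energy_j)
    # A non-negative exponent means the swap is always accepted
    return delta >= 0 or rng.random() < math.exp(delta)
```

A swap that hands the lower-energy configuration to the colder replica is always accepted, while the reverse is accepted only occasionally, which lets low-temperature replicas escape local minima.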

Recent Applications of Deep Learning Methods on Evolution- and Contact-Based Protein Structure Prediction

Suh, Lee, Choi, et al. 2021. IJMS


Recent advances in deep learning methods have influenced many aspects of scientific research, including the study of protein systems. The prediction of proteins’ 3D structural components is now heavily dependent on machine learning techniques that interpret how protein sequences and their homology govern inter-residue contacts and structural organization. In particular, methods employing deep neural networks have had a significant impact on the recent CASP13 and CASP14 competitions. Here, we explore the recent applications of deep learning methods in the protein structure prediction area. We also look at the potential opportunities for deep learning methods to identify unknown protein structures and functions and to help guide drug–target interactions. Although significant problems still need to be addressed, we expect these techniques to play crucial roles in protein structural bioinformatics as well as in drug discovery in the near future.


“…Two types of strategies have been widely considered for protein 3D structure prediction (2): the first is template-based modeling (TBM), which constructs structural models using solved structures as templates and whose success requires the availability of homologous templates in the Protein Data Bank (PDB); the second is the template-free modeling (FM) approach (or ab initio modeling), which is dedicated to modeling the "Hard" proteins that do not have close homologous structures in the PDB. Due to the lack of reliable physics-based force fields, the most efficient FM methods, including Rosetta (3), QUARK (4), and I-TASSER (5), rely on a priori spatial restraints derived, usually through deep neural-network learning (6,7), from the co-evolution information based on multiple sequence alignments (MSA) of homologous proteins (8). Hence, to model the 3D structure of the "Hard" proteins, a sufficient number of homologous sequences is critical to ensure the accuracy of deep machine-learning models and the quality of subsequent 3D structure constructions (9).…”

mentioning

confidence: 99%

Decoding microbiome and protein family linkage to improve protein structure prediction

Yang, Zheng, Ning, et al. 2021. Preprint

Self Cite


Information extracted from microbiome sequences through deep-learning techniques can significantly improve protein structure and function modeling. However, model training and metagenome searches have been largely blind, with low efficiency. Built on 4.25 billion microbiome sequences from four major biomes (Gut, Lake, Soil and Fermentor), we proposed a MetaSource model to decode the inherent link of microbial niches with protein homologous families. Large-scale protein family folding experiments showed that a targeted approach using predicted biomes significantly outperforms combined metagenome datasets in both speed of MSA collection and accuracy of deep-learning structure assembly. These results revealed the important link of biomes with protein families and provided a useful bluebook to guide future microbiome sequence database and modeling development for protein structure and function prediction.
