Accurate prediction of protein contact maps by coupling residual two-dimensional bidirectional long short-term memory with convolutional neural networks

Hanson, Jack; Paliwal, Kuldip K.; Litfin, Thomas; Yang, Yuedong; Zhou, Yaoqi

doi:10.1093/bioinformatics/bty481

Cited by 178 publications

(235 citation statements)

References 41 publications

Citation statement types: Supporting, Mentioning (234), Contrasting

“…Since training deep learning network requires a large number of training samples, we employed the dataset curated in 2017, as used in our previous study (Hanson, et al, 2018). The dataset is consisted of 12450 nonredundant chains with resolution < 2.5Å, R-factor < 1.0, sequence length ≥ 30, and sequence identity ≤ 25% from the cullpdb website.…”

Section: Datasets (mentioning)

confidence: 99%

To Improve Protein Sequence Profile Prediction through Image Captioning on Pairwise Residue Distance Map

Chen, Sun, et al. 2019

Preprint

Self Cite

Motivation: Protein design is the well-known inverse protein-folding problem. Current protein design has low success rate to design single sequence, leading to studies on predicting sequence profile. Protein sequence profile can be computationally predicted by energy-based method or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2 has achieved a sequence recovery rate of 34%. However, SPIN2 employed only one dimensional (1D) structural properties that are not sufficient to represent 3D structures. To overcome the sparse data of protein structures in 3D space, we converted 3D structures to 2D maps of pairwise residue distances. By integrating both 1D and 2D structural features, we developed a new method (SPROF) to predict protein sequence profile based on an image captioning learning frame. To our best knowledge, this is the first method to employ 2D distance map for predicting protein properties.

Results: Finally, we obtained the best performed model, SPROF that combined recurrent neural network, convolution neural network and attention mechanism. The method achieved 39.8% in sequence recovery of residues on the independent test set, representing a 5.2% improvement over SPIN2. We also found the sequence recovery increased with the number of their neighbored residues in 3D structural space, indicating that our method can effectively learn long range information from the 2D distance map. Thus, such network architecture using 2D distance map is expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction, and protein interaction prediction. In addition, the generated sequence profile will be helpful for improving existing protein design and fold recognition techniques.

Availability: https://github.com/sysu-yanglab/SPROF

Contact:
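The conversion of a 3D structure into a 2D map of pairwise residue distances, as described in the abstract above, can be illustrated with a minimal sketch. This is not the SPROF implementation: the function name `distance_map` is hypothetical, and the choice of one representative atom per residue (for example Cα) is an assumption, since the abstract does not say which atoms are used.

```python
import math

def distance_map(coords):
    """Return a symmetric L x L matrix of pairwise Euclidean distances.

    `coords` holds one (x, y, z) tuple per residue. Using a single
    representative atom per residue (e.g. C-alpha) is an assumption
    made for this sketch, not a detail taken from the abstract.
    """
    n = len(coords)
    dmap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            dmap[i][j] = d  # fill both triangles so the map is symmetric
            dmap[j][i] = d
    return dmap

# Toy coordinates for three residues (made-up values, in angstroms).
toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
toy_map = distance_map(toy_coords)
```

The resulting matrix can be treated as a single-channel image, which is what makes image-style network architectures applicable to it.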

“…In CASP12 and previous CAMEO tests we have demonstrated that deep ResNet can greatly improve contact prediction 6,[8][9][10] and that even without time-consuming conformation sampling, contacts predicted by deep ResNet can result in correct folding of (even membrane) proteins without detectable homology in PDB 11 . Afterwards, the power of deep convolutional neural network has been further validated by other research groups who have reimplemented similar deep networks for contact prediction [12][13][14] . Although contact prediction itself is an important problem that needs further research, we have switched our focus from contact to distance prediction and accordingly distance-based protein structure modeling.…”

Section: Introduction (mentioning)

confidence: 99%

Analysis of distance-based protein structure prediction by deep learning in CASP13

Wang 2019

Preprint

This paper reports the CASP13 results of distance-based contact prediction, threading and folding methods implemented in three RaptorX servers, which are built upon the powerful deep convolutional residual neural network (ResNet) method initiated by us for contact prediction in CASP12. On the 32 CASP13 FM (free-modeling) targets with a median MSA (multiple sequence alignment) depth of 36, RaptorX yielded the best contact prediction among 46 groups and almost the best 3D structure modeling among all server groups without time-consuming conformation sampling. In particular, RaptorX achieved top L/5, L/2 and L long-range contact precision of 70%, 58% and 45%, respectively, and predicted correct folds (TMscore>0.5) for 18 of 32 targets. Although on average underperforming AlphaFold in 3D modeling, RaptorX predicted correct folds for all FM targets with >300 residues (T0950-D1, T0969-D1 and T1000-D2) and generated the best 3D models for T0950-D1 and T0969-D1 among all groups. This CASP13 test confirms our previous findings: (1) predicted distance is more useful than contacts for both template-based and free modeling; and (2) structure modeling may be improved by integrating alignment and coevolutionary information via deep learning. This paper will discuss progress we have made since CASP12, the strength and weakness of our methods, and why deep learning performed much better in CASP13.
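The "top L/5, L/2 and L long-range contact precision" figures quoted above follow a common evaluation scheme, which the following minimal sketch illustrates. It is not the CASP or RaptorX evaluation code: the function name `top_lk_precision` and the input layout are hypothetical, and the default sequence-separation cutoff of 24 residues for "long-range" is the usual convention, assumed here because the abstract does not state it.

```python
def top_lk_precision(pred, true_contacts, length, k, min_sep=24):
    """Precision of the top L/k predicted long-range contacts.

    pred          : dict mapping (i, j) with i < j to a predicted probability
    true_contacts : set of (i, j) pairs that are real contacts
    length        : sequence length L
    k             : divisor, e.g. 5, 2 or 1 for top L/5, L/2, L
    min_sep       : minimum |i - j| for a pair to count as long-range
                    (24 is the usual convention; an assumption here)
    """
    long_range = [pair for pair in pred if pair[1] - pair[0] >= min_sep]
    # Rank candidate pairs by predicted probability, highest first.
    long_range.sort(key=lambda pair: pred[pair], reverse=True)
    top = long_range[:max(1, length // k)]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in true_contacts)
    # Dividing by the number of pairs actually selected is one convention;
    # some evaluations divide by L/k even when fewer pairs are available.
    return hits / len(top)

# Toy example with made-up values and a small cutoff so it fits in a few lines.
toy_pred = {(0, 5): 0.9, (1, 6): 0.8, (2, 7): 0.7, (0, 1): 0.99}
toy_true = {(0, 5), (2, 7)}
p_l5 = top_lk_precision(toy_pred, toy_true, length=10, k=5, min_sep=3)
p_l = top_lk_precision(toy_pred, toy_true, length=10, k=1, min_sep=3)
```

In the toy example the short-range pair (0, 1) is ignored despite its high score, because only pairs at or beyond the separation cutoff are ranked.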

“…[86][87][88] Recurrent architectures have also been used in contact prediction. 48,89 More recently, a recurrent architecture has been used to model tertiary structure. 90 This latter method has the attractive property of being end-to-end differentiable, meaning that all parts of the process from taking in the input features to predicting 3D coordinates (via predicted torsion angles) can be simultaneously optimized during the NN training process.…”

Section: Discussion (mentioning)

confidence: 99%

Recent developments in deep learning applied to protein structure prediction

2019

Although many structural bioinformatics tools have been using neural network models for a long time, deep neural network (DNN) models have attracted considerable interest in recent years. Methods employing DNNs have had a significant impact in recent CASP experiments, notably in CASP12 and especially CASP13. In this article, we offer a brief introduction to some of the key principles and properties of DNN models and discuss why they are naturally suited to certain problems in structural bioinformatics. We also briefly discuss methodological improvements that have enabled these successes. Using the contact prediction task as an example, we also speculate why DNN models are able to produce reasonably accurate predictions even in the absence of many homologues for a given target sequence, a result that can at first glance appear surprising given the lack of input information. We end on some thoughts about how and why these types of models can be so effective, as well as a discussion on potential pitfalls.

Abstract: Supplementary data are available at Bioinformatics online.
