“…These methods effectively reduce false positive predictions by globally considering all inter-residue correlations. More recently, methods like MetaPSICOV [19], SAE-DNN [20], DeepConPred [21], NeBcon [22] and RaptorX-Contact [23] integrated sophisticated machine-learning techniques to further enhance the prediction accuracy. In the latest CASP12 competition, RaptorX-Contact achieved the best performance in the category of template-free modeling targets.…”
Despite rapid progress in protein residue contact prediction, predicted residue contact maps frequently contain many errors. However, information on residue pairing in β strands can be extracted from a noisy contact map, because β-β interactions produce characteristic contact patterns. This information may benefit the tertiary structure prediction of mainly β proteins. In this work, we introduce a novel ridge-detection-based β-β contact predictor, RDb2C, to identify residue pairing in β strands from any predicted residue contact map. The algorithm adopts ridge detection, a well-developed technique in computer image processing, to capture consecutive residue contacts, and then uses a novel multi-stage random forest framework to integrate the ridge information with additional features for prediction. Starting from the contact map predicted by CCMpred, RDb2C outperforms all state-of-the-art methods on two conventional test sets of β proteins (BetaSheet916 and BetaSheet1452), achieving F1-scores of ~62% and ~76% at the residue level and strand level, respectively. Taking the prediction of the more advanced RaptorX-Contact as input, RDb2C achieves higher performance, with F1-scores reaching ~76% and ~86% at the residue level and strand level, respectively. According to our tests on 61 mainly β proteins, improved β-β contact prediction can further improve structural prediction.
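To make the reported metric concrete, here is a minimal sketch of how an F1-score can be computed at the residue level (over residue pairs) and at the strand level (over strand pairs). The function name and the toy pairs are hypothetical and are not taken from the paper or its code; the sketch only illustrates the standard definition F1 = 2PR / (P + R).

```python
def f1_score(predicted, actual):
    """F1 over two collections of pairs (standard precision/recall definition)."""
    predicted, actual = set(predicted), set(actual)
    true_pos = len(predicted & actual)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(actual)
    return 2 * precision * recall / (precision + recall)

# Hypothetical residue-level pairs (residue i, residue j).
true_residue_pairs = {(3, 20), (4, 19), (5, 18), (6, 17)}
pred_residue_pairs = {(3, 20), (4, 19), (5, 18), (7, 16), (8, 15)}

# Hypothetical strand-level pairs (strand a, strand b).
true_strand_pairs = {(1, 2), (2, 3)}
pred_strand_pairs = {(1, 2), (2, 3), (1, 4)}

residue_f1 = f1_score(pred_residue_pairs, true_residue_pairs)
strand_f1 = f1_score(pred_strand_pairs, true_strand_pairs)
print(round(residue_f1, 3), round(strand_f1, 3))
```

In this toy case the strand-level score is higher than the residue-level score, since a strand pair can count as correct even when some of its residue pairs are misplaced.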
Availability: All source data and code are available at http://166.111.152.91/Downloads.html or on GitHub at https://github.com/wzmao/RDb2C.
Author summary: Due to their topological complexity, mainly β proteins are challenging targets in protein structure prediction. Knowledge of the pairing between β strands, especially the residue pairing pattern, can greatly facilitate the tertiary structure prediction of mainly β proteins. In this work, we developed a novel algorithm to identify the residue pairing in β strands from a predicted residue contact map. This method adopts the ridge detection technique to capture the characteristic pattern of β-β interactions from the map and then uses a multi-stage random forest framework to predict β-β contacts at the residue level. According to our tests, our method can effectively improve the prediction of β-β contacts even from a highly noisy contact map. Moreover, the refined β-β contact information can effectively improve the structural modeling of mainly β proteins.
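The ridge-detection idea can be illustrated with a small sketch. It is not the RDb2C implementation: it assumes a common textbook formulation in which a ridge point is a cell where the most negative eigenvalue of the local Hessian is large in magnitude, and it estimates the Hessian with plain finite differences on an unsmoothed toy map (real pipelines usually smooth first). The function name, the 14x14 toy map, and the threshold of 1.0 are all hypothetical.

```python
import math

def ridge_strength(cmap):
    """Ridge strength per cell of a square contact map (list of lists).

    Strength is the magnitude of the most negative Hessian eigenvalue,
    estimated with central finite differences; border cells stay at 0.
    """
    n = len(cmap)
    strength = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            fxx = cmap[i + 1][j] - 2 * cmap[i][j] + cmap[i - 1][j]
            fyy = cmap[i][j + 1] - 2 * cmap[i][j] + cmap[i][j - 1]
            fxy = (cmap[i + 1][j + 1] - cmap[i + 1][j - 1]
                   - cmap[i - 1][j + 1] + cmap[i - 1][j - 1]) / 4.0
            # Eigenvalues of the symmetric 2x2 Hessian [[fxx, fxy], [fxy, fyy]].
            half_trace = (fxx + fyy) / 2.0
            spread = math.sqrt(((fxx - fyy) / 2.0) ** 2 + fxy ** 2)
            smallest = half_trace - spread
            strength[i][j] = max(0.0, -smallest)
    return strength

# Toy map (upper triangle only): an anti-diagonal stripe where i + j is
# constant, the kind of pattern an antiparallel strand pairing would leave.
n = 14
toy_map = [[0.0] * n for _ in range(n)]
stripe = {(i, 14 - i) for i in range(2, 6)}
for i, j in stripe:
    toy_map[i][j] = 1.0

strength = ridge_strength(toy_map)
ridge_points = {(i, j) for i in range(n) for j in range(n)
                if strength[i][j] > 1.0}  # hypothetical threshold
print(sorted(ridge_points))
```

On this clean toy map the thresholded cells coincide with the stripe. On a real predicted map, the ridge signal would only be one input among several, which is where the random forest stage described above comes in.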
“…Ovchinnikov predicted residue–residue interactions across protein interfaces using evolutionary information [6]. There are many other methods that are not described here [7, 8, 9, 10, 11, 12, 13, 14, 15].…”
The study of interface residue pairs is important for understanding the interactions between monomers inside a trimer protein–protein complex. We developed a two-layer support vector machine (SVM) ensemble classifier that considers the physicochemical and geometric properties of amino acids and the influence of surrounding amino acids. Different descriptors and different combinations of descriptors may give different prediction results. We propose feature combination engineering based on correlation coefficients and F-values. The accuracy of our method is 65.38% on an independent test set, indicating biological significance. Our predictions are consistent with the experimental results, which shows the effectiveness and reliability of our method for predicting interface residue pairs of protein trimers.
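The abstract does not say how correlation coefficients and F-values are combined, so the following is only one plausible reading, sketched under stated assumptions: "F-value" is taken to mean the one-way ANOVA F-statistic of a feature across the two classes, features are ranked by it, and a feature is dropped when its absolute Pearson correlation with an already selected feature reaches a cutoff. The function names, the feature names `f_a`, `f_b`, `f_c`, the toy values, and the 0.9 cutoff are all hypothetical.

```python
import math

def anova_f(values, labels):
    """One-way ANOVA F-statistic of one feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand_mean = sum(values) / len(values)
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                  for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2
                 for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (between / df_between) / (within / df_within)

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def select_features(features, labels, max_abs_corr=0.9):
    """Greedy pick: highest F first, skip features redundant with earlier picks."""
    ranked = sorted(features, key=lambda name: anova_f(features[name], labels),
                    reverse=True)
    selected = []
    for name in ranked:
        if all(abs(pearson(features[name], features[s])) < max_abs_corr
               for s in selected):
            selected.append(name)
    return selected

# Toy data: f_b is nearly a rescaled copy of f_a; f_c is weaker but distinct.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f_a": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
    "f_b": [2.0, 2.6, 1.6, 2.4, 5.8, 6.6, 5.6, 6.4],
    "f_c": [0.5, 1.5, 0.4, 1.6, 1.4, 2.4, 1.3, 2.6],
}
selected = select_features(features, labels)
print(selected)
```

The selected subset would then be passed to the classifier; the SVM layers themselves are outside the scope of this sketch.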
“…Recently, deep learning has made breakthroughs in different scientific areas such as games [14,15], speech [16], face [17], image and text [18] recognition, robotics [19], and web search [20]. It has also been used in several bioinformatics applications including predicting protein binding sites in DNA and RNA [21], DNA replication initiation and termination zones [22], protein secondary structure [23] and folding [24], residue-residue and protein-protein interaction [25], non-coding DNA function prediction [26] and inferring expression of target from landmark genes [27].…”
Understanding cell identity is an important task in many biomedical areas. Expression patterns of specific marker genes have been used to characterize a limited number of cell types, but exclusive markers are not available for many cell types. A second approach is to use machine learning to discriminate cell types based on whole gene expression profiles (GEPs). The accuracy of simple classification algorithms such as linear discriminators or support vector machines is limited by the complexity of biological systems. We used deep neural networks to analyze 1040 GEPs from 16 different human tissues and cell types. After comparing different architectures, we identified a specific structure of deep autoencoders that can encode a GEP into a vector of 30 numeric values, which we call the cell identity code (CIC). The original GEP can be reproduced from the CIC with an accuracy comparable to that of technical replicates of the same experiment. Although we use an unsupervised approach to train the autoencoder, we show that different values of the CIC are connected to different biological aspects of the cell, such as different pathways or biological processes. The network can use the CIC to reproduce the GEPs of cell types it never saw during training. It is also robust to some noise in the measurement of the GEP. Furthermore, we introduce the classifier autoencoder, an architecture that can accurately identify cell type based on the GEP or the CIC.
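The encode-then-reconstruct idea behind the CIC can be sketched on toy data. This is not the architecture from the paper, which is a deep autoencoder with a 30-value code: the sketch assumes a single linear encoder and decoder, a 2-value code, synthetic 6-value "profiles", and plain stochastic gradient descent on squared reconstruction error. All names and hyperparameters are hypothetical.

```python
import random

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def reconstruction_error(enc, dec, data):
    """Mean squared reconstruction error per profile."""
    total = 0.0
    for x in data:
        x_hat = matvec(dec, matvec(enc, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)

def train_autoencoder(data, code_dim, epochs=200, lr=0.01, seed=0):
    """Train a linear autoencoder; return weights and error before/after."""
    rng = random.Random(seed)
    n = len(data[0])
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(code_dim)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(code_dim)] for _ in range(n)]
    before = reconstruction_error(enc, dec, data)
    for _ in range(epochs):
        for x in data:
            code = matvec(enc, x)                      # compressed code
            resid = [a - b for a, b in zip(matvec(dec, code), x)]
            # Residual propagated back through the decoder (dec^T resid).
            back = [sum(dec[j][k] * resid[j] for j in range(n))
                    for k in range(code_dim)]
            for j in range(n):
                for k in range(code_dim):
                    dec[j][k] -= lr * 2 * resid[j] * code[k]
            for k in range(code_dim):
                for j in range(n):
                    enc[k][j] -= lr * 2 * back[k] * x[j]
    return enc, dec, before, reconstruction_error(enc, dec, data)

# Synthetic profiles that truly depend on only two hidden factors.
data_rng = random.Random(1)
u = [1, 0, 1, 0, 1, 0]
v = [0, 1, 0, 1, 0, 1]
profiles = []
for _ in range(40):
    a, b = data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)
    profiles.append([a * ui + b * vi for ui, vi in zip(u, v)])

enc, dec, before, after = train_autoencoder(profiles, code_dim=2)
code = matvec(enc, profiles[0])  # the toy analogue of a cell identity code
print(len(code), before > after)
```

Because the toy profiles are generated from two hidden factors, a 2-value code is enough to reconstruct them well; the paper's choice of 30 values for real GEPs came from comparing architectures, which this sketch does not attempt.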