“…As evident from an excerpt in introduction section of this article, it is difficult to compare our S. cerevisiae model with that of Ben-Hur and Noble (2005). Therefore, we compared the performance of our S. cerevisiae model on validation set obtained from Pitre et al (2006) [40] (Table 4). …”
Section: Comparison With Existing Methods
The availability of an increased number of fully sequenced genomes demands functional interpretation of the genomic information. Despite high-throughput experimental techniques and in silico methods for predicting protein-protein interactions (PPIs), the interactome of most organisms is far from complete. Thus, predicting the interactome of an organism is one of the major challenges of the post-genomic era. This manuscript describes Support Vector Machine (SVM) based models developed for discriminating interacting from non-interacting pairs of proteins on the basis of their amino acid sequences. We developed SVM models using various types of sequence composition, e.g. amino acid, dipeptide, biochemical property, split amino acid and pseudo amino acid composition. We also developed SVM models using evolutionary information in the form of Position Specific Scoring Matrix (PSSM) composition. We achieved a maximum Matthews correlation coefficient (MCC) of 1.00, 0.52 and 0.74 for Escherichia coli, Saccharomyces cerevisiae and Helicobacter pylori, respectively, using the dipeptide-based SVM model at the default threshold. The performance of a prediction model was observed to depend on the dataset used for training and testing: for E. coli, the MCC decreased from 1.00 to 0.67 when the model was evaluated on a new dataset. To understand PPIs in different cellular environments, we developed species-specific and general models, and observed that species-specific models are more accurate than general models. We conclude that descriptors based on the primary amino acid sequence can be used to differentiate interacting from non-interacting protein pairs. Some amino acids tend to be more favored in interacting pairs than in non-interacting ones. Finally, a web server has been developed for predicting protein-protein interactions.
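The dipeptide composition and the MCC mentioned in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the two proteins of a pair are combined into one vector, so the concatenation in `pair_features` is an assumption, and all function names here are hypothetical rather than taken from the paper or its web server.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order so feature positions are stable.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the 400 dipeptide frequencies of one protein sequence.

    Dipeptides containing non-standard residues are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        dp = seq[i:i + 2]
        if dp in counts:
            counts[dp] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


def pair_features(seq_a, seq_b):
    """Encode a protein pair as an 800-dimensional vector.

    Assumption: the two 400-dimensional vectors are simply concatenated.
    """
    return dipeptide_composition(seq_a) + dipeptide_composition(seq_b)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom
```

The resulting vectors could be passed to any SVM implementation. An MCC of 1.0 corresponds to a confusion matrix with no false positives and no false negatives, which is why a drop to 0.67 on a new dataset is a substantial change.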
“…Generally, primary sequences [4], [6], [7], [8], molecular structures [9], [10], [11], [12], [13], [14], [15], biochemical properties [16], [17], [18], [19], [20], and hybrid information [21], [22], [23], [24], [25], [26], [27], [28] are used as the sources for the prediction of the interactions. Additionally, alpha shape models [25], [29] are applied to describe the surface of the protein-DNA structures and defined a conditional probability function [30], which showed a better performance than the distance-dependent method [31] in distinguishing the native structures from the docking decoy sets.…”
Interactions between biomolecules play an essential role in various biological processes. For predicting DNA-binding or protein-binding proteins, many machine-learning-based techniques have used various types of features to represent the interface of the complexes, but they deal only with the properties of a single atom in the interface and do not directly take into account information about neighboring atoms. This paper proposes a new feature representation method for biomolecular interfaces based on the theory of graph wavelets. The enhanced graph wavelet features (EGWF) provide an effective way to characterize interface features by adding physicochemical features and exploiting a graph wavelet formulation. In particular, the graph wavelet condenses the information around the center atom and thus enhances the discrimination of features of biomolecule-binding proteins in the feature space. Experimental results show that EGWF performs effectively for predicting DNA-binding and protein-binding proteins in terms of the Matthews correlation coefficient (MCC) and the area under the receiver operating characteristic curve (AUC).
“…The fact we are facing is that high-throughput technologies have only generated a large number of protein sequences with no more experimental knowledge. In order to bridge the gap between known protein sequences and their interaction statuses in the biological network, several methods have been developed to predict PPIs directly from primary sequences (Bock and Gough, 2001; Guo et al, 2008; Martin et al, 2005; Nanni, 2005; Nanni and Lumini, 2006a; Pitre et al, 2006). The typical way for constructing a sequence-based PPI prediction model is composed of two major steps: (1) extracting protein sequential features represented by discrete vectors; and (2) training an efficient machine learning algorithm in the constructed feature space.…”
Section: Introduction
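The two-step recipe quoted above (discrete feature vectors, then a learning algorithm trained in that feature space) can be sketched as follows. This is a toy illustration only: the features are plain amino acid composition, the classifier is a nearest-centroid rule standing in for the SVMs and other learners used in the cited work, and the sequences and labels are invented and carry no biological meaning.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Step 1a: 20-dimensional amino acid composition of one sequence."""
    seq = seq.upper()
    n = sum(seq.count(a) for a in AMINO_ACIDS)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def encode_pair(seq_a, seq_b):
    """Step 1b: represent a protein pair as one discrete vector (assumed: concatenation)."""
    return composition(seq_a) + composition(seq_b)


def fit_centroids(X, y):
    """Step 2: 'train' by averaging the feature vectors of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign the label of the closest class centroid (squared Euclidean distance)."""
    def sqdist(c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


# Invented toy data: label 1 = "interacting", label 0 = "non-interacting".
train_pairs = [
    ("KKKKRRKK", "DDEEDDEE"),
    ("RRKKRKRK", "EEDDEEDE"),
    ("KKRRKKRR", "KRKRKKRR"),
    ("DDEEDEDE", "EEDDEDED"),
]
train_labels = [1, 1, 0, 0]

X = [encode_pair(a, b) for a, b in train_pairs]
model = fit_centroids(X, train_labels)
prediction = predict(model, encode_pair("KRKRKRKR", "DEDEDEDE"))
print(prediction)
```

Swapping `fit_centroids` and `predict` for a real SVM leaves step 1 untouched, which is the practical appeal of separating feature extraction from training.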
“…By exploiting protein interaction data and domain information, MSSC achieved relatively favorable results. Pitre et al also developed a motif-based method called PIPE (Pitre et al, 2006). When determining protein pairs whether they form interactions or not, PIPE searched for the co-occurrences of their subsequences in those protein pairs that have already been known to interact.…”
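The co-occurrence idea in this excerpt can be sketched in a few lines of Python. This is a heavily simplified illustration, not the published PIPE algorithm: it assumes exact matching of fixed-length subsequences, and the window length `w`, the function names, and the toy sequences are all hypothetical.

```python
def windows(seq, w):
    """Return the set of all length-w subsequences of seq."""
    return {seq[i:i + w] for i in range(len(seq) - w + 1)}


def cooccurrence_score(seq_a, seq_b, known_pairs, w=3):
    """Count subsequence co-occurrences of (seq_a, seq_b) in known interacting pairs.

    For each known pair (x, y), a window from seq_a found in x together with a
    window from seq_b found in y counts once; the swapped orientation
    (seq_a against y, seq_b against x) is counted as well.
    """
    wa, wb = windows(seq_a, w), windows(seq_b, w)
    score = 0
    for x, y in known_pairs:
        wx, wy = windows(x, w), windows(y, w)
        score += len(wa & wx) * len(wb & wy)
        score += len(wa & wy) * len(wb & wx)
    return score


# Invented toy sequences, used only to exercise the function.
known = [("MKTAYIAK", "GSHMLEDP")]
shared = cooccurrence_score("AAKTAYGG", "PPHMLEQQ", known)
unrelated = cooccurrence_score("WWWWWW", "CCCCCC", known)
print(shared, unrelated)
```

A higher score means more subsequence pairs of the query recur in known interacting pairs. A practical predictor would also need a decision threshold and some tolerance for similar, not only identical, subsequences.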