2010
DOI: 10.1186/1471-2105-11-167

Predicting protein-protein interactions in unbalanced data using the primary structure of proteins

Abstract: Background. Elucidating protein-protein interactions (PPIs) is essential to constructing protein interaction networks and facilitating our understanding of the general principles of biological systems. Previous studies have revealed that interacting protein pairs can be predicted by their primary structure. Most of these approaches have achieved satisfactory performance on datasets comprising an equal number of interacting and non-interacting protein pairs. However, this ratio is highly unbalanced in nature, and th…
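Why the positive-to-negative ratio matters can be seen with simple arithmetic. The sketch below is an illustration, not the authors' evaluation procedure: it assumes a hypothetical classifier with fixed sensitivity and specificity (both 0.9, chosen only for illustration) and computes the expected precision when each interacting pair is accompanied by 1, 10 or 100 non-interacting pairs. The function name `precision_at_ratio` is hypothetical.

```python
def precision_at_ratio(sensitivity: float, specificity: float, neg_per_pos: float) -> float:
    """Expected precision when each positive pair comes with neg_per_pos negative pairs."""
    tp = sensitivity                          # expected true positives per positive pair
    fp = (1.0 - specificity) * neg_per_pos    # expected false positives from the negatives
    return tp / (tp + fp)

# Hypothetical classifier with 90% sensitivity and 90% specificity.
for ratio in (1, 10, 100):
    print(f"1:{ratio} -> precision {precision_at_ratio(0.9, 0.9, ratio):.3f}")
# 1:1 -> precision 0.900
# 1:10 -> precision 0.474
# 1:100 -> precision 0.083
```

Sensitivity and specificity do not change with the ratio, but precision does, which is why performance measured on a 1:1 test set can look far better than what a screen over naturally unbalanced data would deliver.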

Cited by 76 publications (59 citation statements) | References 46 publications
“…However, there are very limited techniques developed to confirm that two proteins do not interact. Recently, several studies have addressed this problem in evaluating computational methods of identifying protein interactions (Yu, Chou et al 2010; Yu, Guo et al 2010). This issue is still in a chaos stage and there is no perfect solution that fit[s] everyone's requirements.…”
Section: Discussion | mentioning
confidence: 99%
“…Yu et al reported that using SVM to perform a complete interaction analysis on human genome may take years (Yu, Chou et al 2010). In this regard, efficient ML algorithms with acceptable accuracy are reasonable alternatives to SVM.…”
Section: Relaxed Variable Kernel Density Estimation (RVKDE) | mentioning
confidence: 99%
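The scale problem behind the "may take years" remark is mostly combinatorial: an all-against-all screen must score every unordered protein pair. The sketch below counts those pairs and converts them to serial run time. The proteome size of 20,000 (roughly the number of human protein-coding genes) and the per-pair costs are assumptions for illustration, not figures from Yu et al.; the function names are hypothetical.

```python
def num_candidate_pairs(n_proteins: int, include_self: bool = False) -> int:
    """Unordered protein pairs to score in an all-against-all screen."""
    pairs = n_proteins * (n_proteins - 1) // 2
    return pairs + n_proteins if include_self else pairs

def screen_time_days(n_proteins: int, seconds_per_pair: float) -> float:
    """Wall-clock days to score every pair serially at a fixed per-pair cost."""
    return num_candidate_pairs(n_proteins) * seconds_per_pair / 86_400

n = 20_000  # assumed proteome size
print(num_candidate_pairs(n))  # 199990000
for cost in (0.01, 0.1, 1.0):  # hypothetical seconds per pair prediction
    days = screen_time_days(n, cost)
    print(f"{cost} s/pair -> {days:.1f} days ({days / 365:.2f} years)")
```

Because the pair count grows quadratically, even a modest per-pair cost adds up. For a kernel SVM, each prediction requires one kernel evaluation per support vector, so large training sets tend to raise that per-pair cost, which is the motivation for the cheaper alternatives mentioned above.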
“…Other approaches include SVMs that combine sequence profiles and other sequence-based information such as spatially neighbouring residues (Koike and Takagi 2004; Res et al 2005; Chen and Li 2010), a RF that integrates physicochemical properties of residues, evolutionary conservation and amino acid distances (Chen and Jeong 2009), and a naive Bayesian classifier trained to integrate position-specific scoring matrix and predicted accessibility (Murakami and Mizuguchi 2010). Finally, other sequence-based methods have been developed to improve prediction by tack[l]ing issues such as the problem of unbalanced data in protein sets (Yu et al 2010), i.e. the interface accounts for a small proportion of the exposed residues so the number of negative cases (non-interface residues) is much larger than the number of positive cases (interface residues) or improving the sampling (Engelen et al 2009) in evolutionary trace-based (Lichtarge et al 1996) methodologies.…”
Section: Sequence-based Prediction Methods | mentioning
confidence: 99%
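One generic way to handle the imbalance described above (many non-interface residues, few interface residues) is to weight each class inversely to its frequency during training. This is a commonly used remedy, not necessarily the technique used in the cited works. The sketch computes weights with the same formula as scikit-learn's `class_weight="balanced"` heuristic, n_samples / (n_classes × class_count); the 1:9 label split and the function name are hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}

# Hypothetical residue labels: 1 = interface, 0 = non-interface (1:9 split)
labels = [1] * 10 + [0] * 90
print(balanced_class_weights(labels))  # {1: 5.0, 0: 0.5555555555555556}
```

With these weights each class contributes the same total weight (10 × 5.0 = 90 × 0.556 ≈ 50), so a classifier can no longer reach 90% accuracy simply by labelling every residue as non-interface.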
“…Limitations of these machine learning methods are similar to the ones described above, notably including the lack of well-defined true negative examples. For instance, Yu et al evaluated the effect of positive-to-negative ratio in training and test sets for SVM based methods and found that it had considerable effect on classifier accuracy [83]. Lastly, use of sequence signatures to refine high-resolution PRM-mediated interaction networks must avoid duplicate counting of the domain-motif interaction knowledge already used to generate the original network.…”
Section: Sequence Signature | mentioning
confidence: 99%
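Studying the effect of the positive-to-negative ratio requires building datasets at controlled ratios. The sketch below shows one plausible way to do this: keep all positive pairs and randomly subsample non-interacting pairs without replacement. It is an assumption about how such sets could be built, not the sampling procedure reported by Yu et al.; the function name and the protein identifiers are hypothetical.

```python
import random

def build_training_set(positives, negatives, neg_per_pos, seed=0):
    """Label positives 1 and subsample negatives (label 0) at neg_per_pos per positive."""
    n_neg = min(len(negatives), round(neg_per_pos * len(positives)))
    rng = random.Random(seed)               # fixed seed for reproducibility
    sampled = rng.sample(negatives, n_neg)  # sampling without replacement
    data = [(pair, 1) for pair in positives] + [(pair, 0) for pair in sampled]
    rng.shuffle(data)
    return data

# Hypothetical protein-pair identifiers.
positives = [(f"P{i}", f"Q{i}") for i in range(5)]
negatives = [(f"X{i}", f"Y{i}") for i in range(100)]
for ratio in (1, 3, 10):
    data = build_training_set(positives, negatives, ratio)
    n_pos = sum(label for _, label in data)
    print(f"1:{ratio} -> {n_pos} positives, {len(data) - n_pos} negatives")
# 1:1 -> 5 positives, 5 negatives
# 1:3 -> 5 positives, 15 negatives
# 1:10 -> 5 positives, 50 negatives
```

The training and test ratios can be varied independently with the same helper (using disjoint pools of pairs), which makes it possible to see how a classifier tuned on balanced data behaves when evaluated at a more realistic ratio.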