2013
DOI: 10.1039/c3mb25451h

Identifying subcellular localizations of mammalian protein complexes based on graph theory with a random forest algorithm

Abstract: In the post-genome era, one of the most important and challenging tasks is to identify the subcellular localizations of protein complexes, and further elucidate their functions in human health with applications to understand disease mechanisms, diagnosis and therapy. Although various experimental approaches have been developed and employed to identify the subcellular localizations of protein complexes, the laboratory technologies fall far behind the rapid accumulation of protein complexes. Therefore, it is hig…

Cited by 7 publications (4 citation statements)
References 30 publications
“…Some previous work found that most of the nuclear proteins were significantly enriched in protein complexes, and the protein complexes are more likely to have a high clustering coefficient [74]. Based on the above results, we expect that the nuclear proteins would have higher clustering coefficient. As shown in Table 2, the average clustering coefficient of the nuclear proteins is indeed higher than those of other seven categories, and the difference between them is significant (P-value < 2.20 × 10^−16; KW test).…”
Section: Analysis Of Topological Properties (mentioning)
confidence: 88%
“…Because the machine learning method has the ability to train the model and prediction, it is widely used in proteomics. Current mainstream machine learning methods include random forest (RF) [14], naïve Bayes [15,16], decision tree (DT) [17], support vector machine (SVM) [18], extreme gradient boosting (XGBoost) [19], adaptive boosting (AdaBoost) [20], logistic regression (LR) [21], gradient boosting decision tree (GBDT) [22], etc.…”
Section: Introduction (mentioning)
confidence: 99%
“…At present, the machine learning algorithms for multisite protein subcellular localization include multi-label k-nearest neighbor algorithm (ML-kNN) [16], multi-label support vector machine algorithm (Rank-SVM) [17] and random forest algorithm (RF) [18]. Compared with these algorithms, the proposed algorithm has the following four characteristics.…”
Section: Introduction (mentioning)
confidence: 99%