2004
DOI: 10.1093/bioinformatics/btg447

Predicting subcellular localization of proteins using machine-learned classifiers

Abstract: We have constructed five machine-learning classifiers for predicting subcellular localization of proteins from animals, plants, fungi, Gram-negative bacteria and Gram-positive bacteria, which are 81% accurate for fungi and 92-94% accurate for the other four categories. These are the most accurate subcellular predictors across the widest set of organisms ever published. Our predictors are part of the Proteome Analyst web-service.

Cited by 312 publications (145 citation statements)
References 15 publications
“…For example, TAIR is currently using the TargetP system (Emanuelsson et al, 2000) for annotating the complete subcellular proteome of Arabidopsis (ftp://ftp.arabidopsis.org/home/tair/Proteins/Properties/TargetP_analysis.tair9). We compared not only TargetP but some other tools, such as LOCtree (Nair and Rost, 2005), PA-SUB (Lu et al, 2004), MultiLoc (Höglund et al, 2006), WoLF PSORT (Horton et al, 2007), and Plant-PLoc (Chou and Shen, 2007b), all of which originally reported good accuracy. However, a number of previous researchers (Emanuelsson, 2002; Heazlewood et al, 2004, 2005) found only 40% to 50% accuracy of the existing systems in their experimental data sets when testing the available tools for Arabidopsis annotation.…”
Section: Benchmarking On Independent Data Sets and Comparison With Ot… (mentioning)
confidence: 99%
“…We compared the performance of AtSubP on two diverse Arabidopsis-specific independent data sets (I and II) with some of the widely used tools, such as TargetP (Emanuelsson et al, 2000), LOCtree (Nair and Rost, 2005), PA-SUB (Lu et al, 2004), MultiLoc (Höglund et al, 2006), WoLF PSORT (Horton et al, 2007), and Plant-PLoc (Chou and Shen, 2007b). Although technically, the comparison with other methods might not be fair, as each of these methods was developed with different sets of training data, our main emphasis was to demonstrate how these general tools performed for individual genome annotation (e.g.…”
Section: Comparison With Other Prediction Programs (mentioning)
confidence: 99%
“…Over the years, a number of homology-based predictors have been proposed. For example, Proteome Analyst [9] computes the feature vectors for classification by using the presence or absence of some tokens from certain fields of the homologous sequences in the SwissProt database. Recently, a predictor called PairProSVM was proposed by Mak et al [10], which applies profile alignment to detect weak similarity between protein sequences.…”
Section: Introduction (mentioning)
confidence: 99%
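
The presence-or-absence encoding described in the statement above can be illustrated with a minimal sketch. This is not Proteome Analyst's actual implementation: the record layout, the `keywords` field name, the example tokens, and both function names are assumptions made only for illustration.

```python
def tokens_from_homologs(homologs, fields):
    """Collect the lowercase tokens found in the chosen annotation fields."""
    found = set()
    for record in homologs:
        for field in fields:
            for token in record.get(field, []):
                found.add(token.lower())
    return found


def binary_feature_vector(homologs, vocabulary, fields=("keywords",)):
    """Return 1 for each vocabulary token present in any homolog, else 0."""
    present = tokens_from_homologs(homologs, fields)
    return [1 if token in present else 0 for token in vocabulary]


# Hypothetical annotations for two homologs of a query protein.
homologs = [
    {"keywords": ["Transmembrane", "Transport"]},
    {"keywords": ["Transport", "ATP-binding"]},
]
# Hypothetical fixed token vocabulary; its order defines the vector layout.
vocabulary = ["atp-binding", "nuclear protein", "transmembrane", "transport"]

vector = binary_feature_vector(homologs, vocabulary)
print(vector)
```

A vector of this kind could then be passed to any standard classifier; which classifier and which annotation fields the original system uses is not stated in the quoted text.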
“…A number of systems for predicting protein localization from sequence have been described [5, 8, 14, 17, 18]. The limitation of these systems is that they can only assign new proteins to the location categories with which they have been trained. This means that proteins with previously unseen location patterns cannot be properly categorized.…”
Section: Introduction (mentioning)
confidence: 99%