2009
DOI: 10.1371/journal.pcbi.1000605

Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies

Abstract: Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank …

Cited by 619 publications (633 citation statements)
References 68 publications
“…Interpreting global transcription studies can be problematic because differentially expressed genes can often have annotation and functional descriptions that are not validated and/or are unreliable [49]. This greatly restricts the ability to accurately infer which signaling pathways are activated or manipulated during the infestation process.…”
Section: Discussion
mentioning
confidence: 99%
“…A well-studied set of 37 protein families with extensive experimental data was selected as a test set and revealed a high degree of misannotation, up to 80% in the three databases [49]. The majority of error was associated with over-prediction of molecular function in the absence of appropriate evidence, or incorrect inference based on the presence of protein domains.…”
mentioning
confidence: 99%
“…This approach can produce erroneous results when key functional residues are mutated, or when the alignment doesn't span the whole length of the proteins, possibly indicating changes in domain architecture [14]. Iterative transfers of computationally generated functional assignments can lead to uncontrolled propagation of such errors; the average error rate of molecular function annotations is estimated to approach 0% only in the manually curated UniProtKB/SwissProt database, while it is substantially higher in un-reviewed resources [15].…”
Section: Annotation Transfers From Homologous Proteins
mentioning
confidence: 99%
“…However, the biological roles of most microbial genes remain unidentified, indicating a great need to investigate this uncharacterized genetic information. 3, 4…”
Section: Introduction
mentioning
confidence: 99%