2011
DOI: 10.1074/mcp.m111.008490

Published and Perished? The Influence of the Searched Protein Database on the Long-Term Storage of Proteomics Data

Abstract: In proteomics, protein identifications are reported and stored using an unstable reference system: protein identifiers. These proprietary identifiers are created individually by every protein database and can change or may even be deleted over time. To estimate the effect of the searched protein sequence database on the long-term storage of proteomics data we analyzed the changes of reported protein identifiers from all public experiments in the Proteomics Identifications (PRIDE) database by November 2010. To m…

Cited by 23 publications (20 citation statements)
References 33 publications

Citation statements, ordered by relevance:
“…The effect that this has to the proteomics community has been examined in some detail in two recent papers (Griss et al 2011a; Griss et al 2011b) where it was also shown that UniProt now provides directly comparable alternatives, an observation that is also evident for human sequences from Figure 3, where both UniProt and Ensembl are seen to closely resemble IPI.…”
Section: From Acquired Data To Processed Results (mentioning)
confidence: 98%
“…As a result, alternative efforts such as UniProtKB or the now discontinued International Protein Index (IPI) lead to distinctively different results in the detail (differences with up to a fifth of all entries) although, satisfyingly, the databases are largely overlapping [11,12]. This is not surprising since all large protein sequence databases are strongly interconnected via the International Nucleotide Sequence Database Consortium (INSDC) providing the EMBL/GenBank/DDBJ nucleotide set used by different protein annotation pipelines. For example, about 98% of the protein sequences provided by UniProtKB come from the translations of coding sequences submitted to INSDC.…”
Section: à9 (mentioning)
confidence: 99%
“…There is not only the issue of identifier stability over time [11,12]. This is a serious technical hindrance; yet, it can in principle be overcome by software adaptations as long as the sequence itself does still exist in the database. Similarly, the differences among alternative proteome databases for the same species can be circumvented by just applying all these proteome instances to the problem of study in parallel.…”
Section: à9 (mentioning)
confidence: 99%
“…This enabled the authors to look for the expression of specific splice isoforms from CNS-related genes. Finally, in another example of PRIDE data reuse, UniProtKB was determined to be the most suitable reference database for long-term proteomics data storage [46].…”
Section: Introduction (mentioning)
confidence: 99%
“…The gene and protein sequence databases that identification depends on are constantly evolving and improving [46]. This means that reprocessing a proteomics dataset with an updated version of the gene or protein database can result in improved findings.…”
Section: Introduction (mentioning)
confidence: 99%