Proteogenomics to discover the full coding content of genomes: A computational perspective

Castellana, Natalie; Bafna, Vineet

doi:10.1016/j.jprot.2010.06.007

Cited by 143 publications

(146 citation statements)

References 96 publications

Supporting

Mentioning

144

Contrasting

Unclassified


“…These considerations are more intricate in proteogenomic projects that aim at genome annotation and discovery of novel gene models from shotgun proteomic data (58,59). The nature of these projects entails the use of large sequence databases that account for all possible protein coding regions of a genome.…”

Section: Data Set and Database Size Matter-mentioning

confidence: 99%

“…Proteogenomic studies for various model organisms resorted to six frame translated genomic databases and expressed sequence tag (EST) databases to achieve this goal (60-64). The number of peptides in such databases is in the order of billions and further grows by two orders of magnitude if single amino acid mutations are considered, too (58). Several strategies have been pursued to faithfully compress these databases.…”

Section: Data Set and Database Size Matter-mentioning

confidence: 99%


Inference and Validation of Protein Identifications

Claassen

2012

Molecular & Cellular Proteomics


Discovery or shotgun proteomics has emerged as the most powerful technique to comprehensively map out a proteome. Reconstruction of protein identities from the raw mass spectrometric data constitutes a cornerstone of any shotgun proteomics workflow. The inherent uncertainty of mass spectrometric data and the complexity of a proteome render protein inference and the statistical validation of protein identifications a non-trivial task, still being a subject of ongoing research. This review aims to survey the different conceptual approaches to the different tasks of inferring and statistically validating protein identifications and to discuss their implications on the scope of proteome exploration.
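
The six frame translated genomic databases mentioned in the citation statement above can be illustrated with a minimal sketch. It assumes the standard genetic code and a plain DNA string as input; the function names (`reverse_complement`, `translate`, `six_frame_translation`) are hypothetical and are not taken from any of the cited tools. Real pipelines would additionally split each frame at stop codons and digest the resulting open reading frames into peptides.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate all three reading frames on both strands.

    Keys are '+1', '+2', '+3' (forward strand) and '-1', '-2', '-3'
    (reverse-complement strand); '*' marks a stop codon.
    """
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```

Because every genomic position is translated in six ways, the search space grows quickly with genome size, which is why the statement above stresses database size.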


“…Proteogenomics strategy stands out as an important experimental tool to identify the protein coding potential of sequenced or unsequenced genomes of an organism (Castellana et al, 2010; Krug et al, 2011). Proteogenomically identified peptide data can provide invaluable information for gene annotation, which is almost impossible or difficult to predict using nucleotide sequence information alone.…”

Section: Introduction-mentioning

confidence: 99%

Brain Proteomics of Anopheles gambiae

Dwivedi

Muthusamy

Kumar

et al. 2014

OMICS: A Journal of Integrative Biology


Anopheles gambiae has a well-adapted system for host localization, feeding, and mating behavior, which are all governed by neuronal processes in the brain. However, there are no published reports characterizing the brain proteome to elucidate neuronal signaling mechanisms in the vector. To this end, a large-scale mapping of the brain proteome of An. gambiae was carried out using high resolution tandem mass spectrometry, revealing a repertoire of >1800 proteins, of which 15% could not be assigned any function. A large proportion of the identified proteins were predicted to be involved in diverse biological processes including metabolism, transport, protein synthesis, and olfaction. This study also led to the identification of 10 GPCR classes of proteins, which could govern sensory pathways in mosquitoes. Proteins involved in metabolic and neural processes, chromatin modeling, and synaptic vesicle transport associated with neuronal transmission were predominantly expressed in the brain. Proteogenomic analysis expanded our findings with the identification of 15 novel genes and 71 cases of gene refinements, a subset of which were validated by RT-PCR and sequencing. Overall, our study offers valuable insights into the brain physiology of the vector that could possibly open avenues for intervention strategies for malaria in the future.


“…Over the last few years, computational proteomics has become a dramatically growing field, and a handful of tools have been developed to execute complete proteogenomic analyses (11,46,47). Two excellent reviews have described a comprehensive overview of the various problems commonly encountered and their current solutions for this growing research area (8,11).…”

mentioning

confidence: 99%

GAPP: A Proteogenomic Software for Genome Annotation and Global Profiling of Post-translational Modifications in Prokaryotes

Zhang

Yang

Zeng

et al. 2016

Molecular & Cellular Proteomics


Although the number of sequenced prokaryotic genomes is growing rapidly, experimentally verified annotation of prokaryotic genome remains patchy and challenging. To facilitate genome annotation efforts for prokaryotes, we developed an open source software called GAPP for genome annotation and global profiling of post-translational modifications (PTMs) in prokaryotes. With a single command, it provides a standard workflow to validate and refine predicted genetic models and discover diverse PTM events. We demonstrated the utility of GAPP using proteomic data from Helicobacter pylori, one of the major human pathogens that is responsible for many gastric diseases. Our results confirmed 84.9% of the existing predicted H. pylori proteins, identified 20 novel protein coding genes, and corrected four existing gene models with regard to translation initiation sites. In particular, GAPP revealed a large repertoire of PTMs using the same proteomic data and provided a rich resource that can be used to examine the functions of reversible modifications in this human pathogen. This software is a powerful tool for genome annotation and global discovery of PTMs and is applicable to any sequenced prokaryotic organism; we expect that it will become an integral part of ongoing genome annotation efforts for prokaryotes. GAPP is freely available at https://sourceforge.net/projects/gappproteogenomic/.
