Expressed Peptide Tags: An Additional Layer of Data for Genome Annotation

Savidor, Alon; Donahoo, Ryan S.; Hurtado‐Gonzales, Oscar P.; VerBerkmoes, Nathan C.; Shah, Manesh; Lamour, Kurt; McDonald, W. Hayes

doi:10.1021/pr060134x

Cited by 31 publications

(17 citation statements)

References 47 publications

(82 reference statements)

“…Thirteen proteins with a life stage specific expression pattern were identified by 2D-gel electrophoresis, including CRN2 (31). For P. sojae and P. ramorum, a global proteomic approach was used to detect proteomic differences between life stages (32,33), and a recent large-scale phosphoproteome analysis revealed the phosphorylation status of thousands of proteins and provided novel information on life stage specific phosphorylation events in P. infestans (34). Despite their importance, proteomic studies on Phytophthora extracellular proteins are even more limited.…”

mentioning

confidence: 99%

Profiling the Secretome and Extracellular Proteome of the Potato Late Blight Pathogen Phytophthora infestans

Meijer

Mancuso

Espadas

et al. 2014

Molecular & Cellular Proteomics


Oomycetes are filamentous organisms that cause notorious diseases, several of which have a high economic impact. Well known is Phytophthora infestans, the causal agent of potato late blight. Previously, in silico analyses of the genome and transcriptome of P. infestans resulted in the annotation of a large number of genes encoding proteins with an N-terminal signal peptide. This set is collectively referred to as the secretome and comprises proteins involved in, for example, cell wall growth and modification, proteolytic processes, and the promotion of successful invasion of plant cells. So far, proteomic profiling in oomycetes was primarily focused on subcellular, intracellular or cell wall fractions; the extracellular proteome has not been studied systematically. Here we present the first comprehensive characterization of the in vivo secretome and extracellular proteome of P. infestans. We have used mass spectrometry to analyze P. infestans proteins present in seven different growth media with mycelial cultures and this resulted in the consistent identification of over two hundred proteins. Gene ontology classification pinpointed proteins involved in cell wall modifications, pathogenesis, defense responses, and proteolytic processes. Moreover, we found members of the RXLR and CRN effector families as well as several proteins lacking an obvious signal peptide. The latter were confirmed to be bona fide extracellular proteins and this suggests that, similar to other organisms, oomycetes exploit non-conventional secretion mechanisms to transfer certain proteins to the extracellular environment.


“…Proteogenomics (using proteomic information to annotate the genome) complements nucleotide-based annotation in that it unambiguously determines reading frame, translation start and stop sites, splice boundaries, and the validity of short ORFs. By combining DNA-based annotation with proteogenomics, an accurate and more complete protein-coding catalog can be obtained (6)(7)(8)(9)(10). With its clear potential for improving genome annotation, proteogenomics could be integrated with genome projects.…”

mentioning

confidence: 99%

Discovery and revision of Arabidopsis genes by proteogenomics

Castellana

Payne

Shen

et al. 2008

Proc. Natl. Acad. Sci. U.S.A.



Gene annotation underpins genome science. Most often protein coding sequence is inferred from the genome based on transcript evidence and computational predictions. While generally correct, gene models suffer from errors in reading frame, exon border definition, and exon identification. To ascertain the error rate of Arabidopsis thaliana gene models, we isolated proteins from a sample of Arabidopsis tissues and determined the amino acid sequences of 144,079 distinct peptides by tandem mass spectrometry. The peptides corresponded to 1 or more of 3 different translations of the genome: a 6-frame translation, an exon splicegraph, and the currently annotated proteome. The majority of the peptides (126,055) resided in existing gene models (12,769 confirmed proteins), comprising 40% of annotated genes. Surprisingly, 18,024 novel peptides were found that do not correspond to annotated genes. Using the gene finding program AUGUSTUS and 5,426 novel peptides that occurred in clusters, we discovered 778 new protein-coding genes and refined the annotation of an additional 695 gene models. The remaining 13,449 novel peptides provide high quality annotation (>99% correct) for thousands of additional genes. Our observation that 18,024 of 144,079 peptides did not match current gene models suggests that 13% of the Arabidopsis proteome was incomplete due to approximately equal numbers of missing and incorrect gene models.

annotation | genomics | proteomics

A fundamental goal of genome projects is to generate a protein-coding catalog. Much of modern biological research depends on a complete and accurate proteome. Extensive proteomic catalogs have been developed through the integration of gene prediction algorithms, cDNA sequences, and comparative genomics (1, 2). As emerging research is incorporated into annotation pipelines and manual curation efforts, gene models continue to improve. High throughput gene annotation pipelines use a variety of information sources, and benefit most significantly when new data contains information that is orthogonal to the kinds currently available (3).

Recent advances in chemistry and algorithms for peptide mass spectrometry have enabled the production of large proteomics datasets with broad coverage of the proteome (4-6). Proteogenomics (using proteomic information to annotate the genome) complements nucleotide-based annotation in that it unambiguously determines reading frame, translation start and stop sites, splice boundaries, and the validity of short ORFs. By combining DNA-based annotation with proteogenomics, an accurate and more complete protein-coding catalog can be obtained (6-10). With its clear potential for improving genome annotation, proteogenomics could be integrated with genome projects.

A recent publication by Baerenfaller et al. (4) demonstrated the ability of extensive resampling to provide good coverage of the Arabidopsis proteome. From 1,354 LC runs the authors identified 86,456 distinct peptides covering 13,029 proteins. In addition to providing an organ specific proteome catal...
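The 6-frame translation mentioned in the abstract above, and the way a matched peptide pins down the reading frame, can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which searched tandem mass spectra against translated databases. The function names and the toy DNA sequence are hypothetical, and the sketch assumes the standard genetic code with exact string matching.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Map frame labels (+1..+3, -1..-3) to their translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def find_peptide(dna, peptide):
    """Return (frame, residue index) for each frame containing the peptide."""
    return [
        (frame, protein.find(peptide))
        for frame, protein in six_frame_translation(dna).items()
        if peptide in protein
    ]


# Hypothetical toy sequence: the peptide MAKRW is encoded in frame +3.
toy_dna = "CCATGGCTAAACGTTGGTAA"
print(find_peptide(toy_dna, "MAKRW"))
```

A peptide that matches in exactly one frame fixes the strand and reading frame at that locus, which is the kind of evidence the abstract describes as complementary to nucleotide-based annotation.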


“…However, while there are hundreds of studies on using ESTs for genome annotation, EPT studies are still in infancy (Savidor et al 2006). This is unfortunate since EPTs may provide some advantages over ESTs and are easy to generate.…”

mentioning

confidence: 99%

Comparative proteogenomics: Combining mass spectrometry and comparative genomics to analyze multiple genomes

Gupta

Benhamida

Bhargava

et al. 2008

Genome Res.


Recent proliferation of low-cost DNA sequencing techniques will soon lead to an explosive growth in the number of sequenced genomes and will turn manual annotations into a luxury. Mass spectrometry recently emerged as a valuable technique for proteogenomic annotations that improves on the state-of-the-art in predicting genes and other features. However, previous proteogenomic approaches were limited to a single genome and did not take advantage of analyzing mass spectrometry data from multiple genomes at once. We show that such a comparative proteogenomics approach (like comparative genomics) allows one to address the problems that remained beyond the reach of the traditional "single proteome" approach in mass spectrometry. In particular, we show how comparative proteogenomics addresses the notoriously difficult problem of "one-hit-wonders" in proteomics, improves on the existing gene prediction tools in genomics, and allows identification of rare post-translational modifications. We therefore argue that complementing DNA sequencing projects by comparative proteogenomics projects can be a viable approach to improve both genomic and proteomic annotations.

