2017
DOI: 10.1101/153213
Preprint

An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics

Abstract: Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy towards accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab…

Cited by 12 publications (38 citation statements)
References 84 publications (118 reference statements)
“…Comparative genomics identified both shared core gene clusters and gene clusters unique to EGD-e and ScottA (right upper panel). Moreover, an ab initio gene prediction based on Prodigal and an advanced in silico (six-frame translation) annotation (see Materials and Methods) were integrated with the RefSeq annotation to obtain a minimally redundant iPtgxDB [43] for EGD-e and for ScottA. DDA-based proteomics data were searched against the publicly available iPtgxDBs (https://iptgxdb.expasy.org) to obtain proteogenomic evidence for novel open reading frames (ORFs), novel start sites, and expressed pseudogenes (right lower panel).…”
Section: Fig 1 Overview Of Our Next-gen Proteogenomics Workflow For
Citation type: mentioning
confidence: 99%
“…Although a complete NCBI reference genome sequence existed for EGD-e, the NCBI reference genome sequence for ScottA was incomplete and consisted of five contigs [44]. Motivated by our recent finding of significant differences between an NCBI reference genome and the de novo assembly of the actual lab strain [43], and an earlier study on Pseudomonas aeruginosa PAO1 that had demonstrated substantial genomic fluidity between closely related strains [45], we sequenced and de novo assembled both genomes to create the best possible reference sequence for the two strains, an important aspect for the proteogenomics element. Next, a comparative genomics analysis was carried out to identify core and strain-specific genes, which, upon integration with protein abundance data, might provide clues to explain the different phenotypes of the strains.…”
Section: Fig 1 Overview Of Our Next-gen Proteogenomics Workflow For
Citation type: mentioning
confidence: 99%