Shotgun Protein Sequencing with Meta-contig Assembly

Guthals, Adrian; Clauser, Karl R.; Bandeira, Nuno

doi:10.1074/mcp.m111.015768

Cited by 26 publications

(59 citation statements)

References 44 publications

(31 reference statements)


“…These methods may well be effective for simple spectra, derived from short peptides, yet they are still inadequate for the analysis of spectra derived from larger peptides and proteins. This is not to say that de-novo sequencing has no future in Top-Down proteomics; though still in development, multiple publications have presented some promise for Top-Down proteomics, either through a combination of Bottom-Up and Top-Down data [147,148] or through a constrained approach [149] that requires a sequence "anchor" from which the de-novo algorithm works. Despite its limitations Top-Down proteomics, either in its more "primitive form" or the modern one, provided an immensely important finding about venom biology: the understanding that venom production and secretion is not monolithic and uniform, but rather a complex mixture whose composition is susceptible to biochemical and behavioral modulation. Studies have hinted of this phenomenon in scorpions [17, 103], snake [150] and cone snails [16, 123, 151]. It is still…”

mentioning

confidence: 99%

Ecological venomics: How genomics, transcriptomics and proteomics can shed new light on the ecology and evolution of venom

Sunagar

Morgenstern

Reitzel

et al. 2016

Journal of Proteomics


“…Remaining peaks not predicted to be y-ions were converted to charge one by a previously described MS/MS deconvolution tool (38). Deconvoluted DTA spectra that originated from identified MS/MS scans were then paired with the MSGF+ peptide IDs and passed to PepNovo+ for training.…”

Section: LC-MS/MS samples

mentioning

confidence: 99%

Neutron-encoded Signatures Enable Product Ion Annotation From Tandem Mass Spectra

Richards

Colot

Guthals

et al. 2013

Molecular & Cellular Proteomics

Self Cite


We report the use of neutron-encoded (NeuCode) stable isotope labeling of amino acids in cell culture for the purpose of C-terminal product ion annotation. Two NeuCode labeling isotopologues of lysine, 13C6 15N2 and 2H8, which differ by 36 mDa, were metabolically embedded in a sample proteome, and the resultant labeled proteins were combined, digested, and analyzed via liquid chromatography and mass spectrometry. With MS/MS scan resolving powers of ~50,000 or higher, product ions containing the C terminus (i.e. lysine) appear as a doublet spaced by exactly 36 mDa, whereas N-terminal fragments exist as a single m/z peak. Through theory and experiment, we demonstrate that over 90% of all y-type product ions have detectable doublets. We report on an algorithm that can extract these neutron signatures with high sensitivity and specificity. In other words, of 15,503 y-type product ion peaks, the y-type ion identification algorithm correctly identified 14,552 (93.2%) based on detection of the NeuCode doublet; 6.8% were misclassified (i.e. other ion types that were assigned as y-type products). Searching NeuCode labeled yeast with PepNovo+ resulted in a 34% increase in correct de novo identifications relative to searching through MS/MS only. We use this tool to simplify spectra prior to database searching, to sort unmatched tandem mass spectra for spectral richness, for correlation of co-fragmented ions to their parent precursor, and for de novo sequence identification. The ability to make de novo sequence identifications directly from tandem mass spectra has long been a holy grail of the proteomic community. Such a capability would wean the field from its reliance upon sequenced genome databases. Even for organisms with fully annotated genomes, events such as single nucleotide polymorphisms, alternative splicing, gene fusion, and a host of other genomic transformations can result in altered proteomes.
These alterations can vary from cell to cell and individual to individual. Thus, one could argue that the most valuable proteomic information, the individual and cellular proteome variation from the genome, remains elusive (1). This problem has received considerable attention; that said, it is not easy to de novo correlate spectrum to sequence in a large-scale, automated fashion (2-6). Improvements in mass accuracy have helped, but routine, reliable de novo sequencing without database assistance is not standard (7-10). A primary means to facilitate de novo spectral interpretation is the simple annotation of m/z peaks in tandem mass spectra as either N- or C-terminal. We and others have investigated this seemingly simple first step. Real-world spectra, however, are complex. Difficulties often arise in determining the charge state of the fragment or in differentiating between fragment ions and peaks arising from neutral loss, internal fragmentation, or spectral noise, both electronic and chemical. Several strategies have focused on product ion annotation. These approaches have included manipulation of the N-terminus ...
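
The doublet signature described in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: it only pairs peaks whose m/z values differ by 36 mDa divided by the fragment charge, assuming one lysine per fragment. The function name `find_doublets` and the tolerance `tol_da` are hypothetical choices made for illustration.

```python
from bisect import bisect_left

# 36 mDa spacing between the two lysine isotopologues (value from the abstract)
DOUBLET_SPACING_DA = 0.036


def find_doublets(mz_values, charge=1, tol_da=0.005):
    """Return (light, heavy) m/z pairs spaced by 36 mDa / charge.

    mz_values must be sorted in ascending order. tol_da is an assumed
    matching tolerance, not a parameter taken from the paper.
    """
    spacing = DOUBLET_SPACING_DA / charge
    pairs = []
    for mz in mz_values:
        # First peak at or above the lower edge of the expected window
        start = bisect_left(mz_values, mz + spacing - tol_da)
        for j in range(start, len(mz_values)):
            if mz_values[j] > mz + spacing + tol_da:
                break  # past the upper edge of the window
            pairs.append((mz, mz_values[j]))
    return pairs


# Toy peak list: two candidate y-ion doublets and two singlets
peaks = [200.100, 300.150, 300.186, 450.200, 450.236, 500.300]
doublets = find_doublets(peaks)
```

Peaks that take part in a pair are candidate C-terminal (y-type) ions under these assumptions; unpaired peaks are left unannotated. A real implementation would also need to handle intensity ratios, charge-state assignment, and noise.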


“…Yet, recent advances in de novo sequencing have demonstrated 97-99% sequencing accuracy (percent of amino acids in matched peptides that are correct) at nearly the same level of coverage (percent of amino acids in target peptides that were matched) as that of database search for small mixtures of target proteins (Guthals et al, 2012a, 2013). At the heart of this approach is the pairing of spectra from overlapping peptides (i.e., peptides that have overlapping sequences) to construct spectral networks (Bandeira et al, 2004; Guthals et al, 2012b) where a node represents an individual spectrum [or a consensus spectrum from a clustered set of spectra from the same precursor (Frank et al, 2008)] and edges denote pairs of spectra from peptides with overlapping sequences.…”

Section: Introduction

mentioning

confidence: 99%

“…At the heart of this approach is the pairing of spectra from overlapping peptides (i.e., peptides that have overlapping sequences) to construct spectral networks (Bandeira et al, 2004; Guthals et al, 2012b) where a node represents an individual spectrum [or a consensus spectrum from a clustered set of spectra from the same precursor (Frank et al, 2008)] and edges denote pairs of spectra from peptides with overlapping sequences. It is then shown that de novo sequences assembled by simultaneous interpretation of multiple spectra from overlapping peptides are much more accurate than individual per-spectrum interpretations (Guthals et al, 2012a, 2013). Use of multiple enzyme digestions and strong cation exchange (SCX) (Edelmann, 2011) fractionation is becoming more common in MS/MS protocols to generate broader coverage of protein sequences and yield wider distributions of overlapping peptides, but current statistical methods still ignore the peptide sequence overlaps and separately compute the significance of individual peptides matched to individual spectra (Swaney et al, 2010).…”

Section: Introduction

mentioning

confidence: 99%

The Generating Function Approach for Peptide Identification in Spectral Networks

Guthals

Boucher

Bandeira

2014

Lecture Notes in Computer Science

Self Cite


Tandem mass (MS/MS) spectrometry has become the method of choice for protein identification and has launched a quest for the identification of every translated protein and peptide. However, computational developments have lagged behind the pace of modern data acquisition protocols and have become a major bottleneck in proteomics analysis of complex samples. As it stands today, attempts to identify MS/MS spectra against large databases (e.g., the human microbiome or 6-frame translation of the human genome) face a search space that is 10-100 times larger than the human proteome, where it becomes increasingly challenging to separate between true and false peptide matches. As a result, the sensitivity of current state-of-the-art database search methods drops by nearly 38% to such low identification rates that almost 90% of all MS/MS spectra are left as unidentified. We address this problem by extending the generating function approach to rigorously compute the joint spectral probability of multiple spectra being matched to peptides with overlapping sequences, thus enabling the confident assignment of higher significance to overlapping peptide-spectrum matches (PSMs). We find that these joint spectral probabilities can be several orders of magnitude more significant than individual PSMs, even in the ideal case when perfect separation between signal and noise peaks could be achieved per individual MS/MS spectrum. After benchmarking this approach on a typical lysate MS/MS dataset, we show that the proposed intersecting spectral probabilities for spectra from overlapping peptides improve peptide identification by 30-62%.
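
The spectral network structure quoted in the citation statements for this entry (nodes are spectra, edges join spectra from peptides with overlapping sequences) can be sketched as a toy graph. One loud assumption: in practice the overlaps are inferred from the spectra themselves, whereas this sketch decides overlap from known peptide sequences purely to show the data structure. The function names and the `min_overlap` threshold are hypothetical.

```python
from itertools import combinations


def overlap_length(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def sequences_overlap(a, b, min_overlap=3):
    """True if one peptide contains the other or they share a long enough end overlap."""
    if a in b or b in a:
        return min(len(a), len(b)) >= min_overlap
    return max(overlap_length(a, b), overlap_length(b, a)) >= min_overlap


def build_network(peptides, min_overlap=3):
    """Adjacency map: node = spectrum id, edge = pair of overlapping peptides.

    peptides maps a spectrum id to its (assumed known) peptide sequence.
    """
    graph = {sid: set() for sid in peptides}
    for (i, p), (j, q) in combinations(peptides.items(), 2):
        if sequences_overlap(p, q, min_overlap):
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Toy input: three peptides tiling one sequence, plus one unrelated peptide
peptides = {
    "s1": "ACDEFGHIK",
    "s2": "FGHIKLMNPR",
    "s3": "LMNPRQSTVK",
    "s4": "WYWYWYK",
}
network = build_network(peptides)
```

Connected components of such a graph group the spectra that could be interpreted together, which is the setting in which the abstract's joint spectral probabilities apply.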
