Reliability ranking and scaling improvements to the probability based matching system for unknown mass spectra

Atwater, Barbara L.; Stauffer, Douglas B.; McLafferty, Fred W.; Peterson, David W.

doi:10.1021/ac00281a028

Cited by 64 publications

(40 citation statements)

References 19 publications

“…Although there are several ways to define similarity between two peptide spectra (12,14,15,22), the normalized dot product or cosine measure of spectral similarity is widely accepted to be robust and makes no special assumptions concerning peptide mass spectra (14). Moreover, as we show below and in the supplemental materials, cosine similarity has a number of useful mathematical properties that allow us to derive theoretical bounds to guide our approach.…”

Section: Mixture Spectrum Identification Problem (MSIP) (mentioning)

confidence: 99%

Peptide Identification from Mixture Tandem Mass Spectra

Wang, Pérez-Santiago, Katz, et al. 2010

Molecular & Cellular Proteomics

The success of high-throughput proteomics hinges on the ability of computational methods to identify peptides from tandem mass spectra (MS/MS). However, a common limitation of most peptide identification approaches is the nearly ubiquitous assumption that each MS/MS spectrum is generated from a single peptide. We propose a new computational approach for the identification of mixture spectra generated from more than one peptide. Capitalizing on the growing availability of large libraries of single-peptide spectra (spectral libraries), our quantitative approach is able to identify up to 98% of all mixture spectra from equally abundant peptides and automatically adjust to varying abundance ratios of up to 10:1. Furthermore, we show how theoretical bounds on spectral similarity avoid the need to compare each experimental spectrum against all possible combinations of candidate peptides (achieving speedups of over five orders of magnitude) and demonstrate that mixture-spectra can be identified in a matter of seconds against proteome-scale spectral libraries. Although our approach was developed for and is demonstrated on peptide spectra, we argue that the generality of the methods allows for their direct application to other types of spectral libraries and mixture spectra. Molecular & Cellular Proteomics 9: 1476-1485, 2010.

The success of tandem MS (MS/MS) approaches to peptide identification is partly due to advances in computational techniques allowing for the reliable interpretation of MS/MS spectra. Mainstream computational techniques mainly fall into two categories: database search approaches that score each spectrum against peptides in a sequence database (1-4) or de novo techniques that directly reconstruct the peptide sequence from each spectrum (5-8). The combination of these methods with advances in high-throughput MS/MS have promoted the accelerated growth of spectral libraries, collections of peptide MS/MS spectra the identification of which were validated by accepted statistical methods (9, 10) and often also manually confirmed by mass spectrometry experts. The similar concept of spectral archives was also recently proposed to denote spectral libraries including "interesting" nonidentified spectra (11) (i.e. recurring spectra with good de novo reconstructions but no database match). The growing availability of these large collections of MS/MS spectra has reignited the development of alternative peptide identification approaches based on spectral matching (12-14) and alignment (15-17) algorithms.

However, mainstream approaches were developed under the (often unstated) assumption that each MS/MS spectrum is generated from a single peptide. Although chromatographic procedures greatly contribute to making this a reasonable assumption, there are several situations where it is difficult or even impossible to separate pairs of peptides. Examples include certain permutations of the peptide sequence or posttranslational modifications (see (18) for examples of co-eluting histone modification variants). In addition,...
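
The normalized dot product (cosine) similarity named in the citation statement above can be illustrated with a minimal sketch. This is not the cited authors' implementation: the representation of a spectrum as a dictionary from integer m/z bin to intensity, the function name `cosine_similarity`, and the toy peak values are all assumptions made for illustration only.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Normalized dot product of two spectra.

    Each spectrum is assumed (for illustration) to be a dict mapping an
    integer m/z bin to a non-negative intensity. Bins missing from a
    spectrum are treated as zero intensity.
    """
    # Dot product over the bins of spec_a; bins absent from spec_b add 0.
    dot = sum(inten * spec_b.get(mz, 0.0) for mz, inten in spec_a.items())
    # Euclidean norms of each intensity vector.
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    # An empty or all-zero spectrum has no direction; report no similarity.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy spectra, not taken from any library.
query = {100: 3.0, 150: 4.0}
scaled = {100: 6.0, 150: 8.0}   # same peak pattern, twice the intensity
disjoint = {200: 5.0}           # no peaks in common with the query

print(cosine_similarity(query, scaled))    # 1.0
print(cosine_similarity(query, disjoint))  # 0.0
```

Because the measure divides by both vector norms, it depends only on the relative peak pattern and not on overall intensity, which is why the scaled copy scores 1.0 against the query.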

“…Compared with the rule-based prediction of mass spectrum, spectral library search is a simple, more universal method for all compounds in the library, so it is widely employed in GC-MS data-processing systems to identify an unknown spectrum. As a key technology, the current search algorithm contains dot-product [12], probability-based matching system [13], Euclidean distance [12], absolute value distance [12], wavelet and Fourier transform-based spectrum similarity [14], partial and semi-partial correlations [15]. Recently, many mass spectral libraries [e.g., US National Institute of Standards and Technology (NIST) library] were constructed for library search.…”

Section: Introduction (mentioning)

confidence: 99%

Strategies for structure elucidation of small molecules using gas chromatography-mass spectrometric data

Zhang, Tang, Cao, et al. 2013

TrAC Trends in Analytical Chemistry

“…Generally, the widely available identification method is mass spectral library searching [14] using various search algorithms including dot-product function [15] and probability based matching (PBM) [16][17][18]. In fact, these libraries serve not only mass spectral searching but also data mining and discovery of mass spectral characteristics for structural elucidation [19].…”

Section: Introduction (mentioning)

confidence: 99%