2017
DOI: 10.1038/s41467-017-01318-5

Significance estimation for large scale metabolomics annotations by spectral matching

Abstract: The annotation of small molecules in untargeted mass spectrometry relies on matching fragment spectra to reference library spectra. While various spectrum-spectrum match scores exist, the field lacks statistical methods for estimating the false discovery rate (FDR) of these annotations. We present empirical Bayes and target-decoy based methods to estimate the FDR for 70 public metabolomics data sets. We show that the spectral matching settings need to be adjusted for each project…
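The target-decoy idea named in the abstract can be illustrated with a minimal sketch. This is the generic target-decoy estimator, not necessarily the procedure used in the paper: it assumes the query spectra were searched separately against a target library and an equally sized decoy library, and that a higher match score means a better match. The function name and the example scores are hypothetical.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target hits scoring at or above `threshold`.

    Assumption: decoy hits above the threshold approximate the number of
    false target hits above the same threshold (equal library sizes).
    """
    n_targets = sum(1 for s in target_scores if s >= threshold)
    n_decoys = sum(1 for s in decoy_scores if s >= threshold)
    if n_targets == 0:
        # No accepted target hits, so there is nothing to be wrong about.
        return 0.0
    # The ratio can exceed 1 when decoys outnumber targets; cap it.
    return min(1.0, n_decoys / n_targets)


# Hypothetical match scores (e.g. cosine similarity) for illustration only.
targets = [0.9, 0.8, 0.7, 0.6]
decoys = [0.65, 0.3]

fdr_loose = target_decoy_fdr(targets, decoys, threshold=0.6)
fdr_strict = target_decoy_fdr(targets, decoys, threshold=0.7)
```

Raising the threshold trades annotations for confidence: here the stricter cutoff drops one target hit and the only decoy hit above the looser cutoff.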

Cited by 140 publications (169 citation statements)
References 51 publications
“…This approach paired with high acquisitions speed (>1 Hz) of state of the art instruments results in thousands of spectra per LC-MS/MS run. For a reliable data analysis and reproducible interpretation of the results, bioinformatic workflows including comprehensive databases and statistical significance estimation are crucial (da Silva et al, 2015;Böcker, 2017;Scheubert et al, 2017;Weber et al, 2017) and have been very recently employed for marine metabolomic studies Hartmann et al, 2017;Kujawinski et al, 2017;Longnecker and Kujawinski, 2017). With these new bioinformatic tools and instrumental improvements in sensitivity, acquisition speed and resolution we anticipate that the techniques used for DOM characterization will further shift toward non-targeted analyses using high-resolution LC-MS/MS that provide inventories of molecular structures in complex environmental datasets.…”
Section: Introduction
mentioning
confidence: 99%
“…It is also worth noting that the estimation of false discovery rates, has been a historical goal 17 and recent focus for small molecule identification. 18,19 However, these tools are not presently in mainstream use and do not exist in the software used for this study. The mzCloud library does possess a sophisticated scoring mechanism for quality of MS/MS spectra as shown in Figure 1B for the amino acid asparagine, but the other libraries rely on MS1 mass accuracy alone.…”
Section: Results
mentioning
confidence: 99%
“…Interestingly, the MESSAR-predicted substructures showed a striking similarity to expert knowledge, ranging from simple (e.g ethyl phenol of Motif 21) to complex (e.g. indole substructure of Motif 25,26,194) substructures. According to experts' knowledge, ground-truth annotations of 26 motifs (out of 28) were identical or very similar to the substructure predicted by at least one matched MESSAR rule.…”
mentioning
confidence: 91%
“…Training spectral libraries: MESSAR generates rules from target and decoy GNPS spectral libraries built by Scheubert et al (Fig 1A). According to the data descriptions [26], the target library consists of 4138 positive ion high-quality labeled spectra acquired on Q-TOF instruments. For each spectrum, Scheubert et al have computed a fragmentation tree that annotates a subset of fragments with molecular formulas and removed non-annotated peaks that usually represent isotopic peaks, chemical noise, .…”
mentioning
confidence: 99%