Mass spectrometry (MS) is an important analytical technique for the detection and identification of small compounds. The main bottleneck in the interpretation of metabolite profiling or screening experiments is the identification of unknown compounds from tandem mass spectra. Spectral libraries for tandem MS, such as MassBank or NIST, contain reference spectra for many compounds, but their limited chemical coverage reduces the chance of a correct and reliable identification of unknown spectra outside the database domain. On the other hand, compound databases like PubChem or ChemSpider cover a much larger part of chemical space, but they cannot be queried with spectral information directly. More recently, computational mass spectrometry methods and in silico fragmentation prediction have allowed users to search such databases of chemical structures. We present a new strategy called MetFusion to combine identification results from several resources, in particular from the in silico fragmenter MetFrag and the spectral library MassBank, to improve compound identification. We evaluate the performance on a set of 1062 spectra and achieve an improved ranking of the correct compound, from rank 28 using MetFrag alone to rank 7 with MetFusion, even if the correct compound and similar compounds are absent from the spectral library. On the basis of the evaluation, we extrapolate the performance of MetFusion to the KEGG compound database.
In this paper, we describe data processing and metabolite identification approaches which lead to a rapid and semi-automated interpretation of metabolomics experiments. Data from metabolite fingerprinting using LC-ESI-Q-TOF/MS were processed with several open-source software packages, including XCMS and CAMERA, to detect features and group features into compound spectra. Next, we describe the automatic scheduling of tandem mass spectrometry (MS/MS) acquisitions to acquire a large number of MS/MS spectra, and the subsequent processing and computer-assisted annotation towards identification using the R packages MetShot and Rdisop and the MetFusion application. We also implement a simple retention time prediction model using predicted lipophilicity logD, which predicts retention times within 42 s (6 min gradient) for most compounds in our setup. We putatively identified 44 common metabolites, including several amino acids and phospholipids, at Metabolomics Standards Initiative (MSI) levels two and three and confirmed the majority of them by comparison with authentic standards at MSI level one. To aid both data integration within and data sharing between laboratories, we integrated data from two labs and mapped retention times between the chromatographic systems. Despite the different MS instrumentation and different chromatographic gradient programs, the mapped retention times agree within 26 s (20 min gradient) for 90% of the mapped features.
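As a rough illustration of what a simple retention time model based on logD could look like, the sketch below fits a straight line by ordinary least squares and checks a prediction against the 42 s tolerance quoted above. The linear form, the function names, and the numbers in the usage note are assumptions made for illustration; the abstract does not specify the model form or any calibration data.

```python
from statistics import mean


def fit_linear_rt_model(logd_values, retention_times):
    """Least-squares fit of retention_time = slope * logD + intercept.

    Assumption: a linear relationship. The source only says the model is
    "simple" and uses predicted logD; it does not give the functional form.
    """
    if len(logd_values) != len(retention_times) or len(logd_values) < 2:
        raise ValueError("need at least two paired (logD, retention time) values")
    x_mean = mean(logd_values)
    y_mean = mean(retention_times)
    sxx = sum((x - x_mean) ** 2 for x in logd_values)
    if sxx == 0:
        raise ValueError("logD values must not all be identical")
    sxy = sum((x - x_mean) * (y - y_mean)
              for x, y in zip(logd_values, retention_times))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict_rt(logd, slope, intercept):
    """Predicted retention time (seconds) for a compound with the given logD."""
    return slope * logd + intercept


def within_tolerance(predicted_s, observed_s, tolerance_s=42.0):
    """True if the prediction lies within the tolerance (default 42 s)."""
    return abs(predicted_s - observed_s) <= tolerance_s
```

In use, one would calibrate on standards with known retention times, for example `slope, intercept = fit_linear_rt_model(logd_of_standards, rt_of_standards)` with hypothetical input lists, and then discard candidate structures for which `within_tolerance(predict_rt(candidate_logd, slope, intercept), observed_rt)` is false.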
The second Critical Assessment of Small Molecule Identification (CASMI) contest took place in 2013. A joint team from the Swiss Federal Institute of Aquatic Science and Technology (Eawag) and Leibniz Institute of Plant Biochemistry (IPB) participated in CASMI 2013 with an automatic workflow-style entry. MOLGEN-MS/MS was used for Category 1, molecular formula calculation, restricted by the information given for each challenge. MetFrag and MetFusion were used for Category 2, structure identification, retrieving candidates from the compound databases KEGG, PubChem and ChemSpider and joining these lists pre-submission. The results from Category 1 were used to guide whether formula or exact mass searches were performed for Category 2. The Category 2 results were impressive considering the database size and automated regime used, although these could not compete with the manual approach of the contest winner. The Category 1 results were affected by large m/z and ppm values in the challenge data, where strategies beyond pure enumeration from other participants were more successful. However, the combination used for the CASMI 2013 entries was extremely useful for developing decision-making criteria for automatic, high throughput general unknown (non-target) identification and for future contests.
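To see why large m/z and ppm values make formula calculation harder, it helps to convert a relative tolerance in ppm into an absolute mass window: the window grows with both quantities, so more candidate formulas fall inside it. The sketch below uses the standard ppm definition; the function names and the numbers in the usage note are illustrative and are not taken from the CASMI challenge data.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds for a relative tolerance in ppm."""
    delta = mz * ppm / 1e6  # absolute half-width of the window
    return mz - delta, mz + delta


def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error of a measurement relative to a theoretical m/z, in ppm."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6
```

For instance, a 5 ppm tolerance corresponds to a half-width of 0.001 at m/z 200 but 0.005 at m/z 1000, so the same relative accuracy admits a five times wider absolute window at the higher mass.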
The task in Category 2 of the Critical Assessment of Small Molecule Identification (CASMI) contest was to determine the identity of (initially) unknown compounds for which high-resolution tandem mass spectra were published. We focused on computer-assisted methods that tried to correctly identify the compound automatically and entered the contest with MetFrag and MetFusion to score candidate structures retrieved from the PubChem structure database. MetFrag was combined with the metabolite-likeness score, which helped to improve the performance for the natural product challenges. We present the results, discuss the performance, and give details of how to interpret the MetFrag and MetFusion output.