Recent advances and prospects of computational methods for metabolite identification: a review with emphasis on machine learning approaches

Nguyen, Dai Hai; Nguyen, Canh Hao; Mamitsuka, Hiroshi

doi:10.1093/bib/bby066

Cited by 70 publications

(73 citation statements)

References 53 publications

“…To address this problem, recently, the computational metabolomics community has grown to develop and improve computational approaches for known and unknown metabolite identification (Table 3). These computational metabolomic approaches employ two main strategies: (1) In silico prediction of fragmentation MS/MS spectra from chemical structures of known compounds, and (2) in silico prediction of molecular substructures (i.e., molecular fingerprints or feature vectors that encode the structure of a molecule) and general chemical properties of the unknowns from experimentally acquired MS/MS spectra [112]. With the in silico fragmentation methods, the experimentally acquired spectra of an unknown metabolite (for which reference spectra are not available) can be matched against in silico theoretically predicted spectra simulated on known candidate structures retrieved from databases (Human Metabolome Database (HMDB), PubChem, KEGG, etc.)…”

Section: Metabolite Identification: From Spectral Database Matching T (mentioning)

confidence: 99%
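
The matching step described in the statement above can be illustrated with a minimal sketch. It assumes that the in silico spectra for each candidate structure are already available as lists of (m/z, intensity) peaks, and it uses a plain cosine similarity over binned peaks as the score. The function names, the bin width and the toy peak lists are all illustrative assumptions, not the scoring scheme of any specific tool named in the text.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.1):
    """Sum the intensities of (m/z, intensity) peaks that share an m/z bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_peaks, predicted_spectra, bin_width=0.1):
    """Rank candidate structures by similarity of their predicted spectrum
    to the experimental query spectrum (best match first)."""
    query = bin_spectrum(query_peaks, bin_width)
    scores = {
        name: cosine_similarity(query, bin_spectrum(peaks, bin_width))
        for name, peaks in predicted_spectra.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (made up for illustration): one experimental spectrum and two
# hypothetical candidates with simulated spectra.
query_peaks = [(85.03, 100.0), (127.04, 40.0), (145.05, 20.0)]
predicted_spectra = {
    "candidate_A": [(85.03, 90.0), (127.04, 50.0), (145.05, 10.0)],
    "candidate_B": [(57.07, 100.0), (99.08, 30.0)],
}
ranking = rank_candidates(query_peaks, predicted_spectra)
```

Here `ranking` is a list of (candidate, score) pairs. In practice the candidates would be retrieved from a structure database by precursor mass or formula, and the score would usually account for mass tolerance and intensity weighting.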

“…To learn the mapping of an MS/MS spectrum to a molecule structure, these methods need to be trained on spectral databases of known metabolites. In general, machine learning methods can be divided in two groups, supervised learning for substructure prediction (e.g., CSI:FingerID) and unsupervised learning for substructure annotation and grouping of metabolites based on shared, biochemically relevant substructures (e.g., MS2LDA) [112, 114-116]. The main objective of supervised methods, such as CSI:FingerID integrated in Sirius tool, is to determine, using a database of molecular structures, the structure that best fits the experimental data.…”

Section: Metabolite Identification: From Spectral Database Matching T (mentioning)

confidence: 99%
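
The database search step of the supervised approach can be sketched as follows. The sketch assumes that a trained model has already produced a probability for each fingerprint bit from the MS/MS spectrum; it then thresholds those probabilities and ranks database candidates by Tanimoto similarity. This is a simplified stand-in, not the actual scoring used by CSI:FingerID, and the bit indices, threshold and candidate names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_by_fingerprint(predicted_probs, candidate_fingerprints, threshold=0.5):
    """Threshold predicted bit probabilities, then rank candidate structures
    by similarity of their fingerprint to the predicted one (best first)."""
    predicted_bits = {bit for bit, p in predicted_probs.items() if p >= threshold}
    scores = {
        name: tanimoto(predicted_bits, bits)
        for name, bits in candidate_fingerprints.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (made up): bit probabilities predicted from a spectrum, and the
# fingerprints of three hypothetical database candidates.
predicted_probs = {0: 0.9, 3: 0.8, 7: 0.6, 9: 0.2}
candidate_fingerprints = {
    "cand_1": {0, 3, 7, 12},
    "cand_2": {3, 9},
    "cand_3": {1, 2},
}
fingerprint_ranking = rank_by_fingerprint(predicted_probs, candidate_fingerprints)
```

With these toy inputs the predicted fingerprint is {0, 3, 7}, so `cand_1` scores 3/4 and ranks first. Real fingerprints have thousands of bits, and published methods typically weight bits by how reliably they are predicted rather than thresholding them.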

From Samples to Insights into Metabolism: Uncovering Biologically Relevant Information in LC-HRMS Metabolomics Data

Ivanišević, Want. 2019. Metabolites.

Untargeted metabolomics (including lipidomics) is a holistic approach to biomarker discovery and mechanistic insights into disease onset and progression, and response to intervention. Each step of the analytical and statistical pipeline is crucial for the generation of high-quality, robust data. Metabolite identification remains the bottleneck in these studies; therefore, confidence in the data produced is paramount in order to maximize the biological output. Here, we outline the key steps of the metabolomics workflow and provide details on important parameters and considerations. Studies should be designed carefully to ensure appropriate statistical power and adequate controls. Subsequent sample handling and preparation should avoid the introduction of bias, which can significantly affect downstream data interpretation. It is not possible to cover the entire metabolome with a single platform; therefore, the analytical platform should reflect the biological sample under investigation and the question(s) under consideration. The large, complex datasets produced need to be pre-processed in order to extract meaningful information. Finally, the most time-consuming steps are metabolite identification, as well as metabolic pathway and network analysis. Here we discuss some widely used tools and the pitfalls of each step of the workflow, with the ultimate aim of guiding the reader towards the most efficient pipeline for their metabolomics studies.

“…Structure elucidation from MS/MS data has always been a challenging and time-consuming task with a vast number of potentially interesting metabolites that are still unknowns. The main reason is that current MS/MS databases (spectral libraries) only contain a limited number of historical spectra, far below the number of metabolites in reality [3,4]. Advances in computational tools have led to a considerable extension of the search space that can be examined and have resulted in an improvement of the identification accuracy by using massive molecular databases (for example, PubChem currently contains about 100 million compounds [5]).…”

Section: Introduction (mentioning)

confidence: 99%

MESSAR: Automated recommendation of metabolite substructures from tandem mass spectra

et al. 2020

Despite the increasing importance of non-targeted metabolomics to answer various life science questions, extracting biochemically relevant information from metabolomics spectral data is still an incompletely solved problem. Most computational tools to identify tandem mass spectra focus on a limited set of molecules of interest. However, such tools are typically constrained by the availability of reference spectra or molecular databases, limiting their applicability of generating structural hypotheses for unknown metabolites. In contrast, recent advances in the field illustrate the possibility to expose the underlying biochemistry without relying on metabolite identification, in particular via substructure prediction. We describe an automated method for substructure recommendation motivated by association rule mining. Our framework captures potential relationships between spectral features and substructures learned from public spectral libraries. These associations are used to recommend substructures for any unknown mass spectrum. Our method does not require any predefined metabolite candidates, and therefore it can be used for the hypothesis generation or partial identification of unknown unknowns. The method is called MESSAR (MEtabolite Sub-Structure Auto-Recommender) and is implemented in a free online web service available at messar.biodatamining.be.

Open access. Citation: Liu Y, Mrzic A, Meysman P, De Vijlder T, Romijn EP, Valkenborg D, et al. (2020) MESSAR: Automated recommendation of metabolite substructures from tandem mass spectra. PLoS ONE 15(1): e0226770. https://doi.

“…In recent years numerous powerful approaches (Nguyen et al., 2018a; Schymanski et al., 2017) for annotating MS2 spectra with a predicted molecular structure have been developed (Ruttkies et al., 2016, 2019; Dührkop et al., 2015; Brouard et al., 2016; Allen et al., 2014; Nguyen et al., 2018b, 2019; Dührkop et al., 2019). Typically, these methods output a ranked list of molecular structure candidates, that can be shown to human experts, or further post-processed, e.g.…”

Section: Introduction (mentioning)

confidence: 99%

Probabilistic Framework for Integration of Mass Spectrum and Retention Time Information in Small Molecule Identification

Bach, Rogers, Williamson, et al. 2020. Preprint.

Motivation: Identification of small molecules in a biological sample remains a major bottleneck in molecular biology, despite a decade of rapid development of computational approaches for predicting molecular structures using mass spectrometry (MS) data. Recently, there has been increasing interest in utilizing other information sources, such as liquid chromatography (LC) retention time (RT), to improve the MS based identifications. Results: We put forward a probabilistic modelling framework to integrate MS and RT data of multiple features in an LC-MS experiment. We model the MS measurements and all pairwise retention order information as a Markov random field and use efficient approximate inference for scoring and ranking potential molecular structures. Our experiments show improved identification accuracy by combining tandem mass spectrometry data (MS2) and retention orders using our approach, thereby outperforming state-of-the-art methods. Furthermore, we demonstrate the benefit of our model when only a subset of LC-MS features have MS2 measurements available besides MS1.
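
The idea of combining per-feature MS scores with pairwise retention order information can be shown on a toy example. The sketch below is an assumption-laden illustration rather than the authors' model: it scores every joint assignment of candidates by exhaustive enumeration instead of the approximate inference mentioned in the abstract, and the `ms_score`, `pred_order` fields and the `weight` parameter are hypothetical names introduced here.

```python
import itertools
import math


def pair_potential(rt_i, rt_j, pred_i, pred_j, weight=1.0):
    """Log-potential for a pair of features: +weight if the predicted elution
    order of the two chosen candidates agrees with the observed retention
    times, -weight otherwise."""
    observed_first = rt_i < rt_j
    predicted_first = pred_i < pred_j
    return weight if observed_first == predicted_first else -weight


def map_assignment(features, weight=1.0):
    """Exhaustively find the joint candidate assignment with the highest
    total log-score (MS scores plus pairwise order terms). Only feasible
    for very small toy problems."""
    best_names, best_score = None, -math.inf
    index_ranges = [range(len(f["candidates"])) for f in features]
    for combo in itertools.product(*index_ranges):
        chosen = [f["candidates"][c] for f, c in zip(features, combo)]
        score = sum(math.log(c["ms_score"]) for c in chosen)
        for i, j in itertools.combinations(range(len(features)), 2):
            score += pair_potential(
                features[i]["rt"], features[j]["rt"],
                chosen[i]["pred_order"], chosen[j]["pred_order"], weight,
            )
        if score > best_score:
            best_names, best_score = [c["name"] for c in chosen], score
    return best_names, best_score


# Toy data (made up): two LC-MS features with observed retention times and
# two candidates each, carrying an MS-based score and a predicted order value.
features = [
    {"rt": 2.0, "candidates": [
        {"name": "A", "ms_score": 0.6, "pred_order": 5.0},
        {"name": "B", "ms_score": 0.4, "pred_order": 1.0},
    ]},
    {"rt": 5.0, "candidates": [
        {"name": "C", "ms_score": 0.9, "pred_order": 3.0},
        {"name": "D", "ms_score": 0.1, "pred_order": 0.5},
    ]},
]
with_order, _ = map_assignment(features, weight=1.0)
ms_only, _ = map_assignment(features, weight=0.0)
```

With `weight=0.0` the result reduces to picking the top MS score per feature (A and C). With the order term switched on, the assignment B and C wins because it is the only one consistent with the observed elution order, which illustrates how retention information can overturn an MS-only ranking.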
