The identification of xenobiotics in nontargeted metabolomic analyses is a vital step in understanding human exposure. Xenobiotic metabolism, transformation, excretion, and coexistence with endogenous molecules, however, greatly complicate the interpretation of features detected in nontargeted studies. While mass spectrometry (MS)-based platforms are commonly used in metabolomic measurements, distinguishing endogenous metabolites from xenobiotics is often hampered by the lack of xenobiotic parent and metabolite standards as well as the numerous isomers possible for each small molecule m/z feature. Here, we evaluate a xenobiotic structural annotation workflow using ion mobility spectrometry coupled with MS (IMS–MS), mass defect filtering, and machine learning to uncover potential xenobiotic classes and species in large metabolomic feature lists. The xenobiotic classes examined were those of known high toxicity, including per- and polyfluoroalkyl substances (PFAS), polycyclic aromatic hydrocarbons (PAHs), polychlorinated biphenyls (PCBs), polybrominated diphenyl ethers (PBDEs), and pesticides. Specifically, when the workflow was applied to identify PFAS in the NIST SRM 1957 and 909c human serum samples, it greatly reduced the hundreds of detected liquid chromatography (LC)–IMS–MS features by utilizing both mass defect filtering and the relationship between m/z and IMS collision cross section. These potential PFAS features were then compared to the EPA CompTox entries, and while some matched within specific m/z tolerances, many remained unknown, illustrating the importance of nontargeted studies for detecting new molecules with known chemical characteristics. This workflow can also be used to evaluate other xenobiotics and enable more confident annotations from nontargeted studies.
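As a rough illustration of the mass defect filtering step mentioned above, the following Python sketch keeps only features whose mass defect falls in a window typical of heavily halogenated ions. It is not the authors' implementation: the function names and the window limits (-0.25 to 0.0) are assumptions chosen for illustration, and the nominal mass is approximated by rounding m/z to the nearest integer.

```python
def mass_defect(mz: float) -> float:
    """Approximate mass defect: measured m/z minus the nearest integer mass."""
    return mz - round(mz)


def filter_by_mass_defect(mz_values, low=-0.25, high=0.0):
    """Keep m/z values whose mass defect lies in [low, high].

    The default window is a hypothetical choice; fluorine-rich ions such as
    PFAS tend to have small negative mass defects, while hydrogen-rich
    biomolecules tend to have positive ones.
    """
    return [mz for mz in mz_values if low <= mass_defect(mz) <= high]


# Example feature list (m/z values of well-known ions, for illustration only).
features = [
    412.9664,  # PFOA [M-H]-
    498.9302,  # PFOS [M-H]-
    760.5851,  # a phosphatidylcholine [M+H]+
    179.0561,  # glucose [M-H]-
]

pfas_like = filter_by_mass_defect(features)
print(pfas_like)
```

In practice the window would be tuned to the xenobiotic class of interest, and the surviving features would still need the m/z versus CCS comparison and database matching described in the abstract.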
Metabolite annotation continues to be the widely accepted bottleneck in nontargeted metabolomics workflows. Annotation of metabolites typically relies on a combination of high-resolution mass spectrometry (MS) with parent and tandem measurements, isotope cluster evaluations, and Kendrick mass defect (KMD) analysis. Chromatographic retention time matching with standards is often used at the later stages of the process, which can also be followed by metabolite isolation and structure confirmation utilizing nuclear magnetic resonance (NMR) spectroscopy. The measurement of gas-phase collision cross-section (CCS) values by ion mobility (IM) spectrometry also adds an important dimension to this workflow by generating an additional molecular parameter that can be used for filtering unlikely structures. The millisecond timescale of IM spectrometry allows rapid measurement of CCS values and easy pairing with existing MS workflows. Here, we report on a highly accurate machine learning algorithm (CCSP 2.0) in an open-source Jupyter Notebook format to predict CCS values based on linear support vector regression models. This tool allows customization of the training set to the needs of the user, enabling the production of models for new adducts or previously unexplored molecular classes. CCSP produces predictions with accuracy equal to or greater than existing machine learning approaches such as CCSbase, DeepCCS, and AllCCS, while being better aligned with FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. Another unique aspect of CCSP 2.0 is its inclusion of a large library of 1613 molecular descriptors via the Mordred Python package, further encoding the fine aspects of isomeric molecular structures. CCS prediction accuracy was tested using CCS values in the McLean CCS Compendium with median relative errors of 1.25, 1.73, and 1.87% for the 170 [M − H]−, 155 [M + H]+, and 138 [M + Na]+ adducts tested. For superclass-matched data sets, CCS predictions via CCSP allowed filtering of 36.1% of incorrect structures while retaining 100% of the correct annotations using a ΔCCS threshold of 2.8% and a mass error of 10 ppm.
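The candidate filtering described above can be sketched in Python as follows. This is a minimal illustration rather than the CCSP 2.0 code: the `Candidate` class, the function names, and the example values are hypothetical, and only the two thresholds (2.8% ΔCCS and 10 ppm mass error) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate structure with its theoretical m/z and predicted CCS (Å^2)."""
    name: str
    mz: float
    predicted_ccs: float


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def ccs_error_pct(measured_ccs: float, predicted_ccs: float) -> float:
    """Signed relative CCS difference in percent, relative to the measurement."""
    return (predicted_ccs - measured_ccs) / measured_ccs * 100.0


def filter_candidates(measured_mz, measured_ccs, candidates,
                      max_ppm=10.0, max_ccs_pct=2.8):
    """Keep candidates within both the mass and the CCS tolerance."""
    return [
        c for c in candidates
        if abs(ppm_error(measured_mz, c.mz)) <= max_ppm
        and abs(ccs_error_pct(measured_ccs, c.predicted_ccs)) <= max_ccs_pct
    ]


# Made-up candidates for one measured feature (illustration only).
candidates = [
    Candidate("A", 300.0010, 172.0),  # within both tolerances
    Candidate("B", 300.0010, 180.0),  # CCS differs by about 5.9%
    Candidate("C", 300.0100, 170.5),  # mass differs by about 33 ppm
]
kept = filter_candidates(300.0000, 170.0, candidates)
print([c.name for c in kept])
```

A tighter CCS tolerance removes more incorrect structures but risks discarding the correct one, so the threshold should be set relative to the prediction error of the model in use.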
Metabolite annotation continues to be the widely accepted bottleneck in non-targeted metabolomics workflows. Annotation of metabolites typically relies on a combination of high resolution mass spectrometry (MS) with parent and tandem measurements, isotope cluster evaluations, and Kendrick mass defect (KMD) analysis. Chromatographic retention time matching with standards is often used at the later stages of the process, which can also be followed by metabolite isolation and structure confirmation utilizing nuclear magnetic resonance (NMR) spectroscopy. The measurement of gas phase collision cross section (CCS) values by ion mobility (IM) spectrometry also adds an important dimension to this workflow by generating an additional molecular parameter that can be used for filtering unlikely structures. The millisecond timescale of IM spectrometry allows rapid measurement of CCS values and easy pairing with existing MS workflows. Here, we report on a highly accurate machine learning algorithm (CCSP 2.0) in an open-source Jupyter Notebook format to predict CCS values based on linear support vector regression models. This tool allows customization of the training set to the needs of the user, enabling the production of models for new adducts or previously unexplored molecular classes. CCSP produces predictions with accuracy equal to or greater than existing machine learning approaches such as CCSbase, DeepCCS and AllCCS, while being better aligned with FAIR (Findable, Accessible, Interoperable and Reusable) data principles. Another unique aspect of CCSP 2.0 is its inclusion of a large library of 1613 molecular descriptors via the Mordred Python package, further encoding the fine aspects of isomeric molecular structures. CCS prediction accuracy was tested using CCS values in the McLean CCS Compendium with median relative errors of 1.25, 1.73 and 1.87% for the 170 [M-H]-, 155 [M+H]+ and 138 [M+Na]+ adducts tested. For class-matched data sets, CCS predictions via CCSP allowed filtering of 36.1% of incorrect structures while retaining 100% of the correct annotations using a CCS threshold of 2.8% and a mass error of 10 ppm.
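For readers unfamiliar with the Kendrick mass defect (KMD) analysis mentioned above, here is a short Python sketch of the standard calculation. The function names are hypothetical and the sign convention (nominal Kendrick mass minus Kendrick mass) is one of several in use; the abstract does not specify which repeating unit or convention the authors applied, so CH2 is used here as the classic choice.

```python
# Monoisotopic masses (u) of the elements used in this example.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def mono_mass(formula: dict) -> float:
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def kendrick_mass_defect(mass: float, base: dict) -> float:
    """KMD of `mass` relative to a repeating unit `base` (for example CH2).

    The mass scale is rescaled so the repeating unit has an integer mass;
    members of a homologous series then share the same KMD.
    """
    exact = mono_mass(base)
    kendrick_mass = mass * round(exact) / exact
    return round(kendrick_mass) - kendrick_mass


CH2 = {"C": 1, "H": 2}
palmitic = mono_mass({"C": 16, "H": 32, "O": 2})  # C16H32O2
stearic = mono_mass({"C": 18, "H": 36, "O": 2})   # C18H36O2, two CH2 units longer

kmd_palmitic = kendrick_mass_defect(palmitic, CH2)
kmd_stearic = kendrick_mass_defect(stearic, CH2)
print(kmd_palmitic, kmd_stearic)
```

Because the two fatty acids differ only by CH2 units, their KMD values agree to within floating-point error, which is what makes KMD useful for grouping features into homologous series.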
The identification of xenobiotics in nontargeted metabolomic analyses is a vital step in understanding human exposure. Xenobiotic metabolism, excretion, and co-existence with other endogenous molecules, however, greatly complicate nontargeted studies. While mass spectrometry (MS)-based platforms are commonly used in metabolomic measurements, deconvoluting endogenous metabolites and xenobiotics is often challenged by the lack of xenobiotic parent and metabolite standards as well as the numerous isomers possible for each small molecule m/z feature. Here, we evaluate the use of ion mobility spectrometry coupled with MS (IMS-MS) and mass defect filtering in a xenobiotic structural annotation workflow to reduce large metabolomic feature lists and uncover potential xenobiotic classes and species detected in metabolomic studies. To evaluate the workflow, xenobiotics having known high toxicities, including per- and polyfluoroalkyl substances (PFAS), polycyclic aromatic hydrocarbons (PAHs), polychlorinated biphenyls (PCBs) and polybrominated diphenyl ethers (PBDEs), were examined. Initially, to address the lack of available IMS collision cross section (CCS) values for PFAS, 88 PFAS standards were evaluated with IMS-MS, both to develop a targeted PFAS CCS library and to support machine learning predictions. The CCS values for biomolecules and xenobiotics were then plotted versus m/z, clearly distinguishing the biomolecules from the halogenated xenobiotics. The xenobiotic structural annotation workflow was then used to annotate potential PFAS features in NIST human serum. The workflow reduced the 2,423 detected LC-IMS-MS features to 80 possible PFAS, with 17 confidently identified through targeted analyses and 48 additional features correlating with possible CompTox entries.