In-source fragmentation (ISF) is a naturally occurring phenomenon during electrospray ionization (ESI) in liquid chromatography–mass spectrometry (LC-MS) analysis. ISF leads to false metabolite annotation in untargeted metabolomics, which can in turn cause misinterpretation of the underlying biological mechanisms. Conventional metabolomic data cleaning mainly focuses on annotating adducts and isotopes, and ISF features are mainly recognized based on common neutral losses and LC coelution patterns. In this work, we recognized three increasingly important patterns of ISF features: (1) coeluting with their precursor ions, (2) appearing in the tandem MS (MS2) spectra of their precursor ions, and (3) sharing similar MS2 fragmentation patterns with their precursor ions. Based on these patterns, we developed an R package, ISFrag, to comprehensively recognize all possible ISF features in LC-MS data acquired in full-scan, data-dependent acquisition (DDA), and data-independent acquisition (DIA) modes, without relying on common neutral loss information or an MS2 spectral library. When tested using metabolite standards, ISFrag achieved 100% correct recognition of level 1 ISF features and over 80% correct recognition of level 2 ISF features. Further application of ISFrag to untargeted metabolomics data allowed us to identify, at omics scale, ISF features that can potentially cause false metabolite annotation. With the help of ISFrag, we systematically investigated how ISF features are influenced by different MS parameters, including capillary voltage, end plate offset, ion energy, and "collision energy". Our results show that while increasing these energies can increase the numbers of both real metabolic features and ISF features, the percentage of ISF features may not necessarily increase. Finally, using ISFrag, we created an ISF pathway to visualize the relationships among multiple ISF features that belong to the same precursor ion.
ISFrag is freely available on GitHub (https://github.com/HuanLab/ISFrag).
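The three ISF patterns described above can be illustrated with a minimal Python sketch. This is not ISFrag's implementation (ISFrag is an R package): the `Feature` class, the function names, the 3 s retention-time tolerance, the 10 ppm m/z tolerance, the cosine-similarity cutoff of 0.7, and the greedy peak matching are all hypothetical choices made for illustration. The checks are applied in order, so a candidate only earns credit for pattern (2) or (3) if it already satisfies the earlier ones.

```python
import math
from dataclasses import dataclass, field


# Hypothetical feature record: m/z, apex retention time (s), and an MS2 spectrum
@dataclass
class Feature:
    mz: float
    rt: float
    ms2: list = field(default_factory=list)  # list of (m/z, intensity) tuples


def ppm_match(mz_a, mz_b, ppm=10.0):
    """True if two m/z values agree within a relative tolerance given in ppm."""
    return abs(mz_a - mz_b) <= mz_b * ppm * 1e-6


def coelutes(frag, prec, rt_tol=3.0):
    """Pattern (1): apexes within rt_tol seconds and fragment lighter than precursor."""
    return abs(frag.rt - prec.rt) <= rt_tol and frag.mz < prec.mz


def in_precursor_ms2(frag, prec, ppm=10.0):
    """Pattern (2): the fragment's m/z appears among the precursor's MS2 peaks."""
    return any(ppm_match(frag.mz, mz, ppm) for mz, _ in prec.ms2)


def cosine_similarity(spec_a, spec_b, ppm=10.0):
    """Cosine similarity using greedy one-to-one peak matching within ppm."""
    used = set()
    dot = 0.0
    for mz_a, int_a in spec_a:
        for j, (mz_b, int_b) in enumerate(spec_b):
            if j not in used and ppm_match(mz_a, mz_b, ppm):
                dot += int_a * int_b
                used.add(j)
                break
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def isf_evidence(frag, prec, sim_cutoff=0.7):
    """Count how many of the three patterns hold, checking them in order."""
    if not coelutes(frag, prec):
        return 0
    if not in_precursor_ms2(frag, prec):
        return 1
    if cosine_similarity(frag.ms2, prec.ms2) < sim_cutoff:
        return 2
    return 3


if __name__ == "__main__":
    prec = Feature(300.10, 120.0, [(282.09, 50), (180.06, 100), (163.04, 40)])
    frag = Feature(180.06, 121.0, [(180.06, 80), (163.04, 40)])
    late = Feature(180.06, 200.0, [(180.06, 80), (163.04, 40)])
    print(isf_evidence(frag, prec), isf_evidence(late, prec))  # 3 0
```

Here the 180.06 feature satisfies all three patterns with respect to the 300.10 precursor, whereas the same m/z eluting at 200 s satisfies none. A fuller coelution check could also compare EIC peak shapes rather than apex retention times alone.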
The nonlinear signal response of electrospray ionization (ESI) presents a critical limitation for mass spectrometry (MS)-based quantitative analysis. In metabolomics research, this issue has largely remained unaddressed: MS signal intensities are usually used directly to calculate fold changes for quantitative comparison. In this work, we demonstrate that, owing to the nonlinear ESI response, the signal intensity ratio of a metabolic feature between two samples may not reflect the real ratio of its concentrations (i.e., fold-change compression), implying that conventional fold-change calculations based directly on MS signal intensities can be misleading. To address this, we developed a quality control (QC) sample-based signal calibration workflow that overcomes the quantitative bias caused by the nonlinear ESI response. In this workflow, a calibration curve for every metabolic feature is first established by injecting a QC sample at a series of injection volumes. The MS signals of each metabolic feature are then calibrated to their equivalent QC injection volumes for comparative analysis. We demonstrated this workflow in a targeted metabolite analysis, showing that the accuracy of fold-change calculations can be significantly improved. Furthermore, in a metabolomic comparison of bone marrow interstitial fluid samples from leukemia patients before and after chemotherapy, an additional 59 significant metabolic features were found with fold changes larger than 1.5, and an additional 97 significant metabolic features had fold changes corrected by more than 0.1. This work enables high-quality quantitative analysis in untargeted metabolomics, thus supporting more confident generation of biological hypotheses.
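The calibration idea can be sketched in Python under simplifying assumptions. The abstract does not state how each calibration curve is modeled, so this sketch assumes a piecewise-linear curve through the QC points and inverts it by linear interpolation; the function names and the synthetic saturating response used in the example are hypothetical. Signals outside the QC range are rejected rather than extrapolated.

```python
from bisect import bisect_left


def equivalent_volume(signal, qc_volumes, qc_signals):
    """Map a sample's signal to the QC injection volume that gives the same signal.

    Assumes qc_signals rise strictly with qc_volumes and treats the calibration
    curve as piecewise linear between neighbouring QC injections.
    """
    if len(qc_volumes) != len(qc_signals) or len(qc_signals) < 2:
        raise ValueError("need at least two matched QC points")
    if any(b <= a for a, b in zip(qc_signals, qc_signals[1:])):
        raise ValueError("QC signals must increase strictly with injection volume")
    if not qc_signals[0] <= signal <= qc_signals[-1]:
        raise ValueError("signal lies outside the QC calibration range")
    i = bisect_left(qc_signals, signal)  # first QC signal >= sample signal
    if qc_signals[i] == signal:
        return qc_volumes[i]
    s0, s1 = qc_signals[i - 1], qc_signals[i]
    v0, v1 = qc_volumes[i - 1], qc_volumes[i]
    return v0 + (signal - s0) * (v1 - v0) / (s1 - s0)


def calibrated_fold_change(signal_a, signal_b, qc_volumes, qc_signals):
    """Fold change of sample B over sample A computed on QC-equivalent volumes."""
    return (equivalent_volume(signal_b, qc_volumes, qc_signals)
            / equivalent_volume(signal_a, qc_volumes, qc_signals))


if __name__ == "__main__":
    # Hypothetical saturating ESI response, used only to generate example data
    def response(v):
        return 1e5 * v / (1 + 0.25 * v)

    vols = [0.5, 1.0, 2.0, 4.0, 8.0]  # QC injection volumes (uL)
    sigs = [response(v) for v in vols]
    a, b = response(1.0), response(4.0)  # true concentration ratio is 4
    print(b / a)  # raw intensity ratio: 2.5 (compressed)
    print(calibrated_fold_change(a, b, vols, sigs))  # 4.0
```

With the synthetic response, the raw intensity ratio of 2.5 understates the true 4-fold difference, while the QC-equivalent volumes recover it. Between QC points, the accuracy of this sketch depends on how densely the injection series covers each feature's signal range.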
Extracting metabolic features from liquid chromatography–mass spectrometry (LC-MS) data relies on recognizing extracted ion chromatogram (EIC) peak shapes using peak picking algorithms. Unfortunately, all peak picking algorithms share a significant drawback: they generate a problematic number of false positives. In this work, we take advantage of deep learning to develop a convolutional neural network (CNN)-based program that automatically recognizes metabolic features with poor EIC shapes, which have low feature fidelity and are more likely to be false. Our CNN model was trained using 25,095 EIC plots collected from 22 LC-MS-based metabolomics projects covering various sample types and LC and MS conditions. Notably, we manually inspected all the EIC plots and labeled each as good or poor quality to ensure accurate model training. The trained CNN model is embedded into a C#-based program named EVA (short for evaluation). The EVA Windows Application is a versatile platform that can process metabolic features generated by LC-MS systems from various vendors and processed using various data processing software. Our comprehensive evaluation indicates that EVA achieves over 90% classification accuracy. EVA can be readily used in LC-MS-based metabolomics projects and is freely available on the Microsoft Store by searching "EVA Metabolomics".
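EVA classifies EIC plots with a CNN trained on manually labeled examples. Any such classifier needs inputs of a consistent size and scale; the Python sketch below shows one hypothetical way to prepare a single EIC trace as a fixed-length, min-max-normalized vector, as might feed a 1-D CNN. It is not EVA's preprocessing (the abstract describes EIC plots and a C# program), and the function name and the default of 64 points are assumptions.

```python
def resample_eic(rts, intensities, n_points=64):
    """Resample an EIC onto n_points evenly spaced retention times by linear
    interpolation, then min-max scale the intensities to [0, 1]."""
    if len(rts) != len(intensities) or len(rts) < 2 or n_points < 2:
        raise ValueError("need >= 2 matched RT/intensity points and n_points >= 2")
    if any(b <= a for a, b in zip(rts, rts[1:])):
        raise ValueError("retention times must be strictly increasing")
    start, end = rts[0], rts[-1]
    step = (end - start) / (n_points - 1)
    resampled = []
    j = 0  # index of the left neighbour in the original scan grid
    for k in range(n_points):
        t = start + k * step
        while j < len(rts) - 2 and rts[j + 1] < t:
            j += 1
        t0, t1 = rts[j], rts[j + 1]
        frac = min(max((t - t0) / (t1 - t0), 0.0), 1.0)  # clamp float round-off
        resampled.append(intensities[j] + frac * (intensities[j + 1] - intensities[j]))
    lo, hi = min(resampled), max(resampled)
    if hi == lo:  # flat trace: nothing to scale
        return [0.0] * n_points
    return [(y - lo) / (hi - lo) for y in resampled]


if __name__ == "__main__":
    rts = [60.0, 61.2, 62.4, 63.6, 64.8, 66.0]   # scan times (s)
    ints = [1200, 5400, 21000, 18000, 4300, 900]  # EIC intensities
    vec = resample_eic(rts, ints)
    print(len(vec), max(vec), min(vec))  # 64 1.0 0.0
```

Resampling to a fixed length gives every feature the same input shape, and min-max scaling removes absolute intensity so that a classifier responds to peak shape rather than peak height.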
Tandem mass spectral (MS/MS) data in liquid chromatography–tandem mass spectrometry (LC-MS/MS) analysis are often contaminated because precursor ions are selected using a low-resolution quadrupole mass filter. In this work, we developed a strategy to differentiate contamination fragment ions (CFIs) from true fragment ions (TFIs) in an MS/MS spectrum. The rationale is that TFIs should coelute with their parent ions, whereas CFIs should not. To assess coelution, we performed a parallel LC-MS/MS analysis in data-independent acquisition (DIA) mode with all-ion fragmentation (AIF). Using the DIA (AIF) data, a peak–peak correlation (PPC) score is calculated between the extracted ion chromatogram (EIC) of the fragment ion from the MS/MS scans and the EIC of the precursor ion from the MS1 scans. A high PPC score indicates a TFI, and a low PPC score indicates a CFI. Testing with metabolomics data generated by high-resolution QTOF and Orbitrap mass spectrometers from various vendors in different LC-MS configurations, we found that more than 70% of the fragment ions have PPC scores < 0.8, and we identified three common sources of CFIs: (1) solvent contamination, (2) adjacent chemical contamination, and (3) undetermined signals from artifacts and noise. By combining PPC scores with other precursor and fragment ion information, we further developed a machine learning model that can robustly and conservatively predict CFIs. Incorporating this model, we created an R program, MS2Purifier, to automatically recognize CFIs and clean the MS/MS spectra of metabolic features in LC-MS/MS data with high sensitivity and specificity.
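The PPC-based screening step can be sketched in Python. The abstract does not define the PPC formula, so this sketch assumes it is the Pearson correlation between the precursor EIC (MS1 scans) and the fragment EIC (MS/MS scans), and that both traces have already been interpolated onto a common retention-time grid (in AIF mode, MS1 and MS/MS scans are acquired at different times). The 0.8 cutoff is borrowed from the score distribution reported above; using it as a hard TFI/CFI decision rule, and the function names, are assumptions. MS2Purifier itself combines PPC with other features in a machine learning model.

```python
import math


def ppc_score(precursor_eic, fragment_eic):
    """Pearson correlation between two EICs sampled on the same RT grid."""
    n = len(precursor_eic)
    if n != len(fragment_eic) or n < 3:
        raise ValueError("EICs must share a grid of at least 3 points")
    mean_p = sum(precursor_eic) / n
    mean_f = sum(fragment_eic) / n
    cov = sum((p - mean_p) * (f - mean_f) for p, f in zip(precursor_eic, fragment_eic))
    var_p = sum((p - mean_p) ** 2 for p in precursor_eic)
    var_f = sum((f - mean_f) ** 2 for f in fragment_eic)
    if var_p == 0 or var_f == 0:  # a flat trace has no peak shape to correlate
        return 0.0
    return cov / math.sqrt(var_p * var_f)


def label_fragments(precursor_eic, fragment_eics, cutoff=0.8):
    """Label each fragment m/z as a true (TFI) or contamination (CFI) fragment ion."""
    return {
        mz: "TFI" if ppc_score(precursor_eic, eic) >= cutoff else "CFI"
        for mz, eic in fragment_eics.items()
    }


if __name__ == "__main__":
    precursor = [0, 5, 20, 5, 0]
    fragments = {
        91.05: [0, 2.5, 10, 2.5, 0],  # follows the precursor peak
        59.05: [8, 9, 8, 9, 8],       # unrelated fluctuating background
        44.99: [5, 5, 5, 5, 5],       # constant background signal
    }
    print(label_fragments(precursor, fragments))
    # {91.05: 'TFI', 59.05: 'CFI', 44.99: 'CFI'}
```

A constant background trace has zero variance, so the sketch assigns it a PPC of 0 rather than dividing by zero.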