Retention time prediction, facilitated by advances in machine learning, has become a useful tool in untargeted LC-MS applications. State-of-the-art approaches include graph neural networks and 1D-convolutional neural networks that are trained on the METLIN small molecule retention time dataset (SMRT). These approaches demonstrate accurate predictions comparable with the experimental error for the training set. The weak point of retention time prediction approaches is the transfer of predictions to various systems. The accuracy of this step depends both on the method of mapping and on the accuracy of the general model trained on SMRT. Therefore, improvements to both parts of prediction workflows may lead to improved compound annotations. Here, we evaluate capabilities of message-passing neural networks (MPNN) that have demonstrated outstanding performance on many chemical tasks to accurately predict retention times. The model was initially trained on SMRT, providing mean and median absolute cross-validation errors of 32 and 16 s, respectively. The pretrained MPNN was further fine-tuned on five publicly available small reversed-phase retention sets in a transfer learning mode and demonstrated up to 30% improvement of prediction accuracy for these sets compared with the state-of-the-art methods. We demonstrated that filtering isomeric candidates by predicted retention with the thresholds obtained from ROC curves eliminates up to 50% of false identities.
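The candidate-filtering step described above can be illustrated with a minimal sketch. It assumes each isomeric candidate already has a predicted retention time and that a tolerance (in seconds) has been chosen beforehand, for example from a ROC curve. The function name, candidate names, and all numeric values are hypothetical and are not taken from the paper.

```python
def filter_by_retention(candidates, observed_rt, threshold):
    """Keep candidates whose predicted retention time lies within
    `threshold` seconds of the observed retention time.

    candidates: list of (name, predicted_rt_seconds) tuples (hypothetical data).
    """
    return [
        name
        for name, predicted_rt in candidates
        if abs(predicted_rt - observed_rt) <= threshold
    ]


# Hypothetical isomeric candidates with illustrative predicted retention times (s).
candidates = [("isomer_A", 412.0), ("isomer_B", 455.0), ("isomer_C", 610.0)]

# The 40 s threshold is an assumed value, not one reported in the paper.
kept = filter_by_retention(candidates, observed_rt=430.0, threshold=40.0)
print(kept)
```

A tighter threshold removes more false candidates but also risks discarding the true structure, which is why the tolerance is described as being derived from ROC curves rather than fixed arbitrarily.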
LC–MS is a key technique for the identification of small molecules in complex samples. Accurate mass, retention time, and fragmentation spectra from LC–MS experiments are compared to reference values for pure chemical standards. However, this information is often unavailable or insufficient, leading to an assignment to a list of candidates instead of a single hit; therefore, additional features are desired to filter candidates. One such promising feature is the number of specific functional groups of a molecule that can be counted via derivatization or isotope-exchange techniques. Hydrogen/deuterium exchange (HDX) is the most widespread implementation of isotope exchange for mass spectrometry, while oxygen 16O/18O exchange is not applied as frequently as HDX. Nevertheless, it is known that some functional groups may be selectively exchanged in 18O enriched media. Here, we propose an implementation of 16O/18O isotope exchange to highlight various functional groups. We evaluated the possibility of using the number of exchanged oxygen atoms as a descriptor to filter database candidates in untargeted LC–MS-based workflows. It was shown that 16O/18O exchange provides 62% (median, n = 45) search space reduction for a panel of drug molecules. Additionally, it was demonstrated that studying the fragmentation spectra after 16O/18O exchange can aid in eliminating false positives and, in some cases, help to annotate fragments formed with water traces in the collisional cell.
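As a rough illustration of using the number of exchanged oxygen atoms as a filtering descriptor, the sketch below estimates that number from the m/z shift between the unlabeled and labeled ion and keeps only candidates with a matching expected count. The per-candidate expected counts, the m/z values, and the function names are hypothetical; which functional groups actually exchange depends on the experimental conditions and is not modeled here.

```python
# Monoisotopic mass difference between 18O and 16O, in Da.
MASS_SHIFT_18O = 17.999160 - 15.994915  # about 2.004245


def count_exchanged_oxygens(mz_unlabeled, mz_labeled, charge=1):
    """Estimate how many oxygens were exchanged from the observed m/z shift."""
    mass_shift = (mz_labeled - mz_unlabeled) * charge
    return round(mass_shift / MASS_SHIFT_18O)


def filter_by_exchange(candidates, observed_count):
    """Keep candidates whose expected number of exchangeable oxygens
    matches the observed count.

    candidates: list of (name, expected_exchangeable_oxygens) tuples;
    the expected counts are assumed to come from some structural rule
    that is outside the scope of this sketch.
    """
    return [name for name, expected in candidates if expected == observed_count]


# Illustrative m/z values for a singly charged ion (hypothetical).
observed = count_exchanged_oxygens(180.0634, 184.0719)

# Hypothetical database candidates sharing the same elemental composition.
candidates = [("cand_1", 0), ("cand_2", 2), ("cand_3", 1), ("cand_4", 2)]
kept = filter_by_exchange(candidates, observed)

# Fraction of the candidate list that was eliminated by the filter.
reduction = 1 - len(kept) / len(candidates)
print(observed, kept, reduction)
```

The `reduction` value here corresponds to the notion of search space reduction for a single compound; the 62% figure in the text is a median over a panel of 45 drug molecules and cannot be reproduced from this toy input.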
Dissociation induced by the accumulation of internal energy via collisions of ions with neutral molecules is one of the most important fragmentation techniques in mass spectrometry (MS), and the identification of small singly charged molecules is based mainly on the consideration of the fragmentation spectrum. Many research studies have been dedicated to creating databases of experimentally measured tandem mass spectrometry (MS/MS) spectra (such as MzCloud, Metlin, etc.) and to developing software for predicting MS/MS fragments in silico from the molecular structure (such as MetFrag, CFM-ID, CSI:FingerID, etc.). However, the fragmentation mechanisms and pathways are still not fully understood. One of the limiting obstacles is that protomers (positive ions protonated at different sites) produce different fragmentation spectra, and these spectra overlap when several protomers are present. Here, we propose combining two powerful approaches: computing fragmentation trees that carry information on all consecutive fragmentations, and considering the MS/MS data of isotopically labeled compounds. We have created PyFragMS, a web tool consisting of a database of annotated MS/MS spectra of isotopically labeled molecules (after H/D and/or 16O/18O exchange) and a collection of instruments for computing fragmentation trees for an arbitrary molecule. Using PyFragMS, we investigated how the site of protonation influences the fragmentation pathway for small molecules. PyFragMS also supports database searches that take the MS/MS data of isotopically labeled compounds into account.