The
predicted liquid chromatographic retention times (RTs) of small
molecules are not accurate enough for wide adoption in structural
identification. In this study, we used a graph neural network to
predict retention times (GNN-RT) directly from the structures of small
molecules, without requiring molecular descriptors. The prediction
accuracy of GNN-RT was compared with that of random forests (RFs), Bayesian
ridge regression, a convolutional neural network (CNN), and a deep-learning
regression model (DLM) on the METLIN small molecule retention time (SMRT)
dataset. GNN-RT achieved the highest prediction accuracy, with a mean
relative error of 4.9% and a median relative error of 3.2%. Furthermore,
the SMRT-trained GNN-RT model can be transferred easily to chromatographic
systems of the same type. The predicted RT is valuable for
structural identification as a complement to tandem mass spectra
and can be used to assist in the identification of compounds. The
results indicate that GNN-RT is a promising method to predict the
RT for liquid chromatography and improve the accuracy of structural
identification for small molecules.
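The two accuracy figures quoted above (mean and median relative error) can be illustrated with a minimal sketch. The abstract does not spell out the formula, so this assumes the relative error of one compound is |predicted - measured| / measured; the function names and the retention times below are hypothetical and are not taken from the SMRT dataset.

```python
from statistics import mean, median


def relative_errors(measured, predicted):
    """Return |predicted - measured| / measured for each pair (assumed definition)."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    return [abs(p - m) / m for m, p in zip(measured, predicted)]


def summarize(measured, predicted):
    """Mean and median relative error as fractions (multiply by 100 for percent)."""
    errs = relative_errors(measured, predicted)
    return {
        "mean_relative_error": mean(errs),
        "median_relative_error": median(errs),
    }


# Hypothetical retention times in seconds, for illustration only.
measured = [100.0, 200.0, 400.0, 800.0]
predicted = [110.0, 190.0, 400.0, 880.0]
summary = summarize(measured, predicted)
print(summary)
```

Reporting both statistics is useful because the median is less sensitive than the mean to a few badly predicted compounds.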
Electron ionization−mass spectrometry (EI-MS) hyphenated to gas chromatography (GC) is the workhorse for analyzing volatile compounds in complex samples. The spectral matching method can only identify compounds within the spectral database. In response, we present a deep-learning-based approach (DeepEI) for structure elucidation of an unknown compound with its EI-MS spectrum. DeepEI employs deep neural networks to predict molecular fingerprints from an EI-MS spectrum and searches the molecular structure database with the predicted fingerprints. We evaluated DeepEI with MassBank spectra, and the results indicate DeepEI is an effective identification method. In addition, DeepEI can work cooperatively with database spectral matching and NEIMS (fingerprint to spectrum method) to improve identification accuracy.
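The database search step described above can be sketched roughly as follows. The abstract does not say how predicted fingerprints are compared with database entries, so this sketch assumes the predicted fingerprint has already been thresholded into a set of "on" bit indices and that candidates are ranked by Tanimoto similarity; `tanimoto`, `rank_candidates`, and the candidate names are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not bits_a and not bits_b:
        return 0.0
    return len(bits_a & bits_b) / len(bits_a | bits_b)


def rank_candidates(predicted_bits, database):
    """Score every database structure against the predicted fingerprint, best first."""
    scored = [(name, tanimoto(predicted_bits, bits)) for name, bits in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprint predicted from a spectrum (indices of bits set to 1).
predicted_bits = {1, 4, 7, 9}

# Hypothetical structure database: candidate name -> fingerprint bits.
database = {
    "candidate_A": {1, 4, 7, 9, 12},
    "candidate_B": {2, 4, 8},
    "candidate_C": {1, 7},
}

ranking = rank_candidates(predicted_bits, database)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy example the candidate sharing the most bits with the prediction is ranked first; a real workflow would compute database fingerprints from structures with a cheminformatics toolkit.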
Tandem
mass spectrometry (MS/MS) is the workhorse for structural
annotation of metabolites because it provides abundant structural
information. Currently, metabolite identification mainly relies on
querying experimental spectra against public or in-house spectral
databases. The identification is severely limited by the available
spectra in the databases. Although the metabolome consists of a huge
number of different functional metabolites, the whole metabolome derives
from a limited number of initial metabolites via bioreactions. In
each bioreaction, the reactant and the product often change some substructures
but are still structurally related. These structurally related metabolites
often have related MS/MS spectra, which provide the possibility to
identify unknown metabolites through known ones. However, it is challenging
to explore the internal relationship between MS/MS spectra and structural
similarity. In this study, we present a deep-learning-based approach
for MS/MS-aided structural-similarity scoring (DeepMASS), which
scores the structural similarity of an unknown metabolite against a
known one using MS/MS spectra and deep neural networks. We evaluated
DeepMASS with leave-one-out cross-validation on MS/MS spectra of 662
compounds in KEGG and with an external test on biomarkers from a male
infertility study measured on Shimadzu LC-ESI-IT-TOF and Bruker Compact
LC-ESI-QTOF instruments. The results show that the identification of an unknown
compound is valid if a structurally related metabolite is available in the database.
It provides an effective approach to extend the identification range
of metabolites for existing MS/MS databases.
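The observation that structurally related metabolites often have related MS/MS spectra can be illustrated with a conventional baseline: cosine similarity between binned spectra. This is not the DeepMASS scoring model, which uses deep neural networks; it is only a sketch of how two spectra might be compared, assuming each spectrum is a list of (m/z, intensity) pairs. The bin width and the peak lists are hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin_width is an assumption)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(spec_a, spec_b, bin_width=1.0):
    """Cosine similarity between two binned spectra; 0.0 if either is empty."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical MS/MS peak lists (m/z, intensity) for a known and an unknown metabolite.
known = [(85.03, 40.0), (127.04, 100.0), (145.05, 60.0)]
unknown = [(85.03, 35.0), (127.04, 100.0), (163.06, 50.0)]

score = cosine_similarity(known, unknown)
print(f"spectral cosine similarity: {score:.3f}")
```

Two spectra that share most fragment peaks score close to 1, while spectra with no common bins score 0.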
Distilling accurate quantitation information on metabolites from liquid chromatography coupled with mass spectrometry (LC-MS) data sets is crucial for further statistical analysis and biomarker identification. However, it is still challenging due to the complexity of biological systems. The concept of pure ion chromatograms (PICs) is an effective way of extracting meaningful ions, but few toolboxes provide a full processing workflow for LC-MS data sets based on PICs. In this study, an integrated framework, KPIC2, has been developed for metabolomics studies; it can detect pure ions accurately, align PICs across samples, group PICs to identify isotopes and potential adducts, fill missing peaks, and perform multivariate pattern recognition. To evaluate its performance, the MM48, metabolomics quantitation, and soybean seeds data sets were analyzed using KPIC2, XCMS, and MZmine2. KPIC2 extracts more true ions with fewer detected features, shows good quantification ability on the metabolomics quantitation data set, and achieves satisfactory classification on the soybean seeds data set through kernel-based OPLS-DA and random forest. It is implemented in the R programming language, and the software, user guide, example scripts, and data sets are available as an open source package at https://github.com/hcji/KPIC2.
Electron−ionization mass spectrometry (EI-MS) hyphenated to gas chromatography (GC) is the workhorse for analyzing volatile compounds in complex samples. The spectral matching method can only identify compounds within the spectral database. In response, we present a deep-learning-based approach (DeepEI) for structure elucidation of an unknown compound from its EI-MS spectrum. DeepEI employs deep neural networks to predict molecular fingerprints from an EI-MS spectrum and searches a molecular structure database with the predicted fingerprints. In addition, a convolutional neural network was trained to filter the structures in the database and improve identification performance. Our method improves on the competing method NEIMS in identification accuracy on both the NIST test dataset and the MassBank dataset. Furthermore, DeepEI (spectrum to fingerprint) and NEIMS (fingerprint to spectrum) can be combined to improve identification accuracy.