2019
DOI: 10.1038/s41467-019-13680-7

The METLIN small molecule dataset for machine learning-based retention time prediction

Abstract: Machine learning has been extensively applied in small molecule analysis to predict a wide range of molecular properties and processes including mass spectrometry fragmentation or chromatographic retention time. However, current approaches for retention time prediction lack sufficient accuracy due to limited available experimental data. Here we introduce the METLIN small molecule retention time (SMRT) dataset, an experimentally acquired reverse-phase chromatography retention time dataset covering up to 80,038 …

Cited by 135 publications (172 citation statements)
References 48 publications
“…Domingo‐Almenara et al. [41] used this dataset for RT prediction using Deep Neural Networks (DNN). Validation showed that in 70% of the cases, the correct molecule was among the top three candidates.…”
Section: Retention Time Prediction in LC–MS‐based Metabolomics
mentioning
confidence: 99%
“…Also, most compounds analyzed in metabolomics are charged under the employed analytical conditions, which are of great importance particularly for HILIC‐based separations: molecular fingerprints such as (counting) fingerprints or the fingerprints used by Domingo‐Almenara et al. [41] may be valuable alternatives once sufficient training data becomes available.…”
Section: Current Limitations
mentioning
confidence: 99%
“…Retention time, that is, the time that a molecule takes to elute from the LC column, is readily available in all LC-MS pipelines, and is frequently used in aiding annotation (Stanstrup et al., 2015). A basic technique is to use the difference between the observed and predicted RT (Samaraweera et al., 2018; Domingo-Almenara et al., 2019) to prune the list of candidate molecular structures. A major challenge for utilizing RT information, however, is that the RT of the same molecule can vary significantly across different LC systems and configurations, necessitating system-specific candidate RT reference databases and RT predictors.…”
Section: Introduction
mentioning
confidence: 99%
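
The pruning idea described in the statement above (comparing observed and predicted RT to shorten a candidate list) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Candidate` class, the `prune_by_rt` helper, the candidate names, the RT values, and the tolerance are all hypothetical and are not taken from the cited papers.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str            # hypothetical candidate structure label
    predicted_rt: float  # predicted retention time in seconds (made-up values)


def prune_by_rt(candidates, observed_rt, tolerance):
    """Keep candidates whose predicted RT lies within `tolerance` seconds of
    the observed RT, ordered by absolute RT error (smallest first)."""
    kept = [c for c in candidates if abs(c.predicted_rt - observed_rt) <= tolerance]
    return sorted(kept, key=lambda c: abs(c.predicted_rt - observed_rt))


# Illustrative data only; a real pipeline would obtain predictions from an RT model.
candidates = [
    Candidate("structure_A", 312.0),
    Candidate("structure_B", 655.0),
    Candidate("structure_C", 298.5),
]
pruned = prune_by_rt(candidates, observed_rt=305.0, tolerance=30.0)
print([c.name for c in pruned])
```

A fixed absolute tolerance is only one possible choice; because RT varies across LC systems, as the statement notes, any threshold of this kind would need to be set per system.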
“…For molecular physicochemical properties, and were used. It should be noted here that although multiple chemical properties could be included as molecular descriptors, previous studies have shown that and are the most important properties for root-water partitioning 12. Another consideration of selecting physicochemical properties was that we tried to avoid those properties that require expensive computation from molecule electronic structures, which may limit the model's applications in practice. For comparison, an empirical predictive LR model based on…”
mentioning
confidence: 99%
“…Machine learning and deep learning algorithms have been widely used in image recognition, natural language processing, and with chemistry applications including reaction prediction and molecular property prediction 11,12. Recently, as big data-based assessment and decision-making tools, machine learning models were successfully applied to predict some characterization parameters of LCIA, such as chemical USEtox HC50 values [13][14][15].…”
mentioning
confidence: 99%