2020
DOI: 10.1021/acs.jmedchem.0c00385

Learning Molecular Representations for Medicinal Chemistry

Abstract: The accurate modeling and prediction of small molecule properties and bioactivities depend on the critical choice of molecular representation. Decades of informatics-driven research have relied on expert-designed molecular descriptors to establish quantitative structure−activity and structure−property relationships for drug discovery. Now, advances in deep learning make it possible to efficiently and compactly learn molecular representations directly from data. In this review, we discuss how active research in…

Cited by 138 publications (135 citation statements)
References 141 publications
“…Here we found that the graph-based models can outperform the descriptor-based models on some larger or multi-task datasets such as the HIV, Tox21 and ToxCast datasets, which accords well with the previous conclusions that DNNs excel with larger amounts of data and multi-task learning [65,66]. However, building such generalizable and robust deep models requires large-scale, high-quality datasets, and the datasets in practical drug discovery campaigns routinely suffer from narrow chemical diversity and small sample sizes [67]. On this ground, we believe that descriptor-based models can still be widely used and give reliable predictions in drug discovery campaigns. In conclusion, whether judged on the same data folds used by Attentive FP or on a more reliable set of 50 independent runs, we found that the traditional descriptor-based models generally outperform the state-of-the-art graph-based models.…”
supporting
confidence: 86%
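The contrast this statement draws between descriptor-based and graph-based models is easier to see with a concrete baseline. Below is a minimal sketch, assuming RDKit and scikit-learn are available, of the descriptor-based style of model the quoted study refers to: fixed Morgan (ECFP-like) fingerprints fed to a random forest, with no representation learning. The SMILES strings, labels, and hyperparameters are hypothetical placeholders, not data or settings from the cited work.

```python
# Sketch of a descriptor-based baseline: fixed fingerprints + random forest.
# All molecules and labels below are placeholders for illustration only.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O", "CCN(CC)CC"]  # placeholder molecules
labels = np.array([0, 1, 1, 0])                                     # placeholder activity labels

def morgan_fp(smi, radius=2, n_bits=2048):
    """Return a fixed-length Morgan fingerprint as a numpy array of 0/1 bits."""
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    return np.array(fp)

X = np.vstack([morgan_fp(s) for s in smiles])

# Random forest on fixed descriptors: no representation learning is involved,
# which is one reason such models stay competitive on small, narrow datasets.
model = RandomForestClassifier(n_estimators=500, random_state=0)
scores = cross_val_score(model, X, labels, cv=2)  # tiny CV purely for illustration
print(scores)
```

A graph-based model would instead learn its features from the molecular graph itself, which is where the larger, multi-task datasets mentioned above give it an advantage.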
“…Here, we explain the molecular descriptors (i.e., target protein descriptors and compound descriptors) and compound fingerprints, and provide the widely used programs for generating them (i.e., sequence‐based tools and structure‐based tools) in the Supporting Information. Additionally, Chuang et al. 24 comprehensively discussed how AI‐based methods (i.e., deep learning [DL]) could address limitations of molecular descriptors and fingerprints and thereby improve the predictive modeling of compound bioactivities.…”
Section: AI/ML Applications in Drug Discovery
confidence: 99%
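For readers unfamiliar with the distinction the quote draws between molecular descriptors and compound fingerprints, here is a minimal sketch using RDKit as one example toolkit; the specific sequence‐based and structure‐based programs listed in the cited paper's Supporting Information are not reproduced here, and aspirin is used only as an arbitrary example molecule.

```python
# Descriptors versus fingerprints for a single compound, using RDKit.
from rdkit import Chem
from rdkit.Chem import Descriptors, MACCSkeys

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin, arbitrary example

# Descriptors: interpretable scalar properties designed by experts.
descriptors = {
    "MolWt": Descriptors.MolWt(mol),
    "LogP": Descriptors.MolLogP(mol),
    "TPSA": Descriptors.TPSA(mol),
}

# Fingerprint: a fixed-length bit vector encoding substructure presence.
maccs = MACCSkeys.GenMACCSKeys(mol)  # 167-bit MACCS keys

print(descriptors)
print(maccs.GetNumOnBits(), "bits set out of", maccs.GetNumBits())
```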
“…We expect that combining structured representations of biological information with drug information will improve prediction performance of cellular viability. By designing models with stronger relational inductive biases, we can investigate methods of integrating learned representations [1, 7]. Traditional machine learning methods gain performance by integrating multiple independent datasets and annotated biological pathway information [8].…”
Section: Related Work
confidence: 99%
“…We hypothesize that models with stronger relational inductive biases, defined by a conditional formulation and expressed by architectural assumptions, will outperform a naive modeling approach. Neural networks are well suited for conditional model formulation due to their architectural flexibility, proven success integrating diverse data types, and most importantly, ability to learn hierarchical feature representations [4, 7].…”
Section: Introduction
confidence: 99%
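As a rough illustration of the "conditional formulation" these two statements allude to (not the cited authors' actual architecture), the sketch below conditions a prediction head on both a learned drug representation and a biological-context representation before predicting an outcome such as cellular viability. PyTorch is assumed, and all input dimensions and layer sizes are arbitrary placeholders.

```python
# Hypothetical conditional model: drug features combined with biological context.
import torch
import torch.nn as nn

class ConditionalViabilityModel(nn.Module):
    def __init__(self, drug_dim=2048, context_dim=978, hidden=256):
        super().__init__()
        self.drug_encoder = nn.Sequential(nn.Linear(drug_dim, hidden), nn.ReLU())
        self.context_encoder = nn.Sequential(nn.Linear(context_dim, hidden), nn.ReLU())
        # The "conditional" structure: the prediction head sees the drug
        # embedding jointly with the biological-context embedding.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, drug_fp, context):
        z = torch.cat([self.drug_encoder(drug_fp), self.context_encoder(context)], dim=-1)
        return self.head(z)

# Usage with random tensors standing in for real data.
model = ConditionalViabilityModel()
drug_fp = torch.rand(8, 2048)   # e.g., fingerprints for 8 compounds
context = torch.rand(8, 978)    # e.g., expression features for 8 cell lines
pred = model(drug_fp, context)  # shape (8, 1)
print(pred.shape)
```

Stronger relational inductive biases, as the quote puts it, would replace the simple concatenation above with architectural assumptions about how drug and biological entities interact.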