2020
DOI: 10.48550/arxiv.2012.11175
Preprint

Learn molecular representations from large-scale unlabeled molecules for drug discovery

Abstract: How to produce expressive molecular representations is a fundamental challenge in AI-driven drug discovery. Graph neural networks (GNNs) have emerged as powerful techniques for modeling molecular data. However, previous supervised approaches usually suffer from the scarcity of labeled data and have poor generalization capability. Here, we propose a novel Molecular Pre-training Graph-based deep learning framework, named MPG, that learns molecular representations from large-scale unlabeled molecules. In MPG, we pro…
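For readers unfamiliar with graph-based molecular modeling, the sketch below shows one common way to turn a SMILES string into graph inputs (per-atom features plus a bond list) that a GNN could consume. It is a generic illustration using RDKit, not MPG's actual featurization; the choice of atom features and the function name are assumptions.

```python
# Illustrative sketch (not MPG's featurization): SMILES -> simple molecular graph.
from rdkit import Chem

def smiles_to_graph(smiles: str):
    """Return (atom_features, edge_index) for a molecule, or None if parsing fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # Minimal per-atom features: atomic number, degree, formal charge, aromaticity.
    atom_features = [
        (a.GetAtomicNum(), a.GetDegree(), a.GetFormalCharge(), int(a.GetIsAromatic()))
        for a in mol.GetAtoms()
    ]
    # Each undirected bond is stored as two directed edges, as most GNN libraries expect.
    edge_index = []
    for b in mol.GetBonds():
        i, j = b.GetBeginAtomIdx(), b.GetEndAtomIdx()
        edge_index += [(i, j), (j, i)]
    return atom_features, edge_index

# Example: aspirin
print(smiles_to_graph("CC(=O)Oc1ccccc1C(=O)O"))
```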

Cited by 8 publications (12 citation statements: 0 supporting, 12 mentioning, 0 contrasting)
References: 53 publications

Citation statements (ordered by relevance):
“…Intuitively, molecules with the same scaffold share similar architectures, and therefore are expected to be close in the high-level representation space. Following [32], we choose ten representative scaffolds (denoted as S) and then randomly select 200k compounds. For each compound whose scaffold lies in S, we obtain its three representations: one from GNN pre-training with MLM only, one from Transformer pre-training with MLM only, and the third one from the Transformer branch of our DMP pre-training.…”
Section: Results
Citation type: mentioning
confidence: 99%
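The scaffold-based analysis quoted above can be reproduced in spirit with a short script: compute each compound's Bemis-Murcko scaffold and keep the compounds whose scaffold falls in the chosen set S. The sketch below uses RDKit and is only illustrative; the function name and error handling are assumptions, while the ten-scaffold set and the 200k sample come from the cited work.

```python
# Hedged sketch: group compounds by Bemis-Murcko scaffold, keeping only those
# whose scaffold lies in a chosen set S (the ten scaffolds in the quoted analysis).
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def group_by_scaffold(smiles_list, chosen_scaffolds):
    """Map each scaffold in `chosen_scaffolds` to the compounds that share it."""
    groups = defaultdict(list)
    for smi in smiles_list:
        try:
            scaffold = MurckoScaffold.MurckoScaffoldSmiles(smiles=smi)
        except Exception:
            continue  # skip unparsable SMILES
        if scaffold in chosen_scaffolds:
            groups[scaffold].append(smi)
    return groups
```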
“…GNN-based pre-training: Hu et al [23] used a GNN to encode the input molecule and proposed two pre-training strategies: either recover the masked attributes of the input (e.g., atom type), or use contrastive learning [17,2] to minimize the difference between two subgraphs within a molecule. A similar idea also exists in Li et al [32]. Wang et al [53] applied contrastive learning across different molecules and proposed MolCLR, where a molecule should be similar to an augmented version of itself while dissimilar to others.…”
Section: Related Work
Citation type: mentioning
confidence: 89%
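The contrastive objective mentioned in the quote above (as in MolCLR, where a molecule should be similar to an augmented view of itself and dissimilar to other molecules in the batch) is commonly instantiated as an NT-Xent / InfoNCE loss. The sketch below is a generic PyTorch formulation under that assumption, not the exact loss of any of the cited papers; the temperature value is an illustrative default.

```python
# Hedged sketch of an NT-Xent / InfoNCE contrastive loss for molecular pre-training:
# embeddings of two views of the same molecule are pulled together, other molecules
# in the batch act as negatives.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of two augmented views of the same molecules."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2n, dim), unit-norm rows
    sim = z @ z.t() / temperature                         # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                     # exclude trivial self-pairs
    # The positive for sample i is its other view, at index (i + n) mod 2n.
    targets = torch.arange(2 * n, device=z.device).roll(n)
    return F.cross_entropy(sim, targets)
```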