2021
DOI: 10.48550/arxiv.2106.10234
Preprint

Dual-view Molecule Pre-training

Jinhua Zhu,
Yingce Xia,
Tao Qin
et al.

Abstract: Inspired by its success in natural language processing and computer vision, pre-training has attracted substantial attention in cheminformatics and bioinformatics, especially for molecule-based tasks. A molecule can be represented by either a graph (where atoms are connected by bonds) or a SMILES sequence (where depth-first search is applied to the molecular graph with specific rules). Existing works on molecule pre-training use either graph representations only or SMILES representations only. In this work, we …

Cited by 7 publications (16 citation statements)
References 38 publications (82 reference statements)
“…The model is pre-trained on approximately 10 million unique unlabeled molecules, following previous molecule SSL works.39,40,49,53 Compared to a random split, a scaffold split provides a more challenging yet more realistic setting to benchmark molecular property predictions.28 During fine-tuning, the model is trained only on the training set and uses the validation set to select the best-performing model.…”
Section: Datasets (mentioning)
confidence: 99%
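The scaffold split mentioned in the statement above groups molecules by their Bemis-Murcko scaffold so that structurally related molecules never appear in both the training and evaluation subsets. Below is a minimal sketch of such a split, assuming RDKit is available; the exact ratios, ordering, and tie-breaking rules of the cited work may differ.

```python
from collections import defaultdict

from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold


def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    """Greedy scaffold split: molecules sharing a Bemis-Murcko scaffold
    are kept in the same subset (train, validation, or test)."""
    # Group molecule indices by their canonical Murcko scaffold SMILES.
    scaffold_to_indices = defaultdict(list)
    for idx, smi in enumerate(smiles_list):
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip unparsable SMILES
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(mol=mol, includeChirality=False)
        scaffold_to_indices[scaffold].append(idx)

    # Assign whole scaffold groups greedily, largest groups first,
    # so no scaffold is shared across subsets.
    train, valid, test = [], [], []
    n_total = len(smiles_list)
    for group in sorted(scaffold_to_indices.values(), key=len, reverse=True):
        if len(train) + len(group) <= frac_train * n_total:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n_total:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


if __name__ == "__main__":
    smiles = ["c1ccccc1O", "c1ccccc1N", "CCO", "CCN", "C1CCCCC1"]
    print(scaffold_split(smiles, frac_train=0.6, frac_valid=0.2))
```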
“…49,51-53 Few works have investigated motif-level CL for molecules; one such approach learns a table of frequently occurring motif embeddings and trains a sampler to generate informative subgraphs for CL.50 Though such a method shows performance enhancement on various benchmarks, the learned sampler may not cover all the unique substructures in the large molecule dataset.…”
mentioning
confidence: 99%
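Motif- or subgraph-level contrastive learning, as discussed in the statement above, ultimately optimizes a standard contrastive objective over paired views. The following is a minimal InfoNCE (NT-Xent) sketch in PyTorch, assuming an encoder has already produced paired embeddings (for example, a sampled subgraph view and its parent molecule); it illustrates the generic loss only, not the specific sampler-based method cited.

```python
import torch
import torch.nn.functional as F


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE / NT-Xent loss: row i of z_a and row i of z_b form a
    positive pair; all other rows in the batch serve as negatives."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature          # (B, B) cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    z_view1 = torch.randn(32, 128)   # e.g., subgraph/motif embeddings
    z_view2 = torch.randn(32, 128)   # e.g., whole-molecule embeddings
    print(info_nce(z_view1, z_view2).item())
```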
“…In this way, Chemformer is trained to reconstruct the original SMILES from a corrupted input in which a portion of the SMILES characters is randomly masked. DMP [8] employs a two-branch model (Transformer and GNN) with a self-supervised objective that masks atoms of a single molecule and trains the model to reconstruct them. Experimental results show that self-supervised learning is an effective way to improve performance on retrosynthesis tasks.…”
Section: Learning Representations For Chemical Synthesis (mentioning)
confidence: 99%
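The denoising objective described above pairs a corrupted SMILES string with its original form and trains a sequence model to recover the original. The snippet below sketches only the corruption step, using character-level masking with an assumed "[MASK]" placeholder token; production implementations typically mask tokenizer tokens rather than raw characters, and the masking rate here is an illustrative choice.

```python
import random


def mask_smiles(smiles, mask_rate=0.15, mask_token="[MASK]", seed=None):
    """Randomly replace a fraction of SMILES characters with a mask token,
    returning (corrupted_input, original_target) for denoising training."""
    rng = random.Random(seed)
    chars = list(smiles)
    n_mask = max(1, int(round(mask_rate * len(chars))))
    for i in rng.sample(range(len(chars)), n_mask):
        chars[i] = mask_token
    return "".join(chars), smiles


if __name__ == "__main__":
    corrupted, target = mask_smiles("CC(=O)Oc1ccccc1C(=O)O", mask_rate=0.2, seed=0)
    print(corrupted)  # corrupted SMILES with [MASK] placeholders
    print(target)     # original SMILES to be reconstructed
```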
“…Learning good deep representations for molecules has been extensively investigated in the literature. Since collecting labeled datasets is labor-intensive and costly, there is an ever-increasing trend toward learning molecular representations in a self-supervised manner [3-8,73]. Inspired by the great success of the Masked Auto-Encoder (MAE) in natural language understanding [25,26] and image understanding [81-83], there is a growing body of exploratory work on chemical understanding that masks characters in SMILES, or atoms and bonds in a molecular graph.…”
Section: Preliminary Work (mentioning)
confidence: 99%