Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022
DOI: 10.1145/3534678.3539368

Unified 2D and 3D Pre-Training of Molecular Representations

Abstract: Molecular representation learning has attracted much attention recently. A molecule can be viewed as a 2D graph with nodes/atoms connected by edges/bonds, and can also be represented by a 3D conformation with 3-dimensional coordinates of all atoms. We note that most previous work handles 2D and 3D information separately, while jointly leveraging these two sources may foster a more informative representation. In this work, we explore this appealing idea and propose a new representation learning method based on …
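The two views named in the abstract can be held side by side in one small data structure. The sketch below is only an illustration: the `Molecule` class, its field names, and the approximate water coordinates are assumptions made here, not anything taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Molecule:
    """Hypothetical container pairing a 2D bond graph with optional 3D coordinates."""

    atoms: List[str]                  # node labels, e.g. element symbols
    bonds: List[Tuple[int, int]]      # undirected edges as atom-index pairs
    coords: Optional[List[Tuple[float, float, float]]] = None  # one (x, y, z) per atom

    def __post_init__(self) -> None:
        # The 3D view, when present, must cover every atom of the 2D graph.
        if self.coords is not None and len(self.coords) != len(self.atoms):
            raise ValueError("need exactly one 3D coordinate per atom")

    def neighbors(self, i: int) -> List[int]:
        """Indices of atoms bonded to atom i (2D graph view)."""
        out = []
        for a, b in self.bonds:
            if a == i:
                out.append(b)
            elif b == i:
                out.append(a)
        return sorted(out)

    def has_3d(self) -> bool:
        """True when a 3D conformation is attached."""
        return self.coords is not None


# Water with rough, illustrative coordinates (in angstroms).
water = Molecule(
    atoms=["O", "H", "H"],
    bonds=[(0, 1), (0, 2)],
    coords=[(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
)
```

Keeping `coords` optional reflects that a molecule may be available as a graph only, with no conformation attached.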

Cited by 23 publications (35 citation statements)
References 27 publications (55 reference statements)
“…To capture the rich information in molecular graph motifs, GROVER (Rong et al, 2020a) and MGSSL (Zhang et al, 2021) propose to predict or generate the motifs. Considering that 3D geometric information plays a vital role in predicting molecular properties, several recent works (Stärk et al, 2021; Fang et al, 2022a; Zhu et al, 2022) pre-train the GNN encoders on molecular datasets with 3D geometric information. We recommend readers refer to a recent survey (Xia et al, 2022f) for more relevant literature.…”
Section: Pre-training On Molecules
Citation type: mentioning
confidence: 99%
“…For example, Liu et al [27] and Stärk et al [27] used mutual information between 2D and 3D views for molecular pretraining, so the GNN is still able to produce implicit 3D information that can be used to inform property predictions. Zhu et al [59] proposed a unified 2D and 3D pre-training which jointly leverages the 2D graph structure and the 3D geometry of the molecule. GEM [11] proposes a bond-angle graph and self-supervised tasks that use large-scale unlabelled molecules with coarse 3D spatial structures, which can be calculated by cheminformatics tools such as RDKit.…”
Section: Pre-training On Molecular Graph
Citation type: mentioning
confidence: 99%
“…As shown in Figure 3, 3D PGT first splits the 3D conformer optimization into three generative tasks: bond length, bond angle and dihedral angle. By reconstructing these local descriptors that can fully describe the 3D conformer, the encoder could implicitly generate and encode 3D information in its latent vectors, which can better reflect certain molecular properties [59]. Considering the weight distribution problem of these three pre-training tasks, we design a pre-training surrogate metric to dynamically adjust the weights of each generative task, and conduct the pre-training of the model as a bi-level optimization problem: where λ_i is the loss weight for pre-training task L_i.…”
Section: Automated Fusion Framework For Multiple 3d Pre-training Tasks
Citation type: mentioning
confidence: 99%
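The three local descriptors named in the statement above (bond length, bond angle, dihedral angle) can all be computed directly from atom coordinates. The following is a minimal sketch using only the standard library; the function names are hypothetical, the sign convention of the dihedral is one common choice, and `weighted_loss` assumes a plain weighted sum of per-task losses with fixed weights, since the excerpt omits the actual objective and its bi-level weight adjustment.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond_length(p_i, p_j):
    """Euclidean distance between two bonded atoms."""
    return _norm(_sub(p_j, p_i))


def bond_angle(p_i, p_j, p_k):
    """Angle at the central atom j, in radians."""
    u, v = _sub(p_i, p_j), _sub(p_k, p_j)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_t)))


def dihedral_angle(p_i, p_j, p_k, p_l):
    """Signed torsion angle around the j-k bond, in radians."""
    b1, b2, b3 = _sub(p_j, p_i), _sub(p_k, p_j), _sub(p_l, p_k)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    length = _norm(b2)
    b2_unit = (b2[0] / length, b2[1] / length, b2[2] / length)
    m1 = _cross(n1, b2_unit)
    return math.atan2(_dot(m1, n2), _dot(n1, n2))


def weighted_loss(losses, weights):
    """Assumed form: sum of weight_i * loss_i over the pre-training tasks."""
    if len(losses) != len(weights):
        raise ValueError("need one weight per task loss")
    return sum(w * l for w, l in zip(weights, losses))
```

For example, `bond_angle((1, 0, 0), (0, 0, 0), (0, 1, 0))` gives a right angle, and four atoms lying in one plane give a dihedral of 0 (cis) or ±π (trans).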
“…Traditional approaches based on molecular dynamics or Markov chain Monte Carlo are often computationally expensive, especially for large molecules [32]. The 3D-geometry-enhanced MPMs [132,134] show notable superiority in the downstream task of conformation generation because they can capture the entailed relations between 2D molecular graph and 3D conformation. Representative datasets for the evaluation of molecule generation include ZINC [42], ChEMBL [2] and QM9 [76].…”
Section: Molecular Generation (Mg)
Citation type: mentioning
confidence: 99%