2023
DOI: 10.1038/s41467-023-43214-1

A knowledge-guided pre-training framework for improving molecular representation learning

Han Li, Ruotian Zhang, Yaosen Min et al.

Abstract: Learning effective molecular feature representation to facilitate molecular property prediction is of great significance for drug discovery. Recently, there has been a surge of interest in pre-training graph neural networks (GNNs) via self-supervised learning techniques to overcome the challenge of data scarcity in molecular property prediction. However, current self-supervised learning-based methods suffer from two main obstacles: the lack of a well-defined self-supervised learning strategy and the limited ca…

Cited by 12 publications (9 citation statements) · References 64 publications
“…Initializing TranSiGen with perturbational profiles generated by gene knockdown yields superior performance compared to random initialization. Additionally, using the pre-trained representation from Knowledge-guided Pre-training of Graph Transformer (KPGT)21 further enhances the performance of inferring DEGs, surpassing the molecular fingerprint ECFP4 (as detailed in Molecular representations in Methods and corroborated by the metric scores in Supplementary Tables 2 and 3).…”
Section: Results
confidence: 89%
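The statement above treats ECFP4 as the fixed-fingerprint baseline that KPGT's learned embeddings outperform. Below is a minimal sketch of that baseline featurization, assuming RDKit is available: ECFP4 corresponds to a Morgan fingerprint with radius 2, and the 2048-bit length and the `ecfp4` helper name are illustrative choices, not from the cited paper.

```python
# Sketch: ECFP4 baseline featurization with RDKit (assumed installed).
from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np

def ecfp4(smiles: str, n_bits: int = 2048) -> np.ndarray:
    """ECFP4 = Morgan fingerprint with radius 2, folded to n_bits bits."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"unparseable SMILES: {smiles!r}")
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=n_bits)
    return np.array(fp, dtype=np.float32)

# Usage: a fixed-length vector that a downstream model could consume
# in place of a learned KPGT embedding.
print(ecfp4("CC(=O)Oc1ccccc1C(=O)O").shape)  # aspirin -> (2048,)
```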
“…Considering that the current number of compounds with experimentally measured gene expression profiles is still limited compared to the vast chemical space, TranSiGen utilized the pre-trained molecular representation KPGT21 for compounds. KPGT is a novel self-supervised learning framework for molecular graph representation.…”
Section: Methods
confidence: 99%
“…Injecting such domain knowledge into pretraining has not received much attention, although it is a straightforward strategy and domain-related property predictions are commonly used as downstream tasks. KPGT utilized node-wise chemical properties in its pretraining tasks, resulting in significantly enhanced performance. Our pretraining technique follows in the footsteps of these efforts, demonstrating the potential of domain-knowledge-guided pretraining.…”
Section: Related Work
confidence: 99%
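A minimal sketch of what such domain-knowledge-guided pretraining can look like, assuming RDKit and PyTorch: chemical properties computed from the molecule serve as auxiliary regression targets alongside the usual self-supervised objective. The three descriptors, the `KnowledgeHead` module, and the loss weighting are illustrative assumptions, not KPGT's exact design.

```python
import torch
import torch.nn as nn
from rdkit import Chem
from rdkit.Chem import Descriptors

# Hypothetical property set; KPGT's actual knowledge set is larger.
PROPS = (Descriptors.MolWt, Descriptors.MolLogP, Descriptors.TPSA)

def knowledge_targets(smiles: str) -> torch.Tensor:
    """Chemical properties used as extra supervision during pretraining."""
    mol = Chem.MolFromSmiles(smiles)
    return torch.tensor([f(mol) for f in PROPS], dtype=torch.float32)

class KnowledgeHead(nn.Module):
    """Regresses the property vector from a pooled graph embedding."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, len(PROPS))

    def forward(self, graph_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(graph_emb)

# In the pretraining loop, the auxiliary loss is added to the usual
# self-supervised term (all names here are placeholders):
#   loss = loss_masked + lam * nn.functional.mse_loss(
#       head(graph_emb), knowledge_targets(smiles))
```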
“…Graph contrastive coding (GCC) [28] designs a self-supervised graph neural network pre-training framework to capture common network topological properties across multiple networks. The KPGT [29] self-supervised framework introduces the line graph transformer (LiGhT), which is mainly used to accurately model the structural information of molecular graphs. However, it ignores the unique structural properties of chemical molecules, such as rings and functional groups.…”
Section: Introduction
confidence: 99%
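For context, the line-graph view that LiGhT operates on can be sketched in a few lines, assuming RDKit and networkx: each bond of the molecular graph becomes a node of the line graph, and two bond-nodes are adjacent when the bonds share an atom. The `mol_line_graph` helper name is hypothetical.

```python
import networkx as nx
from rdkit import Chem

def mol_line_graph(smiles: str) -> nx.Graph:
    """Build the molecular graph, then take its line graph (bonds -> nodes)."""
    mol = Chem.MolFromSmiles(smiles)
    g = nx.Graph()
    for bond in mol.GetBonds():
        g.add_edge(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx())
    # nx.line_graph: nodes of L(G) are the edges of G; two nodes are
    # adjacent when the corresponding bonds share an atom.
    return nx.line_graph(g)

lg = mol_line_graph("c1ccccc1O")   # phenol: 7 bonds
print(lg.number_of_nodes())        # 7 line-graph nodes, one per bond
```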