2023
DOI: 10.1039/d2dd00107a

SolvBERT for solvation free energy and solubility prediction: a demonstration of an NLP model for predicting the properties of molecular complexes

Abstract: Deep learning models based on NLP, mainly the Transformer family, have been successfully applied to solve many chemistry-related problems, but their applications are mostly limited to chemical reactions. Meanwhile, solvation...
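The abstract describes a BERT-style NLP model applied to molecular complexes (a solute together with its solvent). As a rough illustration of how such a model could be fine-tuned for property regression, the sketch below passes a tokenized solute–solvent SMILES pair through a generic pre-trained encoder and a small regression head. The encoder, tokenization, hidden size, and the "solute.solvent" pairing are illustrative assumptions, not the published SolvBERT implementation.

```python
# Illustrative sketch only; the encoder passed in is a hypothetical stand-in
# for whichever pre-trained BERT-style SMILES encoder is actually used.
import torch
import torch.nn as nn

class SolvationRegressor(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_dim: int = 256):
        super().__init__()
        self.encoder = encoder                  # pre-trained without property labels
        self.head = nn.Sequential(              # regression head added for fine-tuning
            nn.Linear(hidden_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids encode a "solute.solvent" SMILES pair, shape (batch, seq_len)
        hidden = self.encoder(token_ids)        # (batch, seq_len, hidden_dim)
        pooled = hidden[:, 0, :]                # first-token ([CLS]-style) pooling
        return self.head(pooled).squeeze(-1)    # predicted solvation free energy
```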

Cited by 17 publications (26 citation statements)
References 66 publications
“…Since the BERT models have a self-contained unsupervised pre-training stage, 63 it is pre-trained by clustering the molecular structures which will not be affected by the distribution of property data. 8,42 In contrast, as the pre-training phase of the D-MPNN models is supervised, the significant gap between the property distribution of pre-training data and fine-tuning data may have a negative impact on the prediction accuracy for D-MPNN-based models, as we observed that the D-MPNN model without pre-training showed better performance in LUMO than the fully pre-trained PorphyDMPNN. On the other hand, a supervised pre-training may benefit more when the property data in the pre-training set has a similar distribution to the fine-tuning set.…”
Section: Performance of PorphyBERT and PorphyDMPNN (mentioning)
confidence: 69%
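The statement above contrasts BERT's self-contained unsupervised pre-training with the supervised pre-training of D-MPNN models. As a minimal sketch of the unsupervised side, the generic masked-language-model step below reconstructs only masked SMILES tokens, so no property labels (and hence no property distribution) enter this stage; a supervised pre-training step would instead regress directly on property labels. This is an illustrative assumption, not code from either cited model.

```python
# Generic BERT-style masked-token objective (illustrative, not the cited models' code).
import torch

def mask_tokens(token_ids: torch.Tensor, mask_id: int,
                mask_prob: float = 0.15) -> tuple[torch.Tensor, torch.Tensor]:
    """Return (masked inputs, labels) for a masked-language-model training step."""
    labels = token_ids.clone()
    mask = torch.rand(token_ids.shape) < mask_prob
    labels[~mask] = -100            # unmasked positions are ignored in the loss
    inputs = token_ids.clone()
    inputs[mask] = mask_id          # replace selected tokens with the [MASK] id
    return inputs, labels

# The pre-training loss is a cross-entropy over the token vocabulary at the masked
# positions only; no solubility or solvation labels are consumed in this stage.
```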
“…This benefit of shared pretraining was also observed in one of our previous studies for the multitask prediction of solubility and solvation free energy. 42 In addition, the unsupervised pre-training of PorphyBERT would benefit from future expansion of the pre-training database. Researchers can add more MpP structures to the pretraining database regardless of the design purpose of the MpP and the availability of property data, as more data typically enhance the ability of BERT-based models in clustering molecular structures.…”
Section: Discussion (mentioning)
confidence: 99%
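The statement refers back to a multitask setup in which solubility and solvation free energy share the same pre-trained encoder. The sketch below shows the general pattern (an assumption, not the published architecture): two task heads attached to one shared encoder, so both fine-tuning targets benefit from the same unsupervised pre-training.

```python
# Illustrative multitask wrapper; the encoder and hidden size are placeholders.
import torch
import torch.nn as nn

class SharedEncoderMultiTask(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_dim: int = 256):
        super().__init__()
        self.encoder = encoder                          # shared, pre-trained unsupervised
        self.solvation_head = nn.Linear(hidden_dim, 1)  # solvation free energy target
        self.solubility_head = nn.Linear(hidden_dim, 1) # solubility target

    def forward(self, token_ids: torch.Tensor) -> dict[str, torch.Tensor]:
        pooled = self.encoder(token_ids)[:, 0, :]       # one shared representation
        return {
            "solvation_free_energy": self.solvation_head(pooled).squeeze(-1),
            "solubility": self.solubility_head(pooled).squeeze(-1),
        }

# During fine-tuning the two regression losses are summed, so gradients from
# both tasks update the shared encoder.
```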