Yuan-Bin She scite author profile

Exploring Deep Learning for Metalloporphyrins: Databases, Molecular Representations, and Model Architectures

Su, Zhang, She et al. 2022

Metalloporphyrins have been studied as biomimetic catalysts for more than 120 years, and the large body of data accumulated over that time provides a solid foundation for deep learning to discover chemical trends and structure–function relationships. In this study, key components of deep learning for metalloporphyrins, including databases, molecular representations, and model architectures, were systematically investigated. A protocol for constructing canonical SMILES for metalloporphyrins was proposed and then used to represent the two-dimensional structures of over 10,000 metalloporphyrins in an existing computational database. Subsequently, several state-of-the-art chemical deep learning models, including graph neural network-based and natural language processing-based models, were employed to predict the energy gaps of metalloporphyrins. Two models showed satisfactory predictive performance (R² > 0.94) with canonical SMILES as the only source of structural information. In addition, an unsupervised visualization algorithm was used to interpret the molecular features learned by the deep learning models.

Exploring Deep Learning for Metalloporphyrins: Databases, Molecular Representations, and Model Architectures

Su, Zhang, She et al. 2022

Preprint

Metalloporphyrins have been studied as biomimetic catalysts for more than 120 years, and the large body of data accumulated over that time provides a solid foundation for deep learning to discover chemical trends and structure–function relationships. In this study, key components of deep learning for metalloporphyrins, including databases, molecular representations, and model architectures, were systematically investigated. A protocol for constructing canonical SMILES for metalloporphyrins was proposed and then used to represent the two-dimensional structures of over 10,000 metalloporphyrins in an existing computational database. Subsequently, several state-of-the-art chemical deep learning models, including graph neural network-based and natural language processing-based models, were employed to predict the energy gaps of metalloporphyrins. Two models showed satisfactory predictive performance (R² > 0.94) with canonical SMILES as the only source of structural information. In addition, an unsupervised visualization algorithm was used to interpret the molecular features learned by the deep learning models.

SolvBERT for solvation free energy and solubility prediction: a demonstration of an NLP model for predicting the properties of molecular complexes

Yu, Zhang, Cheng et al. 2022

Preprint

Deep learning models based on NLP, mainly the Transformer family, have been successfully applied to solve many chemistry-related problems, but their applications are mostly limited to chemical reactions. Meanwhile, solvation is an important concept in physical and organic chemistry, describing the interaction of solutes and solvents. This interaction leads to a solvation complex, a molecular complex similar to a reactant-reagent complex. In this study, we introduced the SolvBERT model, which reads the solute and solvents through the SMILES representation of the solvation complex. SolvBERT is pretrained in an unsupervised learning fashion using a large database of computational solvation free energies. The pretrained model can be used to predict the experimental solvation free energy or solubility, depending on the fine-tuning database. To the best of our knowledge, this multi-task prediction capability has not been observed in previously developed graph-based models for predicting the properties of molecular complexes. Furthermore, the performance of our SolvBERT in predicting solvation free energy is comparable to the state-of-the-art graph-based model DMPNN, mainly due to the clustering feature of the pretraining phase of the model, as demonstrated by the TMAP visualization algorithm.
