Deep learning models based on natural language processing (NLP), mainly the Transformer family, have been successfully applied to many chemistry-related problems, but their applications have mostly been limited to chemical reactions. Meanwhile, solvation is an important concept in physical and organic chemistry that describes the interaction between solutes and solvents. This interaction produces a solvation complex, a molecular complex analogous to a reactant-reagent complex. In this study, we introduce the SolvBERT model, which reads the solute and solvent together through the SMILES representation of the solvation complex. SolvBERT is pretrained in an unsupervised fashion on a large database of computed solvation free energies. Depending on the fine-tuning dataset, the pretrained model can then predict either experimental solvation free energies or solubilities. To the best of our knowledge, this multi-task prediction capability has not been demonstrated by previously developed graph-based models for predicting the properties of molecular complexes. Furthermore, SolvBERT's performance in predicting solvation free energy is comparable to that of the state-of-the-art graph-based model DMPNN, which we attribute mainly to the clustering that emerges during the pretraining phase, as revealed by the TMAP visualization algorithm.
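The abstract does not spell out the input format, so the following is only an illustrative sketch of how a solvation complex could be presented to a BERT-style model. It assumes that solute and solvent SMILES are joined with `.` (as reactant and reagent components are in reaction SMILES), that the string is split with the regex tokenizer commonly used for chemical-language Transformers, and that pretraining uses a simplified masked-token objective (every selected token becomes `[MASK]`, without BERT's 80/10/10 replacement split). The helper names `complex_smiles`, `tokenize`, and `mask_tokens`, the separator, and the 15% masking rate are hypothetical, not SolvBERT's confirmed settings.

```python
import random
import re

# SMILES tokenizer regex of the kind commonly used for chemical-language
# Transformers: bracket atoms, two-letter halogens (Br, Cl), ring-closure
# digits, bonds, branches, and the '.' component separator.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def complex_smiles(solute: str, solvent: str) -> str:
    """Write a solute-solvent pair as one dot-separated SMILES string.

    ASSUMPTION: '.' mirrors how reactant-reagent complexes appear in reaction
    SMILES; the exact format SolvBERT uses is not stated in the abstract.
    """
    return f"{solute}.{solvent}"


def tokenize(smiles: str) -> list[str]:
    """Split SMILES into tokens, rejecting characters the regex does not cover."""
    tokens = SMILES_REGEX.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Simplified masked-token objective (ASSUMED): hide a random fraction of tokens.

    Returns the masked sequence and per-position labels: the original token
    where a position was masked, None where it was left visible (not scored).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


if __name__ == "__main__":
    s = complex_smiles("Clc1ccccc1", "CCO")  # chlorobenzene in ethanol
    toks = tokenize(s)
    print(s)     # Clc1ccccc1.CCO
    print(toks)  # ['Cl', 'c', '1', 'c', 'c', 'c', 'c', 'c', '1', '.', 'C', 'C', 'O']
    print(mask_tokens(toks))
```

Treating the pair as one string lets the self-attention layers relate solute and solvent tokens directly, which is the sense in which the model "reads" the complex; swapping the solvent only changes the tokens after the `.`.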
This article describes a machine-learning-guided framework for screening the potential toxicity of amine chemistries used in the synthesis of hybrid organic–inorganic perovskites. By coupling a probabilistic molecular fingerprint that encodes bond connectivity (MinHash) with a nonlinear dimensionality reduction method (Uniform Manifold Approximation and Projection, UMAP), we develop an “Amine Atlas.” We show how the Amine Atlas can be used to rapidly screen the relative toxicity of amine molecules used in the synthesis of 2D and 3D perovskites and to help identify safer alternatives. Our work also serves as a framework for rapidly identifying structure–function relationships, guided by molecular similarity, for safer materials chemistries that incorporate sustainability and toxicity concerns.
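To make the MinHash idea concrete, here is a minimal sketch of the hashing step only. MinHash summarizes a set of features by keeping, for each of many seeded hash functions, the smallest hash value over the set; the fraction of matching slots between two signatures estimates the Jaccard similarity of the underlying sets. Real molecular MinHash fingerprints (such as MHFP) hash atom-centred substructures derived from the molecular graph; the character k-grams used here (`kgram_features`) are a hypothetical, much cruder stand-in, and all function names are illustrative. The UMAP projection that turns signatures into the atlas is not shown.

```python
import hashlib


def kgram_features(smiles: str, k: int = 3) -> set[str]:
    """HYPOTHETICAL stand-in for substructure features: overlapping k-character
    fragments of a SMILES string (real fingerprints use graph substructures)."""
    if len(smiles) <= k:
        return {smiles}
    return {smiles[i:i + k] for i in range(len(smiles) - k + 1)}


def _seeded_hash(feature: str, seed: int) -> int:
    """64-bit hash of a feature; each seed acts as an independent hash function."""
    digest = hashlib.blake2b(
        feature.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(features: set[str], num_perm: int = 256) -> list[int]:
    """MinHash: for each hash function, keep the minimum hash over the feature set."""
    if not features:
        raise ValueError("MinHash needs a non-empty feature set")
    return [min(_seeded_hash(f, seed) for f in features) for seed in range(num_perm)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity, for comparison with the MinHash estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    amines = {"ethylamine": "CCN", "butylamine": "CCCCN", "benzylamine": "NCc1ccccc1"}
    sigs = {name: minhash_signature(kgram_features(smi)) for name, smi in amines.items()}
    for name in amines:
        # Similarity of each amine to butylamine under the toy features.
        print(name, round(estimated_jaccard(sigs["butylamine"], sigs[name]), 2))
```

Because each signature has a fixed length regardless of molecule size, a library of amines becomes a uniform matrix of signatures, which is the kind of input a neighbourhood-preserving projection such as UMAP can lay out so that structurally similar amines land near one another.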