HelixMO: Sample-Efficient Molecular Optimization in Scene-Sensitive Latent Space

Chen, Zhiyuan; Fang, Xiaomin; Hua, Zixu; Huang, Yueyang; Wang, Fan; Wang, Hua

doi:10.48550/arxiv.2112.00905

Cited by 1 publication

(1 citation statement)

References 28 publications

(40 reference statements)

“…To get a training set that is representative of the chemical space, more molecules were generated from the initial 46 seed molecules using the sequence variational auto-encoder (SeqVAE) method provided by the computational biology platform of PaddleHelix [31]. Those compounds having a considerable molecular weight (M_w > 200), high synthesis difficulty (synthesis accessibility score, SA > 4), or low water solubility (lipid−water partition coefficient, logP > 2.5) were excluded from the initial data set. Finally, 342 molecules were added to the training data set.…”

Section: Methods (mentioning)

confidence: 99%

Interpretable Machine Learning Model for Predicting Interaction Energies between Dimethyl Sulfide and Potential Absorbing Solvents

Chuanlei, Chen, Guo, et al. 2023

Ind. Eng. Chem. Res.

Non-bonding intermolecular interactions largely dominate the selective dissolution of trace species into physical solvents and, therefore, are fundamentally important to solvent development for the capture of environment-undesired compounds or purification of chemicals. However, acquirement of the interaction energy requires costly quantum chemical computation and still encounters a practical challenge to build a chemically interpretable machine learning (ML) prediction model using documented molecular descriptors. Herein, we report an ML model for predicting the interaction energies (E_int) between dimethyl sulfide and potential absorbing solvents. Applying the reduced density gradient and quantum theory of atoms in molecules analyses, the non-bonding intermolecular interactions of dimethyl sulfide with solvent compounds were elucidated through focusing on the molecular fragments containing the main center (MC) and secondary center (SC) rather than the whole molecule. The training data set was obtained using a molecular generation strategy, and 21 molecular descriptors were defined to describe the electronic states of the central atoms in each solvent molecule and its nearby hydrogen-bond donors. Model analysis reveals that E_int is mainly determined by the charge state of the crucial fragments and the hydrogen-bond donors of the solvent molecule. The custom-defined descriptors not only improve the regression and prediction performance but also enable the interpretability of the ML model. Additionally, the absorption equilibrium measurements of solubilities of dimethyl sulfide in several commercial solvents verified the strong correlation between the dissolving affinity and solute–solvent intermolecular interaction energy. The present study provides an approach to building practical and interpretable intelligent algorithms to aid the development of sustainable chemicals or processes.
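
The exclusion rule quoted in the citation statement (drop generated molecules with molecular weight above 200, SA score above 4, or logP above 2.5) can be sketched as a simple post-generation filter. This is only an illustration under stated assumptions, not the citing authors' code: the `Candidate` class, the `keep` and `filter_candidates` helpers, and the example values are all hypothetical, and the three descriptors are assumed to have been computed beforehand with a cheminformatics toolkit. Only the three thresholds come from the quoted text.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds taken from the quoted citation statement.
MAX_MOL_WEIGHT = 200.0  # exclude if molecular weight > 200
MAX_SA_SCORE = 4.0      # exclude if synthesis accessibility score > 4
MAX_LOGP = 2.5          # exclude if logP > 2.5


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one generated molecule.

    The descriptor values are assumed to be precomputed elsewhere.
    """
    name: str
    mol_weight: float
    sa_score: float
    logp: float


def keep(c: Candidate) -> bool:
    """Return True if the candidate violates none of the three criteria."""
    return (
        c.mol_weight <= MAX_MOL_WEIGHT
        and c.sa_score <= MAX_SA_SCORE
        and c.logp <= MAX_LOGP
    )


def filter_candidates(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep only candidates that pass all three criteria, preserving order."""
    return [c for c in candidates if keep(c)]


# Illustrative placeholder values, not measured or computed data.
examples = [
    Candidate("cand-1", mol_weight=120.0, sa_score=2.1, logp=0.8),
    Candidate("cand-2", mol_weight=250.0, sa_score=2.0, logp=1.0),  # too heavy
    Candidate("cand-3", mol_weight=150.0, sa_score=4.6, logp=1.2),  # hard to synthesize
    Candidate("cand-4", mol_weight=180.0, sa_score=3.0, logp=3.1),  # logP too high
]

kept = filter_candidates(examples)
print([c.name for c in kept])
```

The comparisons are written so that a molecule sitting exactly on a threshold is kept, which matches the strict ">" wording of the quoted criteria.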
