Improving the generative performance of chemical autoencoders through transfer learning

Iovanac, Nicolae C.; Savoie, Brett M.

doi:10.1088/2632-2153/abae75

Cited by 8 publications (11 citation statements)

References 33 publications


“…In previous work, we have demonstrated that the use of chemical autoencoders for targeted structure searching is effective for properties within the GDB19 dataset 39 , namely internal energy, zero-point vibrational energy, and HOMO-LUMO gap. 40 To examine the generality of this approach, we investigated three models trained to individually predict VIP, EA, and DM, and sampled 100,000 structures in property ranges poorly represented in the training data. The targeted ranges for VIP, EA, and DM were 10.0 to 11.0 eV, -2.0 to -1.0 eV, and 0.0 to 1.0 Debye, respectively (Fig.…”

Section: Results (mentioning), confidence: 99%
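The targeted-range sampling in the excerpt above comes down to counting how many generated structures land inside a property window. The sketch below assumes the property values for the sampled structures are already available as plain numbers. Only the three windows come from the excerpt. The function name `fraction_in_window`, the inclusive bounds, and the sample values are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# Target windows quoted in the excerpt as (lower, upper); inclusive bounds are an assumption.
TARGET_WINDOWS: Dict[str, Tuple[float, float]] = {
    "VIP": (10.0, 11.0),   # vertical ionization potential, eV
    "EA": (-2.0, -1.0),    # electron affinity, eV
    "DM": (0.0, 1.0),      # dipole moment, Debye
}


def fraction_in_window(values: List[float], window: Tuple[float, float]) -> float:
    """Return the fraction of property values that fall inside [low, high].

    `values` is a hypothetical list of property values for generated
    structures, e.g. obtained from quantum chemistry on the sampled molecules.
    """
    if not values:
        return 0.0
    low, high = window
    hits = sum(1 for v in values if low <= v <= high)
    return hits / len(values)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    sampled_vip = [9.4, 10.2, 10.7, 11.3, 10.95]
    print(fraction_in_window(sampled_vip, TARGET_WINDOWS["VIP"]))
```

Applied to all 100,000 sampled structures, this hit rate is one simple way to compare how well different models target a poorly represented range.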

Actively Searching: Inverse Design of Novel Molecules with Simultaneously Optimized Properties

Iovanac, MacKnight, Savoie

2021

Preprint


Combining quantum chemistry characterizations with generative machine learning models has the potential to accelerate molecular searches in chemical space. In this paradigm, quantum chemistry acts as a relatively cost-effective oracle for evaluating the properties of particular molecules, while generative models provide a means of sampling chemical space based on learned structure–function relationships. For practical applications, multiple potentially orthogonal properties must be optimized in tandem during a discovery workflow. This carries additional difficulties associated with the specificity of the targets and the ability of the model to reconcile all properties simultaneously. Here we demonstrate an active learning approach to improve the performance of multi-target generative chemical models. We first demonstrate the effectiveness of a set of baseline models trained on single property prediction tasks in generating novel compounds with various property targets, including both interpolative and extrapolative generation scenarios. For property ranges where accurate targeting proves difficult, the novel compounds suggested by the model are characterized using quantum chemistry to obtain the true values, and the new molecules closest to expressing the desired properties are fed back into the generative model for additional training. This gradually improves the generative models’ understanding of unknown areas of chemical space and shifts the distribution of generated compounds towards the targeted values. We then demonstrate the effectiveness of this active learning approach in generating compounds with multiple chemical constraints, including vertical ionization potential, electron affinity, and dipole moment targets, and validate the results at the ωB97X-D3/def2-TZVP level.
This method requires no modifications to extant generative approaches, but rather utilizes their inherent generative and predictive aspects for self-refinement, and can be applied to situations where any number of properties with varying degrees of correlation must be optimized simultaneously.
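The abstract's feedback step characterizes the generated candidates and returns those closest to the desired properties to the training set. One way to rank "closest" across several targets is sketched below. This is an illustration, not the authors' code. The abstract does not say how closeness is measured, so the distance-to-interval score, the normalization by window width, and the names `interval_distance`, `score`, and `select_for_retraining` are all hypothetical.

```python
from typing import Dict, List, Tuple

Window = Tuple[float, float]


def interval_distance(value: float, window: Window) -> float:
    """Distance from value to the closed interval [low, high]; 0.0 if inside."""
    low, high = window
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def score(props: Dict[str, float], targets: Dict[str, Window]) -> float:
    """Sum of per-property distances, each divided by its window width.

    Dividing by the width is an assumption. It keeps properties in different
    units (eV, Debye) on a comparable scale. Lower scores are better.
    """
    total = 0.0
    for name, (low, high) in targets.items():
        width = high - low
        if width <= 0:
            raise ValueError(f"target window for {name} must have positive width")
        total += interval_distance(props[name], (low, high)) / width
    return total


def select_for_retraining(candidates: List[Tuple[str, Dict[str, float]]],
                          targets: Dict[str, Window],
                          k: int) -> List[str]:
    """Return the identifiers of the k candidates closest to all targets.

    Each candidate pairs a molecule identifier (e.g. a SMILES string) with
    property values from the quantum chemistry oracle.
    """
    ranked = sorted(candidates, key=lambda c: score(c[1], targets))
    return [mol for mol, _ in ranked[:k]]
```

In each round, the selected molecules would be added to the training data before the generative model is retrained. The number of rounds and the value of `k` are free parameters of this sketch.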


“…It has been shown capable of learning underlying relationships from a diverse set of molecular data by letting multiple data tasks and domains interact adaptively while generating the joint embeddings. Joint training is also a type of joint embedding that has been successfully applied to improve deep learning‐based molecule generation and enable transfer learning [25–29] . Joint training incorporates a property prediction task into a variational autoencoder (VAE) [30] and has been shown to organize points in the VAE latent space, making the latent space amenable to inverse molecular design and optimization [25,29] .…”

Section: Introduction (mentioning), confidence: 99%
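The joint training described in the excerpt attaches a property prediction task to a VAE. The objective can therefore be read as the usual VAE loss plus a property regression term. The version below is a minimal sketch of that reading in plain Python floats. The weights `beta` and `prop_weight`, the choice of mean squared error for the property term, and all function names are assumptions; the excerpt gives no specific objective. A real model would compute this on tensors with automatic differentiation.

```python
import math
from typing import List


def gaussian_kl(mu: List[float], logvar: List[float]) -> float:
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian encoder output."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error of the property-prediction head."""
    if not pred:
        raise ValueError("empty prediction")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_vae_loss(recon_nll: float,
                   mu: List[float], logvar: List[float],
                   y_pred: List[float], y_true: List[float],
                   beta: float = 1.0, prop_weight: float = 1.0) -> float:
    """Reconstruction NLL + beta * KL + prop_weight * property MSE.

    The property term is the part that pushes the latent space to organize
    by property values, which is what makes it usable for inverse design.
    """
    return (recon_nll
            + beta * gaussian_kl(mu, logvar)
            + prop_weight * mse(y_pred, y_true))
```

With several properties, `y_pred` and `y_true` would hold one entry per property. That is the multitask setting the citing work describes.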

“…jointly trained a VAE using drug-likeness and synthetic accessibility, then performed Bayesian optimization in the resulting latent space to identify novel drug‐like molecules [25] . In addition to latent space organization, joint training provides a platform for knowledge transfer between abundant and scarce data tasks [27,29] . When more than one property is used to develop the model, this constitutes a multitask transfer learning approach.…”

Section: Introduction (mentioning), confidence: 99%

“…It therefore has been reasoned that joint embedding yields more expressive models [31] in part because of the likely organization they induce in the latent representation [25,28] . This, in turn, offers improved properties of molecules obtained from such generative models [29] …”

Section: Introduction (mentioning), confidence: 99%

“…However, previous works on joint embedding have largely focused on properties (internal energy, HOMO‐LUMO gap, specific heat, etc …) within a single class of materials where knowledge transfer occurs from one group of molecules to another of the same class but for which certain properties are unavailable [25,29] . In contrast, our aim is for multi‐class transfer where one set of molecules simply does not possess certain properties.…”

Section: Introduction (mentioning), confidence: 99%


Locally Optimizable Joint Embedding Framework to Design Nitrogen‐rich Molecules that are Similar but Improved

Balakrishnan, VanGessel, Boukouvalas, et al.

2021

Molecular Informatics


Deep learning has shown great potential for generating molecules with desired properties. But the cost and time required to obtain relevant property data have limited study to only a few classes of materials for which extensive data have already been collected. We develop a deep learning method that combines a generative model with a property prediction model to fuse small data of one class of molecules with larger data in another class. Common low‐level physicochemical properties are jointly embedded into a latent space that can be used to design molecules in the smaller class. The chemical space around the molecules in the training set is explored through local gradient ascent optimization. Based on nine molecules from the original training set, nine new molecules are found to have improved properties while remaining structurally similar to the training molecules, thereby easing requirements for entirely new synthesis routes. Validation is performed using an equilibrium thermochemistry code to verify the molecules and target properties. A specific example targeting the Chapman‐Jouguet velocity and small data for nitrogen‐rich molecules is shown. Despite the relative lack of nitrogen‐rich molecule data, the results demonstrate that fusing and joint embedding with plentiful low-nitrogen molecular data can produce higher generative performance than using the scarce data alone.
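The abstract explores the latent space around training molecules by local gradient ascent on a predicted property. A minimal sketch of that idea follows. It starts from a training molecule's latent vector, steps along the gradient of a property predictor, and projects back onto a ball around the starting point so the result stays nearby. The projection radius is an assumption meant to mimic "structurally similar", not something the abstract specifies. The quadratic `predicted_property` is a toy stand-in for a trained network, and decoding the final latent vector back into a molecule is omitted. All names are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]


def project_to_ball(z: Vector, center: Vector, radius: float) -> Vector:
    """Pull z back onto the ball of the given radius around center if it left it."""
    diff = [a - b for a, b in zip(z, center)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= radius:
        return z
    scale = radius / norm
    return [c + d * scale for c, d in zip(center, diff)]


def local_gradient_ascent(z0: Vector, grad: Callable[[Vector], Vector],
                          step: float = 0.1, n_steps: int = 50,
                          radius: float = 1.0) -> Vector:
    """Projected gradient ascent on a predicted property, starting at z0."""
    z = list(z0)
    for _ in range(n_steps):
        g = grad(z)
        z = [zi + step * gi for zi, gi in zip(z, g)]  # move uphill
        z = project_to_ball(z, z0, radius)            # stay near the start
    return z


# Toy predictor f(z) = -||z - PEAK||^2, whose maximum lies outside the trust ball.
PEAK = [2.0, 0.0]


def predicted_property(z: Vector) -> float:
    return -sum((zi - ci) ** 2 for zi, ci in zip(z, PEAK))


def predicted_property_grad(z: Vector) -> Vector:
    return [-2.0 * (zi - ci) for zi, ci in zip(z, PEAK)]
```

Starting from `[0.0, 0.0]`, the iterate climbs toward `PEAK` but stops at the edge of the radius-1 ball. The predicted property improves while the latent point stays close to the original molecule.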


Actively Searching: Inverse Design of Novel Molecules with Simultaneously Optimized Properties

2022

Self Cite


Combining quantum chemistry characterizations with generative machine learning models has the potential to accelerate molecular discovery. In this paradigm, quantum chemistry acts as a relatively cost-effective oracle for evaluating the properties of particular molecules, while generative models provide a means of sampling chemical space based on learned structure–function relationships. For practical applications, multiple potentially orthogonal properties must be optimized in tandem during a discovery workflow. This carries additional difficulties associated with the specificity of the targets and the ability of the model to reconcile all properties simultaneously. Here, we demonstrate an active learning approach to improve the performance of multi-target generative chemical models. We first demonstrate the effectiveness of a set of baseline models trained on single property prediction tasks in generating novel compounds (i.e., not present in the training data) with various property targets, including both interpolative and extrapolative generation scenarios. For property ranges where accurate targeting proves difficult, the novel compounds suggested by the model are characterized using quantum chemistry and the new molecules closest to expressing the desired properties are fed back into the generative model for additional training. This gradually improves the generative models’ understanding of targeted areas of chemical space and shifts the distribution of the generated compounds toward the targeted values. We then demonstrate the effectiveness of this active learning approach in generating compounds with multiple chemical constraints, including vertical ionization potential, electron affinity, and dipole moment targets, and validate the results at the ωB97X-D3/def2-TZVP level.
This method requires no modifications to extant generative approaches, but rather utilizes their inherent generative and predictive aspects for self-refinement, and can be applied to situations where any number of properties with varying degrees of correlation must be optimized simultaneously.

