We report a method
to convert discrete representations of molecules
to and from a multidimensional continuous representation. This model
allows us to generate new molecules for efficient exploration and
optimization through open-ended spaces of chemical compounds. A deep
neural network was trained on hundreds of thousands of existing chemical
structures to construct three coupled functions: an encoder, a decoder,
and a predictor. The encoder converts the discrete representation
of a molecule into a real-valued continuous vector, and the decoder
converts these continuous vectors back to discrete molecular representations.
The predictor estimates chemical properties from the latent continuous
vector representation of the molecule. Continuous representations
of molecules allow us to automatically generate novel chemical structures
by performing simple operations in the latent space, such as decoding
random vectors, perturbing known chemical structures, or interpolating
between molecules. Continuous representations also allow the use of
powerful gradient-based optimization to efficiently guide the search
for optimized functional compounds. We demonstrate our method in the
domain of drug-like molecules and also in a set of molecules with
fewer than nine heavy atoms.
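The latent-space operations named above (perturbing a known point, interpolating between two points, and following a property gradient) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the trained model: latent vectors are plain lists of floats, `toy_predictor` is a hypothetical quadratic stand-in for the neural-network predictor, and the encoder and decoder are omitted entirely, so nothing here maps to or from real molecules.

```python
import random


def interpolate(z_start, z_end, n_points):
    """Return n_points latent vectors on the straight line from z_start to z_end."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    path = []
    for i in range(n_points):
        t = i / (n_points - 1)  # runs from 0.0 to 1.0 inclusive
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


def perturb(z, scale, rng):
    """Add independent Gaussian noise to each coordinate of a latent vector."""
    return [x + rng.gauss(0.0, scale) for x in z]


def toy_predictor(z, z_opt):
    """Hypothetical property model (an assumption): maximal, with value 0, at z_opt."""
    return -sum((x - o) ** 2 for x, o in zip(z, z_opt))


def toy_predictor_grad(z, z_opt):
    """Analytic gradient of toy_predictor with respect to z."""
    return [-2.0 * (x - o) for x, o in zip(z, z_opt)]


def gradient_ascent(z0, z_opt, lr=0.1, n_steps=50):
    """Move a latent vector uphill on the toy property surface."""
    z = list(z0)
    for _ in range(n_steps):
        grad = toy_predictor_grad(z, z_opt)
        z = [x + lr * g for x, g in zip(z, grad)]
    return z
```

In the setting the abstract describes, each vector produced by `interpolate`, `perturb`, or `gradient_ascent` would be passed to the decoder to obtain a candidate molecule, and the gradient would come from the trained predictor rather than from a closed-form expression.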
Virtual screening is becoming a ground-breaking tool for molecular discovery due to the exponential growth of available computer time and constant improvement of simulation and machine learning techniques. We report an integrated organic functional material design process that incorporates theoretical insight, quantum chemistry, cheminformatics, machine learning, industrial expertise, organic synthesis, molecular characterization, device fabrication and optoelectronic testing. After exploring a search space of 1.6 million molecules and screening over 400,000 of them using time-dependent density functional theory, we identified thousands of promising novel organic light-emitting diode molecules across the visible spectrum. Our team collaboratively selected the best candidates from this set. The experimentally determined external quantum efficiencies for these synthesized candidates were as large as 22%.
Reaction prediction remains one of
the major challenges for organic
chemistry and is a prerequisite for efficient synthetic planning.
It is desirable to develop algorithms that, like humans, “learn”
from being exposed to examples of the application of the rules of
organic chemistry. We explore the use of neural networks for predicting
reaction types, using a new reaction fingerprinting method. We combine
this predictor with SMARTS transformations to build a system which,
given a set of reagents and reactants, predicts the likely products.
We test this method on problems from a popular organic chemistry textbook.