The growing capabilities of synthetic biology and organic chemistry
demand tools to guide syntheses toward useful molecules. Here, we
present Molecular AutoenCoding Auto-Workaround (MACAW), a tool that
uses a novel approach to generate molecules predicted to meet a desired
property specification (e.g., a binding affinity of 50 nM or an octane
number of 90). MACAW describes molecules by embedding them into a
smooth multidimensional numerical space, avoiding uninformative dimensions
that previous methods often introduce. The coordinates in this embedding
provide a natural choice of features for accurately predicting molecular
properties, which we demonstrate with examples for cetane and octane
numbers, flash points, and histamine H1 receptor binding affinity.
The approach is computationally efficient and well-suited to the small-
and medium-sized datasets commonly used in the biosciences. We showcase
the utility of MACAW for virtual screening by identifying molecules
with high predicted binding affinity to the histamine H1 receptor
and limited affinity to the muscarinic M2 receptor, which are targets
of medicinal relevance. Combining these predictive capabilities with
a novel generative algorithm for molecules allows us to recommend
molecules with a desired property value (i.e., inverse molecular design).
We demonstrate this capability by recommending molecules with predicted
octane numbers of 40, 80, and 120; octane number is an important
characteristic of biofuels. Thus, MACAW augments classical retrosynthesis
tools by recommending molecules that meet a given property specification.
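
The idea of recommending molecules "on specification" can be illustrated with a minimal sketch: embed each candidate, predict its property from the embedding, and keep the candidates whose prediction lies closest to the target value. This is not the MACAW API or its generative algorithm. The function `recommend_on_spec`, the stand-ins `toy_embed` and `toy_predict`, and all coefficients are hypothetical placeholders chosen only to make the example runnable; they carry no chemical meaning.

```python
from typing import Callable, List, Sequence, Tuple


def recommend_on_spec(
    candidates: Sequence[str],
    embed: Callable[[str], List[float]],
    predict: Callable[[List[float]], float],
    target: float,
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k candidates whose predicted property is closest to target."""
    scored = []
    for mol in candidates:
        features = embed(mol)          # molecule -> numerical coordinates
        predicted = predict(features)  # coordinates -> predicted property
        scored.append((abs(predicted - target), mol, predicted))
    # Sort by distance to the specification only (ties keep input order).
    scored.sort(key=lambda item: item[0])
    return [(mol, predicted) for _, mol, predicted in scored[:k]]


# --- Hypothetical stand-ins (assumptions, not a real embedding or model) ---
def toy_embed(smiles: str) -> List[float]:
    """Toy features: counts of carbons, branch openings, and oxygens."""
    return [float(smiles.count("C")), float(smiles.count("(")), float(smiles.count("O"))]


def toy_predict(features: List[float]) -> float:
    """Toy linear model with arbitrary coefficients (no chemical meaning)."""
    carbons, branches, oxygens = features
    return 100.0 - 12.0 * carbons + 20.0 * branches + 10.0 * oxygens


candidates = ["CCO", "CCCCCCCC", "CC(C)CC(C)(C)C", "CCCCO"]
recommendations = recommend_on_spec(candidates, toy_embed, toy_predict, target=60.0, k=2)
for smiles, value in recommendations:
    print(f"{smiles}: predicted toy property = {value}")
```

In practice the embedding and the predictor would be fitted to measured data, and the candidate pool would come from a generative step rather than a fixed list. The selection logic, ranking by distance between predicted and desired property value, stays the same.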