Generative machine learning models have become widely adopted in
drug discovery and other fields to produce new molecules and explore
molecular space, with the goal of discovering novel compounds with
optimized properties. These generative models are frequently combined
with transfer learning or scoring of the physicochemical properties
to steer generative design, yet they are often unable to address
a wide variety of potential problems and tend to converge on similar
regions of molecular space when combined with a scoring function for the
desired properties. In addition, the generated compounds may not be synthetically
feasible, reducing their capabilities and limiting their usefulness
in real-world scenarios. Here, we introduce MegaSyn, a suite of automated tools
comprising three components: a new hill-climb algorithm that makes use of
SMILES-based recurrent neural network (RNN) generative models; analog
generation software; and retrosynthetic analysis coupled with fragment
analysis to score molecules for their synthetic feasibility.
We show that by deconstructing the targeted molecules and focusing
on substructures, combined with an ensemble of generative models,
MegaSyn generally performs well for the specific tasks of generating
new scaffolds as well as targeted analogs that are likely to be synthesizable
and druglike. We describe the development, benchmarking, and testing
of this suite of tools and propose how they might be used to optimize
molecules or prioritize promising lead compounds, illustrated with
multiple RNN-based test cases.