In recent years, deep molecular generative models have emerged
as promising methods for de novo molecular design.
Thanks to the rapid advance of deep learning techniques, deep learning
architectures such as recurrent neural networks, variational autoencoders,
and adversarial networks have been successfully employed for constructing
generative models. Several metrics have recently been proposed
to evaluate these deep generative models. However, many of these metrics
cannot evaluate the chemical space coverage of sampled molecules.
This work presents a novel and complementary metric for evaluating
deep molecular generative models. The metric is based on the chemical
space coverage of a reference dataset, GDB-13. The performance
of seven different molecular generative models was compared by calculating
what fraction of the structures, ring systems, and functional groups
could be reproduced from the largely unseen reference set when using
only a small fraction of GDB-13 for training. The results show that
the performance of the generative models studied varies significantly
under the benchmark metrics introduced herein, so that the generalization
capabilities of the models can be clearly differentiated.
In addition, the coverages of GDB-13 ring systems and functional groups
were compared between the models. Our study provides a useful new
metric that can be used for evaluating and comparing generative models.
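
As a rough illustration of the coverage idea described above, the following minimal sketch computes the fraction of a reference set that a model's samples reproduce, both overall and restricted to the part of the reference set not used for training. It is not the implementation used in this work: the function name `coverage`, the toy strings, and the assumption that every molecule has already been converted to a canonical SMILES string by a cheminformatics toolkit are all illustrative. The same function could in principle be applied to sets of ring systems or functional groups instead of whole structures.

```python
from typing import Iterable, Set


def coverage(sampled: Iterable[str], reference: Set[str]) -> float:
    """Fraction of distinct reference items that appear among the samples.

    Assumes all strings are already canonicalized, so that identical
    molecules compare equal as strings (an assumption of this sketch).
    """
    if not reference:
        raise ValueError("reference set must not be empty")
    # Duplicates and samples outside the reference set do not add coverage.
    reproduced = set(sampled) & reference
    return len(reproduced) / len(reference)


# Toy data: the strings stand in for canonical SMILES (hypothetical values,
# not taken from GDB-13).
reference = {"CCO", "CCN", "CCC", "C1CC1", "CC=O"}
training = {"CCO"}  # small fraction of the reference set used for training
sampled = ["CCO", "CCN", "CCN", "C1CC1", "CCCl"]

overall = coverage(sampled, reference)
# Restricting to unseen reference molecules probes generalization
# rather than memorization of the training set.
unseen = coverage(sampled, reference - training)

print(f"overall coverage: {overall:.2f}")
print(f"coverage of unseen reference molecules: {unseen:.2f}")
```

Using set intersection means that repeated samples count once and that samples outside the reference set are ignored, which matches the notion of coverage as "how much of the reference space was reached" rather than "how many samples were valid".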