The application of deep learning techniques to the de novo generation
of molecules, termed inverse molecular design, has been gaining
enormous traction in drug design. Representing molecules in SMILES
notation as strings of characters enables the use of state-of-the-art
natural language processing models, such as Transformers, for
molecular design in general. Inspired by
generative pre-training (GPT) models, which have been shown to be
successful in generating meaningful text, we train a Transformer
decoder with masked self-attention on the next-token prediction task
to generate drug-like molecules. We show that our model, MolGPT,
performs on par with other modern machine learning frameworks for
molecular generation in terms of generating valid, unique, and novel
molecules. Furthermore, we demonstrate that the
model can be trained conditionally to control multiple properties
of the generated molecules. We also show that the model can be used
to generate molecules with desired scaffolds as well as desired molecular
properties by conditioning generation on the SMILES strings of the
desired scaffolds and on the desired property values. Using saliency maps, we
highlight the interpretability of the generative process of the model.
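
The next-token prediction setup with masked self-attention described above can be illustrated with a small sketch. It is not the MolGPT implementation. The character-level tokenizer, the `<` and `>` start and end markers, and all function names are assumptions chosen for illustration, and the attention scores are placeholders rather than learned values.

```python
import math


def tokenize(smiles):
    """Split a SMILES string into single characters.

    Assumption: character-level tokens. A practical tokenizer would keep
    multi-character atoms such as Cl or Br together as one token.
    """
    return list(smiles)


def next_token_pairs(tokens, bos="<", eos=">"):
    """Build inputs and targets; each target is the input shifted by one."""
    seq = [bos] + tokens + [eos]
    return seq[:-1], seq[1:]


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    """Row-wise softmax that gives zero weight to masked (future) positions."""
    weights = []
    for row, allowed in zip(scores, mask):
        # The diagonal is always allowed, so every row has a visible score.
        peak = max(s for s, ok in zip(row, allowed) if ok)
        exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Ethanol ("CCO") as a toy example.
inputs, targets = next_token_pairs(tokenize("CCO"))
mask = causal_mask(len(inputs))

# Placeholder scores: all zeros, so allowed positions share weight equally.
scores = [[0.0] * len(inputs) for _ in inputs]
weights = masked_softmax(scores, mask)

for tok, tgt, row in zip(inputs, targets, weights):
    print(tok, "->", tgt, [round(w, 2) for w in row])
```

Because of the mask, the prediction at each position depends only on the tokens at or before it, which is what allows molecules to be generated one token at a time at sampling time.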