De novo drug
design based on the SMILES format is a typical sequence-processing
problem. Previous methods based on recurrent neural networks (RNNs)
exhibit limitations in capturing long-range dependencies, resulting in
a high percentage of invalid generated molecules. Recent studies have
shown the potential of the Transformer architecture to increase the capacity
for handling sequence data. In this work, the encoder module of the
Transformer is used to build a generative model. First, we train a
Transformer-encoder-based generative model to learn the grammatical
rules of known drug molecules and a predictive model to predict the
activity of the molecules. Subsequently, transfer learning and reinforcement
learning are used to fine-tune and optimize the generative model,
respectively, to design new molecules with the desired activity. Compared
with previous RNN-based methods, our method improves the percentage
of chemically valid generated molecules (from 95.6% to 98.2%), the
structural diversity of the generated molecules, and the feasibility
of molecular synthesis. The pipeline is validated by designing inhibitors
against the human BRAF protein. Molecular docking and binding-mode
analysis show that our method can generate small molecules with
higher predicted activity than the ligands in the crystal structure,
while sharing similar interaction sites with those ligands, which can provide
new ideas and suggestions for medicinal chemists.
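As the abstract notes, a Transformer encoder block can serve as an autoregressive generator when its self-attention is causally masked, so that each position predicts the next SMILES token from earlier ones only. The following is a minimal NumPy sketch of that idea; the vocabulary, dimensions, and random weights are all illustrative assumptions, not the paper's actual tokenizer or architecture.

```python
import numpy as np

# Toy SMILES vocabulary (illustrative; not the paper's actual tokenizer)
VOCAB = ["<pad>", "<bos>", "C", "O", "N", "=", "(", ")", "1"]
TOK = {t: i for i, t in enumerate(VOCAB)}

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(X, seed=0):
    """Single-head self-attention with a causal mask: position i attends
    only to positions <= i, which lets an encoder block run autoregressively."""
    d = X.shape[-1]
    rng = np.random.default_rng(seed)  # fixed random weights for the sketch
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)
    # Mask out attention to future tokens (strict upper triangle)
    scores[np.triu(np.ones(scores.shape, dtype=bool), k=1)] = -1e9
    return softmax(scores, axis=-1) @ V

d_model = 16
rng = np.random.default_rng(1)
emb = 0.1 * rng.standard_normal((len(VOCAB), d_model))  # token embeddings
seq = [TOK["<bos>"], TOK["C"], TOK["C"], TOK["O"]]      # prefix of "CCO"
H = causal_self_attention(emb[seq])
logits = H @ emb.T                # tied projection back onto the vocabulary
next_probs = softmax(logits[-1])  # distribution over the next SMILES token
```

During training, the model would be fit to maximize the likelihood of known drug SMILES under these next-token distributions; during reinforcement-learning fine-tuning, tokens sampled from them would be rewarded by the predictive activity model.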