De novo drug design based on the SMILES format is a typical sequence-processing
problem. Previous methods based on recurrent neural networks (RNNs)
exhibit limitations in capturing long-range dependencies, resulting in
a high percentage of invalid generated molecules. Recent studies have
shown the potential of the Transformer architecture to improve the
handling of sequence data. In this work, the encoder module in the
Transformer is used to build a generative model. First, we train a
Transformer-encoder-based generative model to learn the grammatical
rules of known drug molecules and a predictive model to predict the
activity of the molecules. Subsequently, transfer learning and reinforcement
learning are used to fine-tune and optimize the generative model,
respectively, to design new molecules with desirable activity. Compared
with previous RNN-based methods, our method improves the percentage
of chemically valid generated molecules (from 95.6% to 98.2%), the
structural diversity of the generated molecules, and the feasibility
of molecular synthesis. The pipeline is validated by designing inhibitors
against the human BRAF protein. Molecular docking and binding mode
analysis showed that our method can generate small molecules that have
higher activity than the ligands carried in the crystal structures
and similar interaction sites to these ligands, which can provide
new ideas and suggestions for pharmaceutical chemists.
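The validity percentage reported above is the share of generated SMILES strings that parse as real molecules. As a rough illustration only, the sketch below computes such a percentage using a crude syntactic proxy (balanced branch parentheses, closed bracket atoms, and paired ring-closure labels). This is an assumption made to keep the example dependency-free: it does not check valence or aromaticity, it is not the validity test used in the work, and in practice a cheminformatics toolkit would do the parsing. The names `is_well_formed` and `percent_valid` are hypothetical.

```python
def is_well_formed(smiles: str) -> bool:
    """Crude syntactic proxy for SMILES validity (assumption: NOT a chemical check)."""
    if not smiles:
        return False
    depth = 0            # number of currently open branch parentheses
    in_bracket = False   # True while inside a [...] bracket atom
    open_rings = set()   # ring-closure labels still waiting for a partner
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            # Digits inside brackets are isotopes, H counts, or charges, not ring labels.
            if ch == "]":
                in_bracket = False
            elif ch == "[":
                return False
        elif ch == "[":
            in_bracket = True
        elif ch == "]":
            return False
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "%":
            # Two-digit ring-closure label such as %12.
            label = smiles[i:i + 3]
            if len(label) != 3 or not label[1:].isdigit():
                return False
            open_rings ^= {label}  # toggle: first occurrence opens, second closes
            i += 2
        elif ch in "0123456789":
            open_rings ^= {ch}
        i += 1
    return depth == 0 and not in_bracket and not open_rings


def percent_valid(smiles_list) -> float:
    """Percentage of strings that pass the crude check; 0.0 for an empty list."""
    if not smiles_list:
        return 0.0
    valid = sum(1 for s in smiles_list if is_well_formed(s))
    return 100.0 * valid / len(smiles_list)


# Two well-formed strings, one unclosed ring, one unclosed branch.
generated = ["c1ccccc1", "CC(=O)O", "C1CC", "CC(C"]
print(percent_valid(generated))  # 50.0
```

A string that passes this check can still be chemically invalid, so the number it produces is an upper bound on true validity rather than a substitute for it.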

Accurate prediction of protein–ligand interactions can greatly
promote drug development. Recently, a number of deep-learning-based
methods have been proposed to predict protein–ligand binding
affinities. However, these methods independently extract the feature
representations of proteins and ligands but ignore the relative spatial
positions and interaction pairs between them. Here, we propose a virtual
screening method based on deep learning, called Deep Scoring, which
directly extracts the relative position information and atomic attribute
information of proteins and ligands from the docking poses. Furthermore,
we use two ResNets to extract the features of ligand atoms and protein
residues, respectively, and generate an atom–residue interaction
matrix to learn the underlying principles of the interactions between
proteins and ligands. This is followed by a dual attention network
(DAN) that generates attention for the two related entities (i.e., proteins
and ligands) and weighs the contribution of each atom and residue
to binding affinity prediction. As a result, Deep Scoring outperforms
other structure-based deep learning methods in terms of screening
performance (area under the receiver operating characteristic curve
(AUC) of 0.901 for an unbiased version of DUD-E), pose prediction (AUC
of 0.935 for the PDBbind test set), and generalization ability (AUC of
0.803 for the ChEMBL data set). Finally, Deep Scoring was used to
select novel ERK2 inhibitors, and two compounds (D264-0698 and D483-1785)
showed potential inhibitory activity against ERK2 in
biological experiments.
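The AUC values quoted above summarize how well a scoring function ranks active compounds ahead of decoys. As a minimal sketch, assuming binary labels (1 for active, 0 for decoy) and higher scores meaning stronger predicted binding, the ROC AUC can be computed as the fraction of active/decoy pairs in which the active receives the higher score, with ties counted as half. The function name `roc_auc` and the toy data are hypothetical and are not taken from the Deep Scoring evaluation.

```python
def roc_auc(labels, scores) -> float:
    """ROC AUC via pairwise comparison of actives (label 1) against decoys (label 0)."""
    actives = [s for y, s in zip(labels, scores) if y == 1]
    decoys = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0   # active correctly ranked above decoy
            elif a == d:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(actives) * len(decoys))


# Toy screening result: two actives and three decoys.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))  # 0.833
```

The pairwise form is quadratic in the number of compounds, which is fine for illustration; for large screening libraries a rank-based implementation from a standard library would be the usual choice.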