The copper(I)-catalyzed alkyne–azide cycloaddition (CuAAC)
reaction, a major click chemistry reaction, is widely employed in
drug discovery and chemical biology. However, the success rate of
the CuAAC reaction is lower than expected. To improve it, we
developed a recurrent neural network (RNN) model to predict
reaction feasibility. First, we designed and synthesized
a structurally diverse library of 700 compounds via the CuAAC reaction
to obtain experimental data. Then, using reaction SMILES as input,
we built a bidirectional long short-term memory model with a
self-attention mechanism (BiLSTM-SA). Our best prediction model
achieved an overall accuracy of 80%. With the self-attention mechanism, adverse
substructures responsible for failed reactions were identified and
used to derive quantitative descriptors. Density functional theory investigations
were conducted to provide evidence for the correlation between the hybridization
type of the bromo-α-carbon and the success rate of the reaction. These quantitative
descriptors, combined with RDKit descriptors, were fed to three machine learning
models (support vector machine, random forest, and logistic regression)
and yielded improved performance. The BiLSTM-SA model for predicting
the feasibility of the CuAAC reaction outperforms conventional machine
learning methods and improves on heuristic chemical rules.
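
To illustrate the input and attention steps described above, the following is a minimal sketch, not the implementation used in this work. It assumes an atom-level regular-expression tokenizer for reaction SMILES and shows how per-token attention scores could be normalized with a softmax to rank the most attended tokens. The names `tokenize`, `build_vocab`, `encode`, and `top_attended`, the example reaction, and the placeholder scores are all hypothetical; in a trained BiLSTM-SA model the scores would come from the attention layer.

```python
import math
import re

# Assumed atom-level tokenizer: bracket atoms, two-letter halogens,
# organic-subset atoms, ring-closure digits, bonds, branches, and the
# ">" separator used in reaction SMILES (reactants>agents>products).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%[0-9]{2}|[0-9]|[().=#\-+\\/:~@?>*$]"
)


def tokenize(reaction_smiles):
    """Split a reaction SMILES string into tokens, rejecting unknown characters."""
    tokens = SMILES_TOKEN.findall(reaction_smiles)
    if "".join(tokens) != reaction_smiles:
        raise ValueError("untokenizable characters in: " + reaction_smiles)
    return tokens


def build_vocab(token_lists):
    """Map each distinct token to an integer id; 0 is reserved for padding."""
    distinct = sorted({tok for toks in token_lists for tok in toks})
    return {tok: i + 1 for i, tok in enumerate(distinct)}


def encode(tokens, vocab):
    """Convert tokens to the integer ids a sequence model would consume."""
    return [vocab[tok] for tok in tokens]


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_attended(tokens, scores, k=3):
    """Return the k (token, weight) pairs with the largest attention weight."""
    weights = softmax(scores)
    ranked = sorted(zip(tokens, weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical example: phenylacetylene + benzyl azide -> 1,2,3-triazole.
reaction = "C#Cc1ccccc1.[N-]=[N+]=NCc1ccccc1>>c1ccc(-c2cn(Cc3ccccc3)nn2)cc1"
tokens = tokenize(reaction)
vocab = build_vocab([tokens])
ids = encode(tokens, vocab)

# Placeholder scores (NOT model output): one token is arbitrarily raised
# to show how a highly attended position would surface in the ranking.
scores = [0.0] * len(tokens)
scores[tokens.index("[N-]")] = 2.0
print(top_attended(tokens, scores, k=1))
```

Ranking tokens by attention weight is only one possible way to surface candidate substructures; how such weights are aggregated into the quantitative descriptors mentioned above is not specified here.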