<p>Using enzymes in organic synthesis enables simpler, more economical
and more selective synthetic routes than conventional reagents allow. However,
predicting whether a particular molecule will undergo a specific enzymatic
transformation is very difficult. <a>Here we used
multi-task transfer learning to train the Molecular Transformer, a
sequence-to-sequence machine learning model, with one million reactions from
the US Patent Office (USPTO) database combined with 32,181 enzymatic
transformations annotated with a text description of the enzyme. The resulting Enzymatic
Transformer model predicts the structure and stereochemistry of
enzyme-catalyzed reaction products with remarkable accuracy. One of the key
novelties is that we combined the reaction SMILES language of only 405 atomic
tokens with thousands of human language tokens describing the enzymes, such
that our Enzymatic Transformer learned to interpret not only SMILES but also the
natural language used by human experts to describe enzymes and their
mutations.</a></p>
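<p>The combined SMILES-plus-text input representation described above can be sketched as follows. This is an illustrative assumption rather than the authors' preprocessing code: the regex is the widely used Molecular Transformer SMILES tokenizer, and the helper names (<code>tokenize_smiles</code>, <code>build_source_sequence</code>) are hypothetical.</p>

```python
import re

# Regex that splits a SMILES string into atomic tokens (bracket atoms,
# two-letter elements, bonds, ring closures, etc.). The enzyme description
# is tokenized as plain lowercase words and appended to the same sequence.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p"
    r"|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into its atomic tokens."""
    return SMILES_REGEX.findall(smiles)

def build_source_sequence(substrate_smiles: str, enzyme_text: str) -> list[str]:
    """Concatenate SMILES tokens with natural-language enzyme tokens
    into one source sequence for a sequence-to-sequence model."""
    return tokenize_smiles(substrate_smiles) + enzyme_text.lower().split()

# Example: aspirin SMILES plus a free-text enzyme description.
print(build_source_sequence("CC(=O)Oc1ccccc1", "lipase from Candida antarctica"))
```

<p>Because the SMILES vocabulary stays small (on the order of the 405 atomic tokens mentioned above) while the enzyme descriptions contribute ordinary word tokens, a single shared vocabulary can cover both languages.</p>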