Data-driven computer-aided
synthesis planning utilizing organic
or biocatalyzed reactions from large databases has gained increasing
interest in the last decade, sparking the development of numerous
tools to extract, apply, and score general reaction templates. The
generation of reaction rules for enzymatic reactions is especially
challenging since substrate promiscuity varies between enzymes, causing
the optimal levels of rule specificity and optimal number of included
atoms to differ between enzymes. This complicates automated extraction
from databases and has promoted the creation of manually curated reaction
rule sets. Here, we present EHreact, a purely data-driven open-source
software tool that extracts and scores reaction rules at appropriate levels
of specificity, without expert knowledge, from sets of reactions known to
be catalyzed by an enzyme. EHreact extracts and groups reaction rules
into tree-like structures, Hasse diagrams, based on common substructures
in the imaginary transition structures. Each diagram can be used
to output a single reaction rule or a set of rules, as well as to calculate
the probability that a new substrate is processed by the given enzyme
by inferring information about the reactive site of the enzyme from
the known reactions and their grouping in the template tree. EHreact
heuristically predicts the activity of a given enzyme on a new substrate,
outperforming current approaches in accuracy and functionality.
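
The grouping of rules into a Hasse diagram can be illustrated with a minimal sketch. It is a toy stand-in, not the EHreact implementation: each template is assumed to be a plain set of hypothetical feature labels (such as "C-OH" or "aryl") rather than a substructure of an imaginary transition structure, and "is a substructure of" is approximated by set inclusion. The function names hasse_edges and most_specific_matches are likewise illustrative only.

```python
def hasse_edges(templates):
    """Return the cover relations (general, specific) of the strict-subset order.

    An edge a -> b is kept only if no third template lies strictly between
    a and b, which is what distinguishes a Hasse diagram from the full order.
    """
    nodes = set(templates)
    edges = set()
    for a in nodes:
        for b in nodes:
            if a < b and not any(a < c < b for c in nodes):
                edges.add((a, b))
    return edges


def most_specific_matches(templates, query):
    """Return matching templates that have no more specific matching template."""
    matching = [t for t in set(templates) if t <= query]
    return [t for t in matching if not any(t < u for u in matching)]


# Hypothetical feature sets standing in for extracted templates (assumption).
root = frozenset({"C-OH"})
alkyl = frozenset({"C-OH", "alpha-C"})
benzyl = frozenset({"C-OH", "alpha-C", "aryl"})
amino = frozenset({"C-OH", "alpha-N"})
templates = [root, alkyl, benzyl, amino]

edges = hasse_edges(templates)

# A new substrate, again described only by hypothetical feature labels.
query = frozenset({"C-OH", "alpha-C", "aryl", "nitro"})
best = most_specific_matches(templates, query)
```

In this toy setting, the depth of the most specific matching node gives a rough indication of how closely a new substrate resembles the known ones; any actual scoring scheme would need to be taken from the software itself.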