Determining the redox potentials of protein cofactors and how they
are influenced by their molecular neighborhoods is essential for basic
research and many biotechnological applications, from biosensors and
biocatalysis to bioremediation and bioelectronics. Because measuring
redox potentials with current experimental techniques is laborious,
there is a need for computational approaches that can predict them reliably.
Although current computational approaches based on quantum and molecular
mechanics are accurate, their high computational cost limits their
use. In this work, we explored the possibility of using more efficient
QSPR models based on machine learning (ML) for the prediction of protein
redox potential, as an alternative to classical approaches. As a proof
of concept, we focused on flavoproteins, one of the most important
families of enzymes directly involved in redox processes. To train
and test different ML models, we retrieved a dataset of flavoproteins
with a known midpoint redox potential (E_m) and 3D structure. The features
of interest, accounting for both
short- and long-range effects of the protein matrix on the flavin
cofactor, were automatically extracted from each protein PDB
file. Our best ML model (XGB) achieves a prediction error below 1 kcal/mol
(∼36 mV), comparing favorably to more sophisticated computational
approaches. We also identified the features that most affect the E_m
value and, where possible, rationalized them on the basis of previous
studies.
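
The reported error is quoted in both energy and potential units. As a minimal sketch of how the two relate, the snippet below uses the relation |ΔG| = nFE to convert between millivolts and kcal/mol. The function names are hypothetical, and the default of one transferred electron (n = 1) is an assumption for illustration, not something stated in the text.

```python
# Sketch: convert between a redox potential error (mV) and a free energy (kcal/mol)
# using |dG| = n * F * E. n_electrons=1 is an assumption made for this example.

FARADAY_C_PER_MOL = 96485.33212  # Faraday constant, C/mol
JOULES_PER_KCAL = 4184.0         # thermochemical kilocalorie, J


def mv_to_kcal_per_mol(potential_mv, n_electrons=1):
    """Magnitude of the free energy (kcal/mol) equivalent to a potential in mV."""
    volts = potential_mv / 1000.0
    joules_per_mol = n_electrons * FARADAY_C_PER_MOL * volts
    return joules_per_mol / JOULES_PER_KCAL


def kcal_per_mol_to_mv(energy_kcal, n_electrons=1):
    """Potential (mV) equivalent to a free-energy magnitude in kcal/mol."""
    joules_per_mol = energy_kcal * JOULES_PER_KCAL
    volts = joules_per_mol / (n_electrons * FARADAY_C_PER_MOL)
    return volts * 1000.0


if __name__ == "__main__":
    print(f"36 mV     = {mv_to_kcal_per_mol(36.0):.2f} kcal/mol")
    print(f"1 kcal/mol = {kcal_per_mol_to_mv(1.0):.1f} mV")
```

Under the one-electron assumption, 36 mV corresponds to roughly 0.83 kcal/mol, and 1 kcal/mol to roughly 43 mV, so an error of about 36 mV falls below the 1 kcal/mol threshold.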