Accurate estimation of the pKa values of cysteine residues in proteins
could inform targeted approaches in hit discovery. The pKa of a targetable
cysteine residue in a disease-related protein is an important physicochemical
parameter in covalent drug discovery, as it influences the fraction
of nucleophilic thiolate amenable to chemical protein modification.
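The link between pKa and the available thiolate can be made concrete with the Henderson-Hasselbalch relation. The sketch below assumes a single, non-interacting ionizable thiol and a reference pH of 7.4 (our illustrative choice, not a value from this study). In that case the fraction present as thiolate is 1/(1 + 10^(pKa - pH)). The function name `thiolate_fraction` is hypothetical.

```python
def thiolate_fraction(pka: float, ph: float = 7.4) -> float:
    """Fraction of a cysteine thiol present as thiolate (S-) at a given pH.

    Assumption: a single ionizable group described by Henderson-Hasselbalch,
    ignoring coupling to other titratable residues in the protein.
    """
    # Henderson-Hasselbalch: [S-]/[SH] = 10**(pH - pKa),
    # so fraction = ratio / (1 + ratio) = 1 / (1 + 10**(pKa - pH)).
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    # Scan a few pKa values around the assumed physiological pH of 7.4.
    for pka in (6.4, 7.4, 8.4, 9.4):
        print(f"pKa {pka:.1f}: thiolate fraction at pH 7.4 = {thiolate_fraction(pka):.3f}")
```

For each unit that the pKa rises above the pH, the thiolate fraction drops roughly tenfold. An error of a few pK units can therefore change whether a cysteine looks reactive.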
Traditional structure-based in silico tools predict cysteine pKa values
less accurately than those of other titratable residues. In addition,
few comprehensive benchmarks exist for cysteine pKa prediction tools,
which motivates extensive assessment and evaluation of methods for
cysteine pKa prediction. Here, we report the performance
of several computational pKa methods,
including single-structure and ensemble-based approaches, on a diverse
test set of experimental cysteine pKa values
retrieved from the PKAD database. The dataset consisted of 16 wild-type
and 10 mutant proteins with experimentally measured cysteine pKa
values. Our results show that these methods vary in overall
predictive accuracy. Among the test
set of wild-type proteins evaluated, the best-performing method (MOE) yielded
a mean absolute error of 2.3 pK units, highlighting
the need to improve existing pKa methods for accurate cysteine pKa
estimation.
Given the limited accuracy of these methods, further development is
needed before these approaches can be routinely employed to drive
design decisions in early drug discovery efforts.
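The sketch below puts the reported error in context. It computes a mean absolute error over paired predicted and experimental pKa values. It also converts a pKa error into the fold error it implies for the thiolate:thiol ratio, again using Henderson-Hasselbalch. The helper names are hypothetical, the input values are made up for illustration, and the benchmark's exact aggregation may differ.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], experimental: Sequence[float]) -> float:
    """Mean absolute error (in pK units) between predicted and experimental pKa values."""
    if len(predicted) != len(experimental) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - e) for p, e in zip(predicted, experimental)) / len(predicted)


def ratio_fold_error(pka_error: float) -> float:
    """Fold error in the [S-]/[SH] ratio implied by a pKa error of this size.

    From Henderson-Hasselbalch, [S-]/[SH] = 10**(pH - pKa), so shifting the
    pKa by delta multiplies the ratio by 10**delta at any fixed pH.
    """
    return 10.0 ** abs(pka_error)


if __name__ == "__main__":
    # Hypothetical predicted/experimental pairs, for illustration only.
    print(mean_absolute_error([8.0, 5.5, 9.1], [6.2, 7.0, 8.3]))
    # An average error of 2.3 pK units implies roughly a 200-fold ratio error.
    print(f"{ratio_fold_error(2.3):.0f}")
```

At any fixed pH, a 2.3-unit pKa error corresponds to roughly a 200-fold error in [S-]/[SH], since 10^2.3 is about 200. The error in the thiolate fraction itself is smaller when the pKa lies well below the pH, because the fraction then saturates near 1.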