Existing computational methods to estimate pKa values in proteins rely on theoretical approximations and lengthy computations. In this work, we use a data set of 6 million theoretically determined pKa shifts to train deep learning models that are shown to rival the physics-based predictors. These neural networks managed to assign proper electrostatic charges to chemical groups, and learned the importance of solvent exposure and close interactions, including hydrogen bonds. Although trained only on theoretical data, our pKAI+ model displays the best accuracy on a test set of ∼750 experimental values. Inference is more than 1000 times faster than with physics-based methods. By combining speed, accuracy and a reasonable understanding of the underlying physics, our models provide a game-changing solution for fast estimations of macroscopic pKa from ensembles of microscopic values as well as for many downstream applications such as molecular docking and constant-pH molecular dynamics simulations.
Main
Many biological processes are triggered by changes in the ionization state of key amino acid side-chains [1, 2]. Experimentally, the titration behavior of a molecule can be measured using potentiometry or by tracking free energy changes across a pH range. For individual sites, titration curves can be derived from infrared