Protein pKa prediction by tree-based machine learning

Chen, Ada Y.; Lee, Juyong; Damjanović, Ana; Brooks, Bernard R.

doi:10.26434/chemrxiv-2021-4d420

Cited by 2 publications (3 citation statements)

References 107 publications (148 reference statements)

“…As IEF is still used for separation of modified peptides there is a potential to develop models that can predict the pI of modified peptides. For protein-level IEF under native conditions, a methodology similar to that recently used to calculate protein pKa values [73] using AlphaFold may be adopted.…”

Section: Enzymatic Digestion (mentioning)

confidence: 99%

Toward an Integrated Machine Learning Model of a Proteomics Experiment

et al. 2023

In recent years machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goals to evaluate and explore machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and future opportunities and challenges. In the following perspective we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and to inspire future research.

“…Chen et al. trained tree-based machine learning models, such as XGBoost or LightGBM, on experimental data, and their best model exhibited an RMSE of 0.69 [30]. To compare pKAI with these models and illustrate the data leakage problem at hand, we have refined our pKAI model by training it on same data split reported in ref 30. This new model seems to have an unparalleled performance (RMSE of 0.32 and MAE of 0.21).…”

Section: Journal Of Chemical (mentioning)

confidence: 99%
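The statement above compares models by RMSE and MAE. As a minimal sketch of how these two error metrics are computed, assuming predicted and experimental pKa values are available as plain lists (the numbers below are made up for illustration and are not taken from either paper):

```python
import math

def rmse(predicted, observed):
    """Root-mean-square error: penalizes large deviations more heavily."""
    errors = [p - o for p, o in zip(predicted, observed)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae(predicted, observed):
    """Mean absolute error: average size of the deviation."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)

# Hypothetical pKa values, for illustration only.
observed = [4.0, 6.5, 10.0, 3.5]
predicted = [4.5, 6.0, 10.0, 4.5]

print(f"RMSE = {rmse(predicted, observed):.2f}")
print(f"MAE  = {mae(predicted, observed):.2f}")
```

RMSE is never smaller than MAE on the same data, which is consistent with the quoted pair of values (RMSE of 0.32, MAE of 0.21).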

“…[5] Recently, traditional ML models have been trained on ∼1500 experimental pKa values [29,30]. However, testing the real-world performances of such methods is difficult, as there is a high degree of similarity among available experimental data. Our larger data set translates into more diversity in terms of protein and residue types and, more importantly, a wider variety of residue environments.…”

Section: Introduction (mentioning)

confidence: 99%
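The concern about similarity between available experimental data can be illustrated with a grouped train/test split. The sketch below is an assumption for illustration, not the procedure used in either paper: it keeps all residues of one protein on the same side of the split, so that no protein contributes to both training and testing. The record layout, the protein identifiers, and the `split_by_protein` helper are hypothetical. Grouping by protein identity alone does not account for similar proteins; that would need a sequence-clustering step not shown here.

```python
import random

def split_by_protein(records, test_fraction=0.25, seed=0):
    """Assign whole proteins, not individual residues, to train or test."""
    proteins = sorted({r["protein"] for r in records})
    random.Random(seed).shuffle(proteins)  # deterministic for a given seed
    n_test = max(1, round(len(proteins) * test_fraction))
    test_ids = set(proteins[:n_test])
    train = [r for r in records if r["protein"] not in test_ids]
    test = [r for r in records if r["protein"] in test_ids]
    return train, test

# Hypothetical records: made-up protein labels and pKa values.
records = [
    {"protein": "protA", "residue": "ASP-10", "pka": 3.8},
    {"protein": "protA", "residue": "GLU-35", "pka": 4.6},
    {"protein": "protB", "residue": "HIS-12", "pka": 6.4},
    {"protein": "protB", "residue": "LYS-40", "pka": 10.2},
    {"protein": "protC", "residue": "ASP-7", "pka": 3.1},
    {"protein": "protD", "residue": "GLU-22", "pka": 4.9},
]

train, test = split_by_protein(records)
print(len(train), "training residues,", len(test), "test residues")
```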

A Fast and Interpretable Deep Learning Approach for Accurate Electrostatics-Driven pKa Predictions in Proteins

Reis, Bertolini, Montanari, et al. 2022

J. Chem. Theory Comput.

Existing computational methods to estimate pKa values in proteins rely on theoretical approximations and lengthy computations. In this work, we use a data set of 6 million theoretically determined pKa shifts to train deep learning models that are shown to rival the physics-based predictors. These neural networks managed to assign proper electrostatic charges to chemical groups, and learned the importance of solvent exposure and close interactions, including hydrogen bonds. Although trained only using theoretical data, our pKAI+ model displays the best accuracy on a test set of ∼750 experimental values. Inference times allow speedups of more than 1000 times over physics-based methods. By combining speed, accuracy and a reasonable understanding of the underlying physics, our models provide a game-changing solution for fast estimations of macroscopic pKa from ensembles of microscopic values as well as for many downstream applications such as molecular docking and constant-pH molecular dynamics simulations.

Main

Many biological processes are triggered by changes in the ionization state of key amino acid side-chains [1, 2]. Experimentally, the titration behavior of a molecule can be measured using potentiometry or by tracking free energy changes across a pH range. For individual sites, titration curves can be derived from infrared
