2023
DOI: 10.1101/2023.09.04.556234
Preprint
Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors

Wern Juin Gabriel Ong,
Palani Kirubakaran,
John Karanicolas

Abstract: The extreme surge of interest over the past decade surrounding the use of neural networks has inspired many groups to deploy them for predicting binding affinities of drug-like molecules to their receptors. A model that can accurately make such predictions has the potential to screen large chemical libraries and help streamline the drug discovery process. However, despite reports of models that accurately predict quantitative inhibition using protein kinase sequences and inhibitors' SMILES strings, it is still…

Cited by 2 publications (3 citation statements) · References 63 publications
“…This would ensure a comprehensive evaluation, highlighting the strengths and limitations of each method in various contexts. Randomly splitting compound-kinase pairs into training and validation sets results in overoptimistic performance in terms of generalization to previously unseen data, as was also observed in other work [OKK23].…”
Section: Discussion (supporting)
confidence: 66%
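The statement above concerns leakage from random pair-level splits: when compound-kinase pairs are shuffled randomly, the same kinase (or compound) appears in both training and validation data, inflating apparent generalization. A minimal sketch of the alternative, splitting by held-out kinase, is shown below; the function name, pair format, and default fraction are illustrative assumptions, not from the cited work.

```python
import random

def split_by_kinase(pairs, holdout_frac=0.2, seed=0):
    """Split (kinase, compound) pairs so no kinase appears in both sets.

    Random pair-level splits leak information because the model sees the
    same kinase in training and validation. Holding out whole kinases
    gives a more honest estimate of generalization to unseen targets.
    """
    kinases = sorted({k for k, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(kinases)
    # Hold out at least one kinase, even for small datasets.
    n_holdout = max(1, int(len(kinases) * holdout_frac))
    held_out = set(kinases[:n_holdout])
    train = [p for p in pairs if p[0] not in held_out]
    valid = [p for p in pairs if p[0] in held_out]
    return train, valid

pairs = [("ABL1", "c1"), ("ABL1", "c2"), ("EGFR", "c3"),
         ("BRAF", "c4"), ("SRC", "c5")]
train, valid = split_by_kinase(pairs, holdout_frac=0.4, seed=0)
```

The same idea extends to holding out compounds, or both at once ("cold-start" evaluation), depending on which axis of generalization is being tested.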
“…To avoid this, all 5 models for a given complex were together placed in either the training set, the validation set, or the test set [31]. As noted earlier, non-redundant active complexes with experimentally derived structures were obtained from DockGround, and five AF2 models were built from each of these.…”
Section: Dataset for Training/Testing PPIScreenML (mentioning)
confidence: 99%
“…The dataset includes 5 AF2 models for each (active or decoy) complex: training on one AF2 model of a given complex then using a different model of the same complex in the test set would introduce obvious information leakage. To avoid this, all 5 models for a given complex were together placed in either the training set, the validation set, or the test set [31]. As noted earlier, all five AF2 models for a given complex were included in the validation and testing sets; however, only those AF2 models of active complexes close to the experimentally derived structures were included in the training set ( Figure S2 ).…”
Section: Introduction (mentioning)
confidence: 99%
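The citing work's precaution can be sketched as a group-aware assignment: all AF2 models of one complex land in the same split, so near-duplicate structures never straddle the train/test boundary. The `"complexID_modelN"` naming convention, function name, and split fractions below are hypothetical, chosen only to illustrate the grouping step.

```python
import random

def assign_splits_by_complex(model_ids, fracs=(0.8, 0.1, 0.1), seed=0):
    """Assign AF2 models to train/val/test by complex, not by model.

    All models of one complex go to the same split, preventing leakage
    between near-duplicate structures of the same complex.
    model_ids: strings like "1abc_model3"; the complex identifier is the
    part before "_model" (a hypothetical naming convention).
    """
    complexes = sorted({m.split("_model")[0] for m in model_ids})
    rng = random.Random(seed)
    rng.shuffle(complexes)
    n_train = int(len(complexes) * fracs[0])
    n_val = int(len(complexes) * fracs[1])
    split_of = {}
    for i, c in enumerate(complexes):
        if i < n_train:
            split_of[c] = "train"
        elif i < n_train + n_val:
            split_of[c] = "val"
        else:
            split_of[c] = "test"
    # Every model inherits its complex's split.
    return {m: split_of[m.split("_model")[0]] for m in model_ids}

ids = [f"{c}_model{i}" for c in ("1abc", "2xyz", "3def", "4ghi", "5jkl")
       for i in range(1, 6)]
splits = assign_splits_by_complex(ids, fracs=(0.6, 0.2, 0.2), seed=1)
```

Splitting at the complex level rather than the model level is the same group-split principle discussed for compound-kinase pairs above, applied to structural near-duplicates.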