2021
DOI: 10.1021/acs.jcim.1c00889

Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model

Abstract: Recent advances in deep learning have enabled the development of large-scale multimodal models for virtual screening and de novo molecular design. The human kinome with its abundant sequence and inhibitor data presents an attractive opportunity to develop proteochemometric models that exploit the size and internal diversity of this family of targets. Here, we challenge a standard practice in sequence-based affinity prediction models: instead of leveraging the full primary structure of proteins, each target is …

Cited by 20 publications (36 citation statements); references 84 publications.
“…The experimental setup is largely identical to the binding affinity prediction task described in ref (7). We take data from BindingDB [11] and examine two types of models, a k-nearest-neighbor (KNN) model that builds a joint similarity space of protein and ligand distances and a deep neural network called BiMCA (Bimodal Multiscale Convolutional Attention encoder [12]) that ingests protein and ligand sequences (SMILES strings) and consists of convolutional and attention layers.…”
Section: Methods (mentioning)
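The KNN baseline described in this statement can be sketched as below. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the protein distance (edit distance normalized by length), the ligand distance (1 minus the Jaccard similarity of SMILES character bigrams, a crude stand-in for fingerprint Tanimoto similarity), the equal 0.5 weighting, the toy data, and all function names are hypothetical choices.

```python
# Hypothetical sketch of a KNN affinity baseline over a joint protein-ligand
# distance space. The distance functions and the weighting are assumptions,
# not the cited authors' implementation.
from typing import Sequence, Set, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def protein_distance(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer sequence."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def char_bigrams(smiles: str) -> Set[str]:
    """Crude stand-in for a molecular fingerprint: the set of SMILES character bigrams."""
    grams = {smiles[i:i + 2] for i in range(len(smiles) - 1)}
    return grams or {smiles}


def ligand_distance(a: str, b: str) -> float:
    """1 minus the Jaccard (set Tanimoto) similarity of the bigram sets."""
    sa, sb = char_bigrams(a), char_bigrams(b)
    return 1.0 - len(sa & sb) / len(sa | sb)


def knn_predict(
    query: Tuple[str, str],
    train: Sequence[Tuple[str, str, float]],
    k: int = 3,
    protein_weight: float = 0.5,
) -> float:
    """Mean affinity of the k training (protein, ligand) pairs nearest to the query."""
    protein, ligand = query
    scored = sorted(
        (
            # Joint distance: weighted sum of protein and ligand distances.
            protein_weight * protein_distance(protein, seq)
            + (1.0 - protein_weight) * ligand_distance(ligand, lig),
            affinity,
        )
        for seq, lig, affinity in train
    )
    nearest = scored[:k]
    return sum(affinity for _, affinity in nearest) / len(nearest)


# Toy (sequence, SMILES, affinity) triples; values are made up for illustration.
toy_train = [
    ("MKTAYIAK", "CCO", 5.0),
    ("MKTAYIAR", "CCN", 6.0),
    ("GGGGGGGG", "c1ccccc1", 9.0),
]
print(knn_predict(("MKTAYIAK", "CCO"), toy_train, k=2))  # → 5.5
```

With k=2 the prediction averages the two closest training pairs (affinities 5.0 and 6.0); the dissimilar third pair is ignored. A real baseline would swap in proper sequence-alignment and fingerprint similarities.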
“…In our previous work [7], the active site representation relied on 29 residues defined originally in Sheridan et al. [ref (8), Table 1]. These residues are short contiguous subsequences that lie discontiguously in the original sequence (cf.…”
Section: Kinase Sequence Representation (mentioning)
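Building such a representation amounts to slicing the listed residue segments out of the full sequence and concatenating them. A minimal sketch, assuming the segment positions are already known in the target's own residue numbering; the toy sequence, the positions, and the function name are hypothetical and do not reproduce the 29 residues from Sheridan et al. Mapping those positions onto each kinase (for example via an alignment) is not attempted here.

```python
# Hypothetical sketch: build a short "active site" string by concatenating
# discontiguous residue segments of a full sequence. Positions are illustrative.
from typing import Iterable, List, Tuple


def extract_active_site(sequence: str, segments: Iterable[Tuple[int, int]]) -> str:
    """Join (start, end) segments, 1-based and inclusive, in the order given."""
    parts: List[str] = []
    for start, end in segments:
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"segment ({start}, {end}) is outside the sequence")
        parts.append(sequence[start - 1:end])  # convert to 0-based, end-exclusive slice
    return "".join(parts)


toy_sequence = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up 22-residue sequence
toy_segments = [(2, 4), (10, 12), (20, 20)]  # hypothetical positions, not Sheridan et al.'s
print(extract_active_site(toy_sequence, toy_segments))  # → KTARQIS
```

Everything between the segments is dropped, so residues that are far apart in the full sequence end up adjacent in the shortened string.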
“…This is consistent with previous studies, which showed that the use of active site sequences can improve the prediction of the affinity [68, 69]. For the occurrence of amino acids in the protein active site, it can be seen that His, Gly, Tyr, and Trp rank in the top position (Figure S3), probably because they can form hydrogen bonds with ligands and thus play key roles in ligand binding. Leu and Phe also rank quite high, possibly due to their contribution to the formation of hydrophobic pockets.…”
Section: Results (mentioning)
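An occurrence ranking like the one described here can be obtained by pooling residues across all active-site strings and counting them. A minimal sketch; the input strings are made-up one-letter sequences (H = His, G = Gly, Y = Tyr, W = Trp, L = Leu, F = Phe) and the function name is hypothetical, not the citing study's code.

```python
# Hypothetical sketch: rank amino acids by how often they occur across a set of
# active-site strings. Inputs below are made up.
from collections import Counter
from typing import Iterable, List, Tuple


def residue_frequencies(active_sites: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (residue, fraction of all residues) pairs, most frequent first."""
    counts: Counter = Counter()
    for site in active_sites:
        counts.update(site)  # a string is counted character by character
    total = sum(counts.values())
    return [(residue, n / total) for residue, n in counts.most_common()]


ranking = residue_frequencies(["HGYW", "HGLF", "HYWL"])
print(ranking[0])  # → ('H', 0.25)
```

Ties keep first-seen order, because `Counter.most_common` sorts stably by count.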
“…All the datasets in this work are derived from BindingDB (Liu et al, 2007): a publicly accessible and regularly updated collection of binding affinity values between proteins considered to be drug targets, and drug-like molecules. In particular, we adopt two benchmark datasets derived from BindingDB, one released by Yingkai Gao et al (2018) and the other as defined by Karimi et al (2019), which have been used for benchmarking recent DTI predictors (Chen et al, 2020; Born et al, 2022). Both benchmark datasets are outlined in Table 1.…”
Section: Methods (mentioning)