2021
DOI: 10.48550/arxiv.2112.13168
Preprint

AI-Bind: Improving Binding Predictions for Novel Protein Targets and Ligands

Abstract: Identifying novel drug-target interactions (DTI) is a critical and rate-limiting step in drug discovery. While deep learning models have been proposed to accelerate the identification process, we show that state-of-the-art models fail to generalize to novel (i.e., never-before-seen) structures. We first unveil the mechanisms responsible for this shortcoming, demonstrating how models rely on shortcuts that leverage the topology of the protein-ligand bipartite network, rather than learning the node features. The…
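
The topological shortcut described in the abstract can be illustrated with a hypothetical baseline that never looks at protein sequences or ligand structures. It scores a protein-ligand pair only from the fraction of each node's training links that are labeled as binding in the bipartite network. This is an illustrative sketch under stated assumptions, not the AI-Bind method: the function names `positive_ratios` and `topology_score`, the averaging rule, and the 0.5 fallback for unseen nodes are all assumptions.

```python
from collections import defaultdict


def positive_ratios(pairs, labels):
    """Fraction of each node's training links labeled as binding (1).

    Nodes are keyed as ("P", id) for proteins and ("L", id) for ligands so
    the two sides of the bipartite network never collide.
    """
    pos = defaultdict(int)
    tot = defaultdict(int)
    for (protein, ligand), y in zip(pairs, labels):
        for node in (("P", protein), ("L", ligand)):
            tot[node] += 1
            pos[node] += y
    return {node: pos[node] / tot[node] for node in tot}


def topology_score(ratios, protein, ligand, default=0.5):
    """Score a pair from node-level annotation statistics only.

    No sequence or chemical structure is used. A node absent from training
    falls back to `default` (an assumed value), so two novel nodes always
    receive the same score.
    """
    rp = ratios.get(("P", protein), default)
    rl = ratios.get(("L", ligand), default)
    return (rp + rl) / 2
```

For instance, after training on the pairs (P1, L1), (P1, L2), (P2, L1), (P2, L3) with labels 1, 1, 1, 0, the pair (P1, L1) scores 1.0, while any pair of two unseen nodes scores exactly 0.5. Such a baseline can rank pairs among well-annotated nodes, but it carries no information about never-before-seen structures.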

Cited by 1 publication (1 citation statement). References: 73 publications.

“…First, the population distribution is virtually infinite, whereas training distributions are generally small and significantly biased [23]. A common example is binding affinity databases being biased towards certain protein families and high affinity binders [7]. Secondly, individual data points are not independent from each other, as similar molecules share similar properties [44, 41] (e.g., two protein sequences with a sequence identity above 30% usually share a common ancestor and adopt similar structures [35]).…”
Section: Introduction (mentioning; confidence: 99%)
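
The citing passage's point that similar protein sequences are not independent data points can be made concrete with a small sketch. It assumes the sequences already sit in one multiple alignment of equal length (`-` marks a gap), uses one common identity definition (matching residues over columns where neither sequence is gapped; other definitions exist), and links two sequences when their identity exceeds the 30% figure quoted above. The names `percent_identity` and `identity_groups` are hypothetical, and this is not the procedure of either paper.

```python
from itertools import combinations


def percent_identity(a, b):
    """Identity of two pre-aligned, equal-length sequences ('-' = gap).

    Only columns where neither sequence has a gap are counted (assumed
    definition). Returns a fraction in [0, 1].
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    return sum(x == y for x, y in cols) / len(cols)


def identity_groups(seqs, threshold=0.30):
    """Single-linkage grouping of sequences whose identity exceeds threshold.

    Uses a small union-find; returns sorted lists of sequence names.
    """
    parent = {name: name for name in seqs}

    def find(name):
        # Path halving keeps the trees shallow.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Assigning each group as a whole to either the training or the test split is one way to keep close homologs of test proteins out of the training data, so that test performance is not inflated by the dependence the passage describes.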