2023
DOI: 10.1101/2023.01.18.524543
Preprint

Cracking the black box of deep sequence-based protein-protein interaction prediction

Abstract: Identifying protein-protein interactions (PPIs) is crucial for deciphering biological pathways and their dysregulation. Numerous prediction methods have been developed as a cheap alternative to biological experiments, reporting phenomenal accuracy estimates. While most methods rely exclusively on sequence information, PPIs occur in 3D space. As predicting protein structure from sequence is an infamously complex problem, the almost perfect reported performances for PPI prediction seem dubious. We systematically…

Cited by 13 publications (37 citation statements)
References 66 publications
“…Studies by Dunham and Ganapathiraju in 2021 and Bernett et al . in 2023 corroborate Park and Marcotte’s 2012 findings (7, 8). In both studies, contemporary sequence-based PPI inference methods were tested on datasets that satisfy Park and Marcotte’s strictest validation schemes.…”
Section: Introduction (supporting)
confidence: 80%
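Park and Marcotte's 2012 scheme sorts test pairs by how many of their two proteins also appear in the training pairs: C1 (both proteins seen), C2 (one seen) and C3 (neither seen), with C3 being the strictest. The bookkeeping behind that split can be sketched in a few lines of Python, assuming pairs are tuples of protein identifiers; the function name `classify_test_pairs` and the toy identifiers are hypothetical and not taken from the cited studies.

```python
from collections import Counter


def classify_test_pairs(train_pairs, test_pairs):
    """Assign each test pair to C1, C2 or C3 by protein overlap with training.

    Hypothetical helper for illustration; pairs are (protein_a, protein_b) tuples.
    """
    # Every protein identifier that occurs in at least one training pair.
    train_proteins = {p for pair in train_pairs for p in pair}
    labels = {}
    for a, b in test_pairs:
        # Booleans add as integers: 2, 1 or 0 proteins already seen in training.
        seen = (a in train_proteins) + (b in train_proteins)
        # 2 seen -> C1, 1 seen -> C2, 0 seen -> C3 (the strictest class).
        labels[(a, b)] = {2: "C1", 1: "C2", 0: "C3"}[seen]
    return labels


train = [("P1", "P2"), ("P2", "P3")]
test = [("P1", "P3"), ("P1", "P9"), ("P8", "P9")]
labels = classify_test_pairs(train, test)
print(labels)
print(Counter(labels.values()))
```

Reporting accuracy separately per class makes the effect described in the citing studies visible: a model that memorises individual proteins can score well on C1 pairs while dropping sharply on C3 pairs.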
“…Furthermore, work by Hamp and Rost as well as Bernett et al . underscore an additional source of data leakage: distinct proteins with sequences which are nearly identical (8, 9). Amino acid sequences can be overwhelmingly redundant between distinct proteins.…”
Section: Introduction (mentioning)
confidence: 99%
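Identifier-based splits alone miss the leakage described above, because a test protein with a different identifier can still be a near copy of a training protein. The check can be sketched in Python under loud assumptions: it uses a crude k-mer Jaccard similarity as a stand-in for alignment-based sequence identity, and the helper names, `k=3` and the `0.5` threshold are hypothetical choices for illustration only. Real redundancy reduction is usually done with dedicated clustering tools such as CD-HIT or MMseqs2.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-mers in an amino acid sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_redundant(train_seqs, test_seqs, k=3, threshold=0.5):
    """Return IDs of test proteins whose k-mer profile resembles any training protein.

    Hypothetical proxy for sequence identity; not an alignment.
    """
    train_profiles = [kmers(s, k) for s in train_seqs.values()]
    flagged = set()
    for pid, seq in test_seqs.items():
        profile = kmers(seq, k)
        if any(jaccard(profile, tp) >= threshold for tp in train_profiles):
            flagged.add(pid)
    return flagged


train_seqs = {"A": "MKTAYIAKQRQISFVKSHFSRQ"}
test_seqs = {
    "A_variant": "MKTAYIAKQRQISFVKSHFSRE",  # distinct ID, one substitution
    "B": "GSHMLEDPVDAFQLLGEAVRR",           # unrelated toy sequence
}
print(flag_redundant(train_seqs, test_seqs))  # {'A_variant'}
```

Here `A_variant` would pass an identifier-based C3 check as an unseen protein, yet it shares 19 of 21 distinct 3-mers with training protein `A`, so a split that ignores sequence similarity still leaks information.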
“…On the other hand, sequence-based models have millions of parameters which give them the flexibility to recognise individual proteins and learn specific interaction patterns. Although this enables such models to make predictions without functional information, it also limits high performance to proteins present in the training set, and make[s] them particularly susceptible to data leakage [17]. This likely explains the poor results of sequence models on previously unseen proteins and cross-species datasets, something also observed by Dunham et al. in their benchmarking effort [16].…”
Section: Discussion (mentioning)
confidence: 99%
“…Yet, despite a wealth of tools, the mechanics and consequences of the underlying inference are still poorly understood, and it is unclear why models with similar performance make vastly different predictions. Reported performance scores often cannot be compared or replicated due to proprietary data and inconsistent or flawed assessment methods [16], [17]. This prompted recent efforts to benchmark published PPI prediction models more rigorously using common datasets and testing strategies [16], [18].…”
Section: Introduction (mentioning)
confidence: 99%
“…At the same time, single-sequence methods (i.e. those not dependent on an MSA) for PPI prediction have been found to be unreliable [7]. In contrast to other methods, AF for single chains has not been trained on PPIs and, therefore, successful PPI prediction indicates that the network generalizes to this new task.…”
Section: Main (mentioning)
confidence: 99%