Cracking the black box of deep sequence-based protein-protein interaction prediction

Bernett, Judith; Blumenthal, David B.; List, Markus

doi:10.1101/2023.01.18.524543

Cited by 13 publications

(37 citation statements)

References 66 publications


“…Studies by Dunham and Ganapathiraju in 2021 and Bernett et al. in 2023 corroborate Park and Marcotte’s 2012 findings (7, 8). In both studies, contemporary sequence-based PPI inference methods were tested on datasets that satisfy Park and Marcotte’s strictest validation schemes.…”

Section: Introduction (supporting)

confidence: 80%


INTREPPPID - An Orthologue-Informed Quintuplet Network for Cross-Species Prediction of Protein-Protein Interaction

Szymborski, Emad

2024

Preprint


An overwhelming majority of protein-protein interaction (PPI) studies are conducted in a select few model organisms largely due to constraints in time and cost of the associated "wet lab" experiments. In silico PPI inference methods are ideal tools to overcome these limitations, but often struggle with cross-species predictions. We present INTREPPPID, a method which incorporates orthology data using a new "quintuplet" neural network, which is constructed with five parallel encoders with shared parameters. INTREPPPID incorporates both a PPI classification task and an orthologous locality task. The latter learns embeddings of orthologues that have small Euclidean distances between them and large distances between embeddings of all other proteins. INTREPPPID outperforms all other leading PPI inference methods tested on both the intra-species and cross-species tasks using strict evaluation datasets. We show that INTREPPPID's orthologous locality loss increases performance because of the biological relevance of the orthologue data, and not due to some other specious aspect of the architecture. Finally, we introduce PPI.bio and PPI Origami, a web server interface for INTREPPPID and a software tool for creating strict evaluation datasets, respectively. Together, these two initiatives aim to make both the use and development of PPI inference tools more accessible to the community.
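The orthologous locality task described in this abstract (small Euclidean distances between orthologue embeddings, large distances to all other proteins) resembles a standard triplet margin objective. The sketch below is a generic triplet margin loss, not INTREPPPID's actual loss; the function names, the `margin` value, and the toy embeddings are assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_margin_loss(anchor, orthologue, other, margin=1.0):
    """Generic triplet margin loss (illustrative, not the published method).

    The loss is zero once the anchor is closer to its orthologue than to
    the unrelated protein by at least `margin`; otherwise it grows with
    the size of the violation.
    """
    d_pos = euclidean(anchor, orthologue)  # should become small
    d_neg = euclidean(anchor, other)       # should become large
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings (hypothetical values).
anchor = [0.0, 0.0]
near = [0.0, 1.0]  # distance 1 from the anchor
far = [3.0, 4.0]   # distance 5 from the anchor

well_separated = triplet_margin_loss(anchor, near, far)  # orthologue is near
violated = triplet_margin_loss(anchor, far, near)        # orthologue is far
```

With these toy values the first call incurs no penalty, while the second is penalised because the supposed orthologue lies farther from the anchor than the unrelated protein.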



“…Furthermore, work by Hamp and Rost as well as Bernett et al. underscore an additional source of data leakage: distinct proteins with sequences which are nearly identical (8, 9). Amino acid sequences can be overwhelmingly redundant between distinct proteins.…”

Section: Introduction (mentioning)

confidence: 99%

INTREPPPID - An Orthologue-Informed Quintuplet Network for Cross-Species Prediction of Protein-Protein Interaction

Szymborski, Emad

2024

Preprint


“…On the other hand, sequence-based models have millions of parameters which give them the flexibility to recognise individual proteins and learn specific interaction patterns. Although this enables such models to make predictions without functional information, it also limits high performance to proteins present in the training set, and make them particularly susceptible to data leakage [17]. This likely explains the poor results of sequence models on previously unseen proteins and cross-species datasets, something also observed by Dunham et al in their benchmarking effort [16].…”

Section: Discussion (mentioning)

confidence: 99%
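The leakage described in these statements arises when proteins in the test pairs also occur in the training pairs. A minimal sketch of a protein-disjoint split follows, in the spirit of Park and Marcotte's strictest scheme, where neither protein of a test pair is seen during training. The function name, parameters, and toy protein identifiers are hypothetical, and the sketch does not address the near-identical-sequence leakage noted above, which would additionally require a sequence-redundancy filter.

```python
import random


def protein_disjoint_split(pairs, test_fraction=0.2, seed=0):
    """Split interaction pairs so that no test protein occurs in training.

    Proteins (not pairs) are assigned to the test side first. A pair goes
    to `test` only if both proteins are test proteins, to `train` only if
    neither is, and is dropped otherwise.
    """
    proteins = sorted({p for pair in pairs for p in pair})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    n_test = max(1, int(len(proteins) * test_fraction))
    test_proteins = set(proteins[:n_test])

    train, test, dropped = [], [], []
    for a, b in pairs:
        n_in_test = (a in test_proteins) + (b in test_proteins)
        if n_in_test == 2:
            test.append((a, b))
        elif n_in_test == 0:
            train.append((a, b))
        else:
            dropped.append((a, b))  # mixed pair would leak a protein
    return train, test, dropped


# Toy interaction list with hypothetical protein identifiers.
pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
train, test, dropped = protein_disjoint_split(pairs, test_fraction=0.5, seed=1)
```

Dropping mixed pairs shrinks the dataset, which is the price of guaranteeing that the model is evaluated only on proteins it has never seen.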

“…Yet, despite a wealth of tools, the mechanics and consequences of the underlying inference are still poorly understood, and it is unclear why models with similar performance make vastly different predictions. Reported performance scores often cannot be compared or replicated due to proprietary data and inconsistent or flawed assessment methods [16], [17]. This prompted recent efforts to benchmark published PPI prediction models more rigorously using common datasets and testing strategies [16], [18].…”

Section: Introduction (mentioning)

confidence: 99%

Pitfalls of machine learning models for protein-protein interactions

Lannelongue, Inouye

2022

Preprint


Protein-protein interactions (PPIs) are essential to understanding biological pathways as well as their roles in development and disease. Computational tools have been successful at predicting PPIs in silico, but the lack of consistent and reliable frameworks for this task has led to network models that are difficult to compare and, overall, a low level of trust in the PPI predictions. To better understand the underlying mechanisms that underpin these models, we designed B4PPI, an open-source framework for benchmarking that accounts for a range of biological and statistical pitfalls while facilitating reproducibility. We use B4PPI to shed light on the impact of network topology and how different algorithms deal with highly connected proteins. By studying functional genomics-based and sequence-based models (the two most popular approaches) on human PPIs, we show their complementarity as the former performs best on lone proteins while the latter specialises in interactions involving hubs. We also show that algorithm design has little impact on performance with functional genomic data. We replicate our results between both human and S. cerevisiae data and demonstrate that models using functional genomics are better suited to PPI prediction across species. With rapidly increasing amounts of sequence and functional genomics data, our study provides a systematic foundation for future construction, comparison and application of PPI networks.


“…At the same time, single-sequence methods (i.e. those not dependent on an MSA) for PPI prediction have been found to be unreliable [7]. In contrast to other methods, AF for single chains has not been trained on PPIs and, therefore, successful PPI prediction indicates that the network generalizes to this new task.…”

Section: Main (mentioning)

confidence: 99%

Rapid protein-protein interaction network creation from multiple sequence alignments with Deep Learning

Bryant, Noé

2023

Preprint


AlphaFold2 (AF) can evaluate protein-protein interactions (PPIs) with high accuracy by finding evolutionary signals between proteins but comes with a high computational cost. Here, we speed up the prediction with AF for PPI network prediction 40x and reduce the disk space requirements 4000x for a set of 1000 proteins. Our protocol is easy to install and freely available from: https://github.com/patrickbryant1/SpeedPPI.
