Motivation
Prediction of protein complexes from protein–protein interaction (PPI) networks is an important problem in systems biology, because protein complexes control different cellular functions. Existing solutions employ network community detection algorithms that identify dense subgraphs in PPI networks. However, gold standards in yeast and human indicate that protein complexes can also induce sparse subgraphs, which poses a further challenge for protein complex prediction.
Results
To address this issue, we formalize protein complexes as biclique spanned subgraphs, which include both sparse and dense subgraphs. We then cast the problem of protein complex prediction as a partitioning of the network into biclique spanned subgraphs that removes a minimum number of edges, called a coherent partition. Since finding a coherent partition is computationally intractable, we devise a parameter-free greedy approximation algorithm, termed Protein Complexes from Coherent Partition (PC2P), based on key properties of biclique spanned subgraphs. Through comparison with nine contenders, we demonstrate that PC2P: (i) successfully identifies modular structure in networks, a prerequisite for protein complex prediction, (ii) outperforms the existing solutions with respect to a composite score of five performance measures on 75% and 100% of the analyzed PPI networks and gold standards in yeast and human, respectively, and (iii, iv) does not compromise the GO semantic similarity or the enrichment score of the predicted protein complexes. Therefore, our study demonstrates that clustering networks in terms of biclique spanned subgraphs is a promising framework for detecting complexes in PPI networks.
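To make the notion of a biclique spanned subgraph concrete, the following minimal Python sketch tests whether a set of proteins induces one. It is an illustration only, not the PC2P implementation. It relies on a standard graph-theoretic equivalence: a graph on at least two vertices contains a spanning complete bipartite subgraph exactly when its complement is disconnected. The function name `is_biclique_spanned` and the edge-list input format are assumptions made for this example.

```python
from collections import deque


def is_biclique_spanned(nodes, edges):
    """Check whether the subgraph induced by `nodes` is biclique spanned.

    Hypothetical helper for illustration. It uses the equivalence that a
    graph with >= 2 vertices has a spanning biclique if and only if its
    complement graph is disconnected.
    """
    nodes = set(nodes)
    if len(nodes) < 2:
        return False  # a biclique needs two non-empty sides

    # Adjacency of the induced subgraph (self-loops ignored).
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes and u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Breadth-first search in the complement graph: from u we may step to
    # any node that is NOT adjacent to u in the original graph.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in nodes - adj[u] - seen:
            seen.add(w)
            queue.append(w)

    # Complement disconnected  <=>  some node was never reached.
    return len(seen) < len(nodes)


# A sparse star, a dense triangle, and a 4-node path.
star = [("a", "b"), ("a", "c"), ("a", "d")]
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
path = [("a", "b"), ("b", "c"), ("c", "d")]

star_ok = is_biclique_spanned("abcd", star)
triangle_ok = is_biclique_spanned("abc", triangle)
path_ok = is_biclique_spanned("abcd", path)
```

In this sketch the star (sparse) and the triangle (dense) both qualify, whereas the 4-node path does not, which mirrors the statement that biclique spanned subgraphs cover both sparse and dense cases.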
Availability and implementation
https://github.com/SaraOmranian/PC2P.
Supplementary information
Supplementary data are available at Bioinformatics online.
Proteins are essential components of all living organisms and participate in almost every biological process. However, most proteins do not function as single entities; instead, they often interact with other proteins to form large macromolecules, i.e. protein complexes, that are involved in different cellular functions. Identifying protein complexes allows functions to be assigned to proteins of as yet unknown role by using the known functions of their interacting partners, following the principle of guilt-by-association (Tian et al. 2008). Moreover, owing to their structures, proteins are often involved in more than one complex, in different subcellular compartments and biological processes. Therefore, studying protein complexes is important for understanding the functional principles of the cell system, from signaling to metabolism (Pawson and Nash 2000; Maslov and Sneppen 2002; Reyes-Turcu et al. 2009; Sweetlove and Fernie 2018), and provides a better understanding of the hierarchy of intra- and inter-cellular activities (Bauer and Kuster 2003).