Detecting known protein complexes and predicting undiscovered ones from protein-protein interaction (PPI) networks helps us understand the principles of cell organization and function. However, experimental discovery of protein complexes is still limited, so computational methods are useful for overcoming the experimental limitations. Extracting protein complexes from a PPI network is nonetheless often nontrivial. Two major obstacles are the large amount of noise in PPI networks and the fact that the times at which different interactions occur are ignored. In this paper, an efficient algorithm, Inter Module Hub Removal Clustering (IMHRC), is developed based on inter-module hub removal in the weighted PPI network; it can detect overlapping complexes. By removing some of the inter-module hubs and module hubs, IMHRC eliminates much of the noise in the dataset and implicitly accounts for the different occurrence times of the interactions in the network. The performance of IMHRC was evaluated on several benchmark datasets, and the results were compared with those of several state-of-the-art models. The protein complexes discovered with IMHRC show significantly better agreement with the real complexes than those found by other current methods. Our algorithm provides an accurate and scalable method for detecting and predicting protein complexes from PPI networks.
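The abstract above does not spell out how IMHRC selects or removes hubs, so the following is only a minimal sketch of the general idea of hub removal in a weighted network, not the IMHRC algorithm itself. It assumes that a "hub" is simply a protein with the highest weighted degree and that candidate modules are the connected components left after removal. The names `weighted_degree`, `remove_hubs`, `connected_components`, and `example_edges` are hypothetical, and the sketch does not reproduce the overlapping-complex detection described in the abstract.

```python
from collections import defaultdict


def weighted_degree(edges):
    """Sum of incident edge weights for every protein."""
    degree = defaultdict(float)
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
    return dict(degree)


def remove_hubs(edges, n_hubs):
    """Drop the n_hubs proteins with the highest weighted degree.

    Assumption: "hub" means highest weighted degree; ties are broken
    alphabetically so the result is deterministic.
    """
    degree = weighted_degree(edges)
    ranked = sorted(degree, key=lambda p: (-degree[p], p))
    hubs = set(ranked[:n_hubs])
    kept = [(u, v, w) for u, v, w in edges if u not in hubs and v not in hubs]
    return hubs, kept


def connected_components(edges):
    """Connected components of the remaining network (candidate modules)."""
    adjacency = defaultdict(set)
    for u, v, _ in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Toy network (made up for illustration): two triangles bridged by protein H.
example_edges = [
    ("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 1.0),
    ("D", "E", 1.0), ("E", "F", 1.0), ("D", "F", 1.0),
    ("H", "A", 1.0), ("H", "B", 1.0), ("H", "D", 1.0), ("H", "E", 1.0),
]
hubs, kept_edges = remove_hubs(example_edges, n_hubs=1)
modules = connected_components(kept_edges)
```

In this toy network, removing the single bridging protein separates the two triangles into distinct modules. A method that reports overlapping complexes would presumably need a further step, such as reassigning removed hubs to the modules they connect; that step is not shown here.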
Methods for detecting protein complexes from protein-protein interaction networks are among the most critical computational approaches. Numerous methods have been proposed in this area, so it is necessary to evaluate them. Various metrics have been proposed for comparing these methods. Nevertheless, new metrics are needed that evaluate methods both qualitatively and quantitatively. In addition, there is no tool for the comprehensive comparison of such methods. In this paper, a new criterion is introduced that can fully evaluate protein complex detection algorithms. We introduce CDAP (Complex Detection Analyzer Package), an online package for comparing protein complex detection methods. CDAP can quickly rank the performance of methods based on previously defined as well as newly introduced criteria in various settings (4 PPI datasets and 3 gold standards). It can integrate various methods and apply several filters to the results. CDAP can be easily extended to include new datasets, gold standards, and methods. Furthermore, the user can compare the results of a custom method with the results of existing methods. Thus, the authors of future papers can use CDAP to compare their method with previous ones. A case study is carried out on YGR198W, a well-known protein, and the detected clusters are compared to the known complexes of this protein.
Background: While the evolutionary divergence of cis-regulatory sequences impacts translation initiation sites (TISs), the implication of tandem repeats (TRs) in TIS selection remains largely elusive. Here, we employed the TIS homology concept to study a possible link between TRs of all core lengths and repeats with TISs. Methods: Human, as reference sequence, and 83 other species were selected, and data was extracted on the entire protein-coding genes (n = 1,611,368) and transcripts (n = 2,730,515) annotated for those species from Ensembl 102. Following TIS identification, two different weighing vectors were employed to assign TIS homology, and the co-occurrence pattern of TISs with the upstream flanking TRs was studied in the selected species. The results were assessed in 10-fold cross-validation. Results: On average, every TIS was flanked by 1.19 TRs of various categories within its 120 bp upstream sequence, per species. We detected statistically significant enrichment of non-homologous human TISs co-occurring with human-specific TRs. On the contrary, homologous human TISs co-occurred significantly with non-human-specific TRs. 2,991 human genes had at least one transcript whose TIS was flanked by a human-specific TR. Text mining of a number of the identified genes, such as CACNA1A, EIF5AL1, FOXK1, GABRB2, MYH2, SLC6A8, and TTN, yielded predominant expression and functions in the human brain and/or skeletal muscle. Conclusion: We conclude that TRs ubiquitously flank and contribute to TIS selection at the trans-species level. Future functional analyses, such as a combination of genome editing strategies and in vitro protein synthesis, may be employed to further investigate the impact of TRs on TIS selection.
Background: While the evolutionary divergence of cis-regulatory sequences impacts translation initiation sites (TISs), the implication of tandem repeats (TRs) in TIS selection remains elusive for the most part. Hence, here we employed the TIS homology concept to study the co-occurrence patterns of all categories of TRs with TISs. Methods: Human, as reference sequence, and 83 other species were selected, and data was extracted on the entire protein-coding genes (n=1,611,368) and transcripts (n=2,730,515) annotated for those species from Ensembl 102. Two different weighing vectors were employed to assign TIS homology, and the results were assessed in 10-fold validation. Results: On average, every TIS was flanked by 1.19 TRs of various categories within the 120 bp upstream sequence. We detected statistically significant enrichment of non-homologous TISs co-occurring with human-specific TRs. On the contrary, homologous TISs co-occurred significantly with non-human-specific TRs. 2,991 human genes had at least one transcript flanked by a human-specific TR in their upstream flanking region, and nervous system development was the top enriched ontology term across those genes. Text mining of a number of the identified genes such as MYH2, TTN, SLC6A8, CACNA1A, and EIF5AL1 yielded predominant expression and functions in the human brain and skeletal muscle. Conclusion: We conclude that TRs are abundant cis elements in the upstream sequences of TISs across species, and there may be a link between all categories of TRs and TIS selection. The potential biological consequences of this link are discussed in human as the reference sequence in this study.
Background: While of predominant abundance across vertebrate genomes and significant biological implications, the relevance of short tandem repeat (STR) abundance to speciation remains largely elusive and attributed to random coincidence for the most part. In a model study, here we collected whole-genome abundance of mono-, di-, and trinucleotide STRs in nine species, encompassing rodents and primates, including rat, mouse, olive baboon, gelada, macaque, gorilla, chimpanzee, bonobo, and human. The obtained unnormalized and normalized data were used to analyze hierarchical clustering of the STR abundances in the selected species. Results: We found massive differential abundances between the rodent and primate orders. In addition, while numerous STRs had random abundance across the nine selected species, the global abundance conformed to three consistent <clusters>, as follows: <rat, mouse>, <gelada, macaque, olive baboon>, <gorilla, chimpanzee, bonobo, human>, which coincided with the phylogenetic distances of the selected species (p < 4E-05). Exceptionally, in the trinucleotide STR compartment, human was significantly distant from all other species. Conclusion: We propose that the global abundance of STRs is non-random in rodents and primates, and probably had a determining impact on the speciation of the two orders. We also propose the STRs and STR lengths which specifically coincided with the phylogeny of the selected species.
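As a rough illustration of what collecting mono-, di-, and trinucleotide STR abundance can look like, the sketch below counts perfect repeats in a DNA string with regular expressions. The abstract does not state which tool or thresholds were used, so the minimum repeat numbers (6 for mononucleotides, 3 for di- and trinucleotides), the function name `count_strs`, and the toy sequence are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumed minimum number of repeats per core length (not from the abstract).
DEFAULT_MIN_REPEATS = {1: 6, 2: 3, 3: 3}


def count_strs(sequence, min_repeats=None):
    """Count perfect mono-, di-, and trinucleotide STRs in a DNA sequence.

    Returns a Counter keyed by (core unit, number of repeats).
    """
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = sequence.upper()
    counts = Counter()
    for unit_len, min_rep in min_repeats.items():
        core = r"(?P<unit>[ACGT]{%d})(?P=unit){%d,}" % (unit_len, min_rep - 1)
        if unit_len > 1:
            # Skip cores made of a single repeated base (e.g. "AA", "AAA"),
            # which are mononucleotide runs rather than di/trinucleotide STRs.
            core = r"(?!([ACGT])\1{%d})" % (unit_len - 1) + core
        for match in re.finditer(core, seq):
            repeats = len(match.group(0)) // unit_len
            counts[(match.group("unit"), repeats)] += 1
    return counts


# Toy sequence: an A run of 7, a CA repeat of 4, and a CAG repeat of 3.
toy_sequence = "G" + "A" * 7 + "T" + "CA" * 4 + "TT" + "CAG" * 3 + "T"
toy_counts = count_strs(toy_sequence)
```

Per-species tables of such counts, summed over a whole genome, are the kind of input that could then be passed to a hierarchical clustering routine. Normalization and the clustering step itself are left out of this sketch.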