BackgroundHigh-throughput molecular interaction data have been used effectively to prioritize candidate genes that are linked to a disease, based on the observation that the products of genes associated with similar diseases are likely to interact with each other heavily in a network of protein-protein interactions (PPIs). An important challenge for these applications, however, is the incomplete and noisy nature of PPI data. Information flow based methods alleviate these problems to a certain extent, by considering indirect interactions and multiplicity of paths.ResultsWe demonstrate that existing methods are likely to favor highly connected genes, making prioritization sensitive to the skewed degree distribution of PPI networks, as well as ascertainment bias in available interaction and disease association data. Motivated by this observation, we propose several statistical adjustment methods to account for the degree distribution of known disease and candidate genes, using a PPI network with associated confidence scores for interactions. We show that the proposed methods can detect loosely connected disease genes that are missed by existing approaches, however, this improvement might come at the price of more false negatives for highly connected genes. Consequently, we develop a suite called DADA, which includes different uniform prioritization methods that effectively integrate existing approaches with the proposed statistical adjustment strategies. Comprehensive experimental results on the Online Mendelian Inheritance in Man (OMIM) database show that DADA outperforms existing methods in prioritizing candidate disease genes.ConclusionsThese results demonstrate the importance of employing accurate statistical models and associated adjustment methods in network-based disease gene prioritization, as well as other network-based functional inference applications. DADA is implemented in Matlab and is freely available at http://compbio.case.edu/dada/.
Genome-wide linkage and association studies have demonstrated promise in identifying genetic factors that influence health and disease. An important challenge is to narrow down the set of candidate genes that are implicated by these analyses. Protein-protein interaction (PPI) networks are useful in extracting the functional relationships between known disease and candidate genes, based on the principle that products of genes implicated in similar diseases are likely to exhibit significant connectivity/proximity. Information flow-based methods are shown to be very effective in prioritizing candidate disease genes. In this article, we utilize the topology of PPI networks to infer functional information in the context of disease association. Our approach is based on the assumption that PPI networks are organized into recurrent schemes that underlie the mechanisms of cooperation among different proteins. We hypothesize that proteins associated with similar diseases would exhibit similar topological characteristics in PPI networks. Utilizing the location of a protein in the network with respect to other proteins (i.e., the ''topological profile'' of the proteins), we develop a novel measure to assess the topological similarity of proteins in a PPI network. We then use this measure to prioritize candidate disease genes based on the topological similarity of their products and the products of known disease genes. We test the resulting algorithm, Vavien, via systematic experimental studies using an integrated human PPI network and the Online Mendelian Inheritance in Man (OMIM) database. Vavien outperforms other network-based prioritization algorithms as shown in the results and is available at www.diseasegenes.org.
Network-based analyses are commonly used as powerful tools to interpret the findings of genome-wide association studies (GWAS) in a functional context. In particular, identification of disease-associated functional modules, i.e., highly connected protein-protein interaction (PPI) subnetworks with high aggregate disease association, are shown to be promising in uncovering the functional relationships among genes and proteins associated with diseases. An important issue in this regard is the scoring of subnetworks by integrating two quantities: disease association of individual gene products and network connectivity among proteins. Current scoring schemes either disregard the level of connectivity and focus on the aggregate disease association of connected proteins or use a linear combination of these two quantities. However, such scoring schemes may produce arbitrarily large subnetworks which are often not statistically significant or require tuning of parameters that are used to weigh the contributions of network connectivity and disease association.Here, we propose a parameter-free scoring scheme that aims to score subnetworks by assessing the disease association of interactions between pairs of gene products. We also incorporate the statistical significance of network connectivity and disease association into the scoring function. We test the proposed scoring scheme on a GWAS dataset for two complex diseases type II diabetes (T2D) and psoriasis (PS). Our results suggest that subnetworks identified by commonly used methods may fail tests of statistical significance after correction for multiple hypothesis testing. In contrast, the proposed scoring scheme yields highly significant subnetworks, which contain biologically relevant proteins that cannot be identified by analysis of genome-wide association data alone. We also show that the proposed scoring scheme identifies subnetworks that are reproducible across different cohorts, and it can robustly recover relevant subnetworks at lower sampling rates.
Abstract. High-throughput molecular interaction data have been used effectively to prioritize candidate genes that are linked to a disease, based on the notion that the products of genes associated with similar diseases are likely to interact with each other heavily in a network of protein-protein interactions (PPIs). An important challenge for these applications, however, is the incomplete and noisy nature of PPI data. Random walk and network propagation based methods alleviate these problems to a certain extent, by considering indirect interactions and multiplicity of paths. However, as we demonstrate in this paper, such methods are likely to favor highly connected genes, making prioritization sensitive to the skewed degree distribution of PPI networks, as well as ascertainment bias in available interaction and disease association data. Here, we propose several statistical correction schemes that aim to account for the degree distribution of known disease and candidate genes. We show that, while the proposed schemes are very effective in detecting loosely connected disease genes that are missed by existing approaches, this improvement might come at the price of more false negatives for highly connected genes. Motivated by these results, we develop uniform prioritization methods that effectively integrate existing methods with the proposed statistical correction schemes. Comprehensive experimental results on the Online Mendelian Inheritance in Man (OMIM) database show that the resulting hybrid schemes outperform existing methods in prioritizing candidate disease genes.
Background: In systems biology, comparative analyses of molecular interactions across diverse species indicate that conservation and divergence of networks can be used to understand functional evolution from a systems perspective. A key characteristic of these networks is their modularity, which contributes significantly to their robustness, as well as adaptability. Consequently, analysis of modular network structures from a phylogenetic perspective may be useful in understanding the emergence, conservation, and diversification of functional modularity.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.