Background: Recently, several studies have drawn attention to the determination of a minimum set of driver proteins that are important for the control of the underlying protein-protein interaction (PPI) networks. In general, the minimum dominating set (MDS) model is widely adopted. However, because the MDS model does not generate a unique MDS configuration, different optimization algorithms can produce different MDSs. It is therefore difficult to identify which of these MDSs represents the true driver set of proteins. Results: To address this problem, we develop a centrality-corrected minimum dominating set (CC-MDS) model which includes heterogeneity in degree and betweenness centralities of proteins. Both the MDS model and the CC-MDS model are applied to three human PPI networks. Unlike the MDS model, the CC-MDS model generates almost the same sets of driver proteins when we implement it using different optimization algorithms. The CC-MDS model targets more high-degree and high-betweenness proteins than the uncorrected counterpart. The more central position allows CC-MDS proteins to be more important in maintaining the overall network connectivity than MDS proteins. Regarding functional significance, we find that CC-MDS proteins are involved in, on average, more protein complexes and GO annotations than MDS proteins. We also find that more essential genes, aging genes, disease-associated genes and virus-targeted genes appear in CC-MDS proteins than in MDS proteins. As for the involvement in regulatory functions, the sets of CC-MDS proteins show much stronger enrichment of transcription factors and protein kinases. The results on topological and functional significance demonstrate that the CC-MDS model can capture more driver proteins than the MDS model. Conclusions: Based on the results obtained, the CC-MDS model appears to be a powerful tool for the determination of driver proteins that can control the underlying PPI networks. The software described in this paper and the datasets used are available at https://github.com/Zhangxf-ccnu/CC-MDS. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0591-3) contains supplementary material, which is available to authorized users.
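The non-uniqueness of dominating sets, and the way a centrality score can settle ties, can be illustrated with a small sketch. This is not the CC-MDS model or the authors' software: it swaps in a simple greedy heuristic (which does not guarantee a minimum set), and the function names and the `weight` argument are hypothetical stand-ins for a per-protein centrality score.

```python
def greedy_dominating_set(adj, weight=None):
    """Greedy heuristic for a dominating set (not guaranteed to be minimum).

    adj    : dict mapping each node to the set of its neighbours (undirected).
    weight : optional dict of per-node scores (assumed stand-in for centrality),
             used only to break ties between equally good candidates.
    """
    weight = weight or {}
    undominated = set(adj)
    chosen = set()
    while undominated:
        def gain(v):
            # Number of still-undominated nodes covered by picking v.
            return len(({v} | adj[v]) & undominated)
        # Prefer the largest gain, then the largest weight, then the name,
        # so the result is deterministic.
        best = max(adj, key=lambda v: (gain(v), weight.get(v, 0.0), str(v)))
        chosen.add(best)
        undominated -= {best} | adj[best]
    return chosen


def is_dominating(adj, nodes):
    """True if every node is in `nodes` or has a neighbour in `nodes`."""
    return all(v in nodes or adj[v] & nodes for v in adj)


# A 4-cycle a-b-c-d-a: every pair of nodes is a minimum dominating set,
# so the answer depends entirely on how ties are broken.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
unweighted = greedy_dominating_set(cycle)
weighted = greedy_dominating_set(cycle, weight={"a": 1.0})
```

On the 4-cycle, both calls return two nodes, but only the weighted call is steered toward node `a`; without a score, the choice depends on an arbitrary tie-break.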
Cancer prognosis is of essential interest, and extensive research has been conducted searching for biomarkers with prognostic power. Recent studies have shown that both omics profiles and histopathological imaging features have prognostic power. There are also studies exploring the integration of the two types of measurements for prognosis modeling. However, there is a lack of studies rigorously examining whether omics measurements have independent prognostic power conditional on histopathological imaging features, and vice versa. In this article, we adopt a rigorous statistical testing framework and test whether an individual gene expression measurement can improve prognosis modeling conditional on high-dimensional imaging features, and a parallel analysis is conducted reversing the roles of gene expressions and imaging features. In the analysis of The Cancer Genome Atlas (TCGA) lung adenocarcinoma and liver hepatocellular carcinoma data, it is found that multiple individual genes, conditional on imaging features, can lead to significant improvement in prognosis modeling; however, individual imaging features, conditional on gene expressions, only offer limited prognostic power. Being among the first to examine the independent prognostic power, this study may assist in better understanding the “connectedness” between omics profiles and histopathological imaging features and provide important insights for data integration in cancer modeling.
Revealing functional units in protein-protein interaction (PPI) networks is important for understanding cellular functional organization. Current algorithms for identifying functional units mainly focus on cohesive protein complexes which have more internal interactions than external interactions. Most of these approaches do not handle overlaps among complexes since they usually allow a protein to belong to only one complex. Moreover, recent studies have shown that other non-cohesive structural functional units beyond complexes also exist in PPI networks. Thus previous algorithms that just focus on non-overlapping cohesive complexes are not able to present the biological reality fully. Here, we develop a new regularized sparse random graph model (RSRGM) to explore overlapping and various structural functional units in PPI networks. RSRGM is principally governed by two model parameters. One is used to define the functional units as groups of proteins that have similar patterns of connections to others, which allows RSRGM to detect non-cohesive structural functional units. The other one is used to represent the degree to which proteins belong to the units, which allows a protein to belong to more than one revealed unit. We also propose a regularizer to control the smoothness between the estimators of these two parameters. Experimental results on four S. cerevisiae PPI networks show that the performance of RSRGM on detecting cohesive complexes and overlapping complexes is superior to that of previous competing algorithms. Moreover, RSRGM has the ability to discover biologically significant functional units besides complexes.
For the etiology, progression, and treatment of complex diseases, gene‐environment (G‐E) interactions have important implications beyond the main G and E effects. G‐E interaction analysis can be more challenging because of the higher dimensionality and the need to accommodate the “main effects, interactions” hierarchy. In recent literature, an array of novel methods, many of which are based on the penalization technique, have been developed. In most of these studies, however, the structures of G measurements, for example, the adjacency structure of single nucleotide polymorphisms (SNPs; attributable to their physical adjacency on the chromosomes) and the network structure of gene expressions (attributable to their coordinated biological functions and correlated measurements), have not been well accommodated. In this study, we develop structured G‐E interaction analysis, where such structures are accommodated using penalization for both the main G effects and interactions. Penalization is also applied for regularized estimation and selection. The proposed structured interaction analysis can be effectively realized. It is shown to have consistency properties under high‐dimensional settings. Simulations and analysis of GENEVA diabetes data with SNP measurements and TCGA melanoma data with gene expression measurements demonstrate its competitive practical performance.
Gene-environment (G-E) interactions have important implications for the etiology and progression of many complex diseases. Compared to continuous markers and categorical disease status, prognosis has been less investigated, with the additional challenges brought by the unique characteristics of survival outcomes. Most of the existing G-E interaction approaches for prognosis data share the limitation that they cannot accommodate long-tailed or contaminated outcomes. In this study, for prognosis data, we develop a robust G-E interaction identification approach using the censored quantile partial correlation (CQPCorr) technique. The proposed approach is built on the quantile regression technique (and hence has a solid statistical basis), uses weights to easily accommodate censoring, and adopts partial correlation to identify important interactions while properly controlling for the main genetic and environmental effects. In simulation, it outperforms multiple competitors with more accurate identification. In the analysis of TCGA data on lung cancer and melanoma, it yields biologically sensible findings that differ from those of the alternatives.
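Why a quantile-based approach tolerates long-tailed or contaminated outcomes can be seen from the standard quantile regression check loss. The sketch below shows only that building block for a constant fit; it does not implement CQPCorr, censoring weights, or partial correlation, and the function names are hypothetical.

```python
def check_loss(u, tau):
    """Quantile regression check loss: u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(ys, tau):
    """Observed value that minimizes the total check loss at level tau.

    For tau = 0.5 this is a sample median, which a single extreme
    observation does not move, unlike the mean.
    """
    return min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))


# One contaminated outcome (100) among otherwise small values.
outcomes = [1, 2, 3, 4, 100]
median_fit = best_constant(outcomes, tau=0.5)
mean_fit = sum(outcomes) / len(outcomes)
```

Here the median-type fit stays at 3 while the mean is pulled to 22 by the single outlier.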
Background: In biomedical research, gene expression profiling studies have been extensively conducted. The analysis of gene expression data has led to a deeper understanding of human genetics as well as practically useful models. Clustering analysis has been a critical component of gene expression data analysis and can reveal the (previously unknown) interconnections among genes. With the high dimensionality of gene expression data, many of the existing clustering methods and results are not fully satisfactory. Intuitively, this is caused by “a lack of information”. In recent profiling studies, a prominent trend is to collect data on gene expressions as well as their regulators (copy number alteration, microRNA, methylation, etc.) on the same subjects, making it possible to borrow information from other types of omics measurements in gene expression analysis. Methods: In this study, an ANCut approach is developed, which is built on the regularized estimation and NCut techniques. Effective R code that implements this approach is developed. Results: Simulation shows that the proposed approach outperforms direct competitors. The analysis of TCGA (The Cancer Genome Atlas) data further demonstrates its satisfactory performance. Conclusions: We propose a more effective clustering analysis of gene expression data, with the assistance of information from regulators. It provides a new avenue for analyzing gene expression data based on the assisted analysis strategy. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3990-1) contains supplementary material, which is available to authorized users.
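For readers unfamiliar with the NCut technique mentioned in the Methods, the sketch below evaluates the standard normalized cut criterion for a given partition of a weighted graph. It illustrates only plain NCut, not the assisted ANCut objective or the authors' R code; the function name and the toy weight matrix are assumptions made for illustration.

```python
def ncut_value(W, labels):
    """Normalized cut of a partition: sum over clusters of cut / volume.

    W      : symmetric, non-negative weight matrix as a list of lists.
    labels : cluster label for each node, in the same order as W's rows.
    """
    n = len(W)
    total = 0.0
    for k in set(labels):
        inside = [i for i in range(n) if labels[i] == k]
        outside = [j for j in range(n) if labels[j] != k]
        # Weight of edges leaving the cluster.
        cut = sum(W[i][j] for i in inside for j in outside)
        # Total weight attached to the cluster's nodes.
        vol = sum(W[i][j] for i in inside for j in range(n))
        if vol > 0:
            total += cut / vol
    return total


# Two tightly linked pairs (0-1 and 2-3) joined by one weak edge (1-2).
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
natural = ncut_value(W, [0, 0, 1, 1])    # cuts only the weak edge
scrambled = ncut_value(W, [0, 1, 0, 1])  # cuts every edge
```

A smaller value indicates a better partition, so the natural grouping of the two pairs scores far lower than the scrambled one.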
Gene-gene (G×G) interactions have been shown to be critical for the fundamental mechanisms and development of complex diseases beyond main genetic effects. The commonly adopted marginal analysis is limited by considering only a small number of G factors at a time. With the "main effects, interactions" hierarchical constraint, many of the existing joint analysis methods suffer from prohibitively high computational cost. In this study, we propose a new method for identifying important G×G interactions under joint modeling. The proposed method adopts tensor regression to accommodate high data dimensionality and the penalization technique for selection. It naturally accommodates the strong hierarchical structure without imposing additional constraints, making optimization much simpler and faster than in the existing studies. It outperforms multiple alternatives in simulation. The analysis of The Cancer Genome Atlas (TCGA) data on lung cancer and melanoma demonstrates that it can identify markers with important implications and better prediction performance.