The emergence and evolutionary expansion of gene families implicated in cancers and other severe genetic diseases is an evolutionary oddity from a natural selection perspective. Here, we show that gene families prone to deleterious mutations in the human genome have been preferentially expanded by the retention of "ohnolog" genes from two rounds of whole-genome duplication (WGD) dating back from the onset of jawed vertebrates. We further demonstrate that the retention of many ohnologs suspected to be dosage balanced is in fact indirectly mediated by their susceptibility to deleterious mutations. This enhanced retention of "dangerous" ohnologs, defined as prone to autosomal-dominant deleterious mutations, is shown to be a consequence of WGD-induced speciation and the ensuing purifying selection in post-WGD species. These findings highlight the importance of WGD-induced nonadaptive selection for the emergence of vertebrate complexity, while rationalizing, from an evolutionary perspective, the expansion of gene families frequently implicated in genetic disorders and cancers.
Learning causal networks from large-scale genomic data remains challenging in absence of time series or controlled perturbation experiments. We report an information- theoretic method which learns a large class of causal or non-causal graphical models from purely observational data, while including the effects of unobserved latent variables, commonly found in many genomic datasets. Starting from a complete graph, the method iteratively removes dispensable edges, by uncovering significant information contributions from indirect paths, and assesses edge-specific confidences from randomization of available data. The remaining edges are then oriented based on the signature of causality in observational data. The approach and associated algorithm, , outperform earlier methods on a broad range of benchmark networks. Causal network reconstructions are presented at different biological size and time scales, from gene regulation in single cells to whole genome duplication in tumor development as well as long term evolution of vertebrates. is publicly available at https://github.com/miicTeam/MIIC.
Background: The mechanisms responsible for calorie restriction (CR)-induced improvement in insulin sensitivity (IS) have not been fully elucidated. Greater insight can be achieved through deep biological phenotyping of subjects undergoing CR, and integration of big data.Materials and Methods: An integrative approach was applied to investigate associations between change in IS and factors from host, microbiota, and lifestyle after a 6-week CR period in 27 overweight or obese adults (ClinicalTrials.gov: NCT01314690). Partial least squares regression was used to determine associations of change (week 6 – baseline) between IS markers and lifestyle factors (diet and physical activity), subcutaneous adipose tissue (sAT) gene expression, metabolomics of serum, urine and feces, and gut microbiota composition. ScaleNet, a network learning approach based on spectral consensus strategy (SCS, developed by us) was used for reconstruction of biological networks.Results: A spectrum of variables from lifestyle factors (10 nutrients), gut microbiota (10 metagenomics species), and host multi-omics (metabolic features: 84 from serum, 73 from urine, and 131 from feces; and 257 sAT gene probes) most associated with IS were identified. Biological network reconstruction using SCS, highlighted links between changes in IS, serum branched chain amino acids, sAT genes involved in endoplasmic reticulum stress and ubiquitination, and gut metagenomic species (MGS). Linear regression analysis to model how changes of select variables over the CR period contribute to changes in IS, showed greatest contributions from gut MGS and fiber intake.Conclusion: This work has enhanced previous knowledge on links between host glucose homeostasis, lifestyle factors and the gut microbiota, and has identified potential biomarkers that may be used in future studies to predict and improve individual response to weight-loss interventions. Furthermore, this is the first study showing integration of the wide range of data presented herein, identifying 115 variables of interest with respect to IS from the initial input, consisting of 9,986 variables.Clinical Trial Registration: clinicaltrials.gov (NCT01314690).
BackgroundThe reconstruction of reliable graphical models from observational data is important in bioinformatics and other computational fields applying network reconstruction methods to large, yet finite datasets. The main network reconstruction approaches are either based on Bayesian scores, which enable the ranking of alternative Bayesian networks, or rely on the identification of structural independencies, which correspond to missing edges in the underlying network. Bayesian inference methods typically require heuristic search strategies, such as hill-climbing algorithms, to sample the super-exponential space of possible networks. By contrast, constraint-based methods, such as the PC and IC algorithms, are expected to run in polynomial time on sparse underlying graphs, provided that a correct list of conditional independencies is available. Yet, in practice, conditional independencies need to be ascertained from the available observational data, based on adjustable statistical significance levels, and are not robust to sampling noise from finite datasets.ResultsWe propose a more robust approach to reconstruct graphical models from finite datasets. It combines constraint-based and Bayesian approaches to infer structural independencies based on the ranking of their most likely contributing nodes. In a nutshell, this local optimization scheme and corresponding 3off2 algorithm iteratively “take off” the most likely conditional 3-point information from the 2-point (mutual) information between each pair of nodes. Conditional independencies are thus derived by progressively collecting the most significant indirect contributions to all pairwise mutual information. The resulting network skeleton is then partially directed by orienting and propagating edge directions, based on the sign and magnitude of the conditional 3-point information of unshielded triples. The approach is shown to outperform both constraint-based and Bayesian inference methods on a range of benchmark networks. The 3off2 approach is then applied to the reconstruction of the hematopoiesis regulation network based on recent single cell expression data and is found to retrieve more experimentally ascertained regulations between transcription factors than with other available methods.ConclusionsThe novel information-theoretic approach and corresponding 3off2 algorithm combine constraint-based and Bayesian inference methods to reliably reconstruct graphical models, despite inherent sampling noise in finite datasets. In particular, experimentally verified interactions as well as novel predicted regulations are established on the hematopoiesis regulatory networks based on single cell expression data.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-015-0856-x) contains supplementary material, which is available to authorized users.
Supplementary data are available at Bioinformatics online.
BackgroundThe last decades witnessed an explosion of large-scale biological datasets whose analyses require the continuous development of innovative algorithms. Many of these high-dimensional datasets are related to large biological networks with few or no experimentally proven interactions. A striking example lies in the recent gut bacterial studies that provided researchers with a plethora of information sources. Despite a deeper knowledge of microbiome composition, inferring bacterial interactions remains a critical step that encounters significant issues, due in particular to high-dimensional settings, unknown gut bacterial taxa and unavoidable noise in sparse datasets. Such data type make any a priori choice of a learning method particularly difficult and urge the need for the development of new scalable approaches.ResultsWe propose a consensus method based on spectral decomposition, named Spectral Consensus Strategy, to reconstruct large networks from high-dimensional datasets. This novel unsupervised approach can be applied to a broad range of biological networks and the associated spectral framework provides scalability to diverse reconstruction methods. The results obtained on benchmark datasets demonstrate the interest of our approach for high-dimensional cases. As a suitable example, we considered the human gut microbiome co-presence network. For this application, our method successfully retrieves biologically relevant relationships and gives new insights into the topology of this complex ecosystem.ConclusionsThe Spectral Consensus Strategy improves prediction precision and allows scalability of various reconstruction methods to large networks. The integration of multiple reconstruction algorithms turns our approach into a robust learning method. All together, this strategy increases the confidence of predicted interactions from high-dimensional datasets without demanding computations.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-016-1308-y) contains supplementary material, which is available to authorized users.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.