Background: The genetic bases of many complex phenotypes are still largely unknown, mostly due to the polygenic nature of the traits and the small effect of each associated mutation. An alternative to classic association studies for determining such genetic bases is an evolutionary framework. As sites targeted by natural selection are likely to harbor important functionalities for the carrier, the identification of selection signatures in the genome has the potential to unveil the genetic mechanisms underpinning human phenotypes. Popular methods of detecting such signals rely on compressing genomic information into summary statistics, resulting in the loss of information. Furthermore, few methods are able to quantify the strength of selection. Here we explored the use of deep learning in evolutionary biology and implemented a program, called ImaGene, to apply convolutional neural networks on population genomic data for the detection and quantification of natural selection.
Results: ImaGene enables genomic information from multiple individuals to be represented as abstract images. Each image is created by stacking aligned genomic data and encoding distinct alleles into separate colors. To detect and quantify signatures of positive selection, ImaGene implements a convolutional neural network which is trained using simulations. We show how the method implemented in ImaGene can be affected by data manipulation and learning strategies. In particular, we show how sorting images by row and column leads to accurate predictions. We also demonstrate how the misspecification of the correct demographic model for producing training data can influence the quantification of positive selection.
We finally illustrate an approach to estimate the selection coefficient, a continuous variable, using multiclass classification techniques.
Conclusions: While the use of deep learning in evolutionary genomics is in its infancy, here we demonstrated its potential to detect informative patterns from large-scale genomic data. We implemented methods to process genomic data for deep learning in a user-friendly program called ImaGene. The joint inference of the evolutionary history of mutations and their functional impact will facilitate mapping studies and provide novel insights into the molecular mechanisms associated with human phenotypes.
Electronic supplementary material: The online version of this article (10.1186/s12859-019-2927-x) contains supplementary material, which is available to authorized users.
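Two ideas from this abstract can be sketched in a few lines: sorting a haplotype image by row and column, and reading a point estimate of a continuous selection coefficient off multiclass output. This is an illustrative sketch only, not ImaGene's code or API. The function names, the sort keys (haplotype frequency for rows, allele frequency for columns), and the use of bin centers are assumptions made for the example.

```python
import numpy as np


def sort_image(haplotypes):
    """Reorder a 0/1 matrix (rows = haplotypes, columns = sites).

    Hypothetical helper; the sort keys below are assumptions.
    """
    hap = np.asarray(haplotypes).astype(np.int64)
    # Assumed row key: how many times each identical haplotype occurs.
    _, inverse, counts = np.unique(
        hap, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)
    # lexsort treats the last key as primary: most frequent haplotypes
    # first, identical haplotypes kept adjacent through their group index.
    row_order = np.lexsort((inverse, -counts[inverse]))
    hap = hap[row_order]
    # Assumed column key: count of allele 1 at each site, descending.
    col_order = np.argsort(-hap.sum(axis=0), kind="stable")
    return hap[:, col_order]


def summarize_classes(probabilities, bin_centers):
    """Turn class probabilities over binned selection coefficients into
    two point estimates: the most probable bin and the weighted mean."""
    probs = np.asarray(probabilities, dtype=float)
    centers = np.asarray(bin_centers, dtype=float)
    probs = probs / probs.sum()  # normalise in case of rounding
    most_probable = float(centers[np.argmax(probs)])
    weighted_mean = float(np.dot(probs, centers))
    return most_probable, weighted_mean
```

In this sketch the class probabilities would come from a classifier's final softmax layer, and the bin centers from however the continuous coefficient was discretised for training.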
Our understanding of population history in deep time has been assisted by fitting admixture graphs ('AGs') to data: models that specify the ordering of population splits and mixtures, which, along with the amount of genetic drift on each lineage and the proportions of mixture, is the only information needed to predict the patterns of allele frequency correlation among populations. Not needing to specify population size changes, split times, or whether admixture events were sudden or drawn out simplifies the space of models that need to be searched. However, the space of possible AGs relating populations is vast and cannot be sampled fully, and thus most published studies have identified fitting AGs through a manual process driven by prior hypotheses, leaving the vast majority of alternative models unexplored. Here, we develop a method for systematically searching the space of all AGs that can incorporate non-genetic information in the form of topology constraints. We implement this findGraphs tool within a software package, ADMIXTOOLS 2, which is a reimplementation of the ADMIXTOOLS software with new features and large performance gains. We apply this methodology to identify alternative models to AGs that played key roles in eight published studies and find that graphs modeling more than six populations and two or three admixture events are often not unique, with many alternative models fitting nominally or significantly better than the published one. Our results suggest that strong claims about population history from AGs should only be made when all well-fitting and temporally plausible models share common topological features.
Our re-evaluation of published data also provides insight into the population histories of humans, dogs, and horses, identifying features that are stable across the models we explored, as well as scenarios of population relationships that differ in important ways from models that have been highlighted in the literature, that fit the allele frequency correlation data, and that are not obviously wrong.
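One common way to summarise allele frequency correlations among four populations is the f4 statistic. The sketch below is a minimal illustration of that statistic and is not code from ADMIXTOOLS 2 (an R package); the function name and inputs are assumptions. For a tree in which A and B form one clade and C and D another, with no admixture between the clades, the expected value of f4(A, B; C, D) is zero.

```python
import numpy as np


def f4(freq_a, freq_b, freq_c, freq_d):
    """Average over SNPs of (a - b) * (c - d).

    Each argument is a sequence of allele frequencies, one per SNP,
    for one population. Hypothetical helper for illustration only.
    """
    a, b, c, d = (
        np.asarray(x, dtype=float) for x in (freq_a, freq_b, freq_c, freq_d)
    )
    if not (a.shape == b.shape == c.shape == d.shape):
        raise ValueError("all populations need one frequency per SNP")
    return float(np.mean((a - b) * (c - d)))
```

A fitted graph is then judged by how closely the f-statistics it predicts match the ones observed in the data. In practice the observed value is reported with a standard error, which this sketch omits.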
Balancing selection is a selective process that generates and maintains genetic diversity within populations, as first proposed by Dobzhansky (1951). Many diverse mechanisms of balancing selection have been described (Charlesworth, 2006). Overdominance (or heterozygote advantage) occurs when heterozygous individuals at one locus have higher fitness than homozygotes. In sexually antagonistic selection, different alleles at the same locus have opposite effects in the two sexes, creating a balanced polymorphism.
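To make overdominance concrete, the sketch below iterates the standard one-locus, two-allele selection recursion. It assumes fitnesses 1 - s, 1, and 1 - t for genotypes AA, Aa, and aa (with s, t > 0), random mating, and an infinite population. These symbols and function names are introduced here for illustration. Under those assumptions the frequency of allele A approaches t / (s + t) from any interior starting point, so both alleles are maintained.

```python
def next_frequency(p, s, t):
    """One generation of selection on the frequency p of allele A."""
    q = 1.0 - p
    w_AA, w_Aa, w_aa = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_AA + 2.0 * p * q * w_Aa + q * q * w_aa
    # A alleles come from AA homozygotes and half of the heterozygotes.
    return (p * p * w_AA + p * q * w_Aa) / mean_fitness


def equilibrium_by_iteration(p0, s, t, generations=2000):
    """Iterate the recursion from p0 and return the final frequency."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, s, t)
    return p
```

For example, with s = 0.1 and t = 0.3 the iteration settles at 0.3 / 0.4 = 0.75, whether A starts rare or common.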
Cells in largely non-mitotic tissues such as the brain are prone to stochastic (epi-)genetic alterations that may cause increased variability between cells and individuals over time. Although increased interindividual heterogeneity in gene expression was previously reported, whether this process starts during development or is restricted to the aging period has not yet been studied. The regulatory dynamics and functional significance of putative aging-related heterogeneity are also unknown. Here we address these questions by a meta-analysis of 19 transcriptome datasets from three independent studies, covering diverse human brain regions. We observed a significant increase in inter-individual heterogeneity during aging (20+ years) compared to postnatal development (0 to 20 years). Increased heterogeneity during aging was consistent among different brain regions at the gene level and associated with lifespan regulation and neuronal functions. Overall, our results show that increased expression heterogeneity is a characteristic of the aging human brain, and may influence aging-related changes in brain functions.
Aging is a complex process characterized by a gradual decline in maintenance and repair mechanisms, accompanied by an increase in genetic and epigenetic mutations, and oxidative damage to nucleic acids, protein and lipids 1,2. The human brain experiences dramatic structural and functional changes in the course of aging. These include decline in gray matter and white matter volumes 3, increase in axonal bouton dynamics 4 and reduced synaptic plasticity, all processes that may be associated with decline in cognitive functions 5. Changes during brain aging are suggested to be a result of stochastic processes, unlike changes associated with postnatal neuronal development that are known to be primarily controlled by adaptive regulatory processes 6-8.
The molecular mechanisms underlying age-related alteration of regulatory processes and eventually leading to aging-related phenotypes, however, are little understood. Over the past decade, a number of transcriptome studies focusing on age-related changes in human brain gene expression profiles were published 2,9-12. These studies report aging-related differential expression patterns in many functions, including synaptic functions, energy metabolism, inflammation, stress response, and DNA repair. By analyzing age-related change in gene expression profiles in diverse brain regions, we previously showed that for many genes, gene expression changes occur in opposite directions during postnatal development (pre-20 years of age) and aging (post-20 years of age), which may be associated with aging-related phenotypes in healthy brain aging 13. While different brain regions are associated with specific, and often independent, gene expression profiles 9,10,12, these studies also show that age-related alteration of gene expression profiles during aging is a widespread effect across different brain regions. One of the suggested effects of aging is increased variability between indivi...
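The excerpt does not say how inter-individual heterogeneity was quantified. One plausible way to test whether it grows with age, sketched here purely as an assumption, is to remove the linear age trend from a gene's expression and then ask whether the absolute residuals correlate with age. All function names are hypothetical, and the rank correlation helper ignores ties.

```python
import numpy as np


def spearman(x, y):
    """Spearman rank correlation (simple version, assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


def heterogeneity_trend(age, expression):
    """Correlation between age and absolute deviation from the age trend.

    A positive value suggests individuals differ more from each other
    at older ages for this gene.
    """
    age = np.asarray(age, dtype=float)
    expression = np.asarray(expression, dtype=float)
    # Assumed model: expression changes linearly with age.
    slope, intercept = np.polyfit(age, expression, 1)
    residuals = expression - (slope * age + intercept)
    return spearman(age, np.abs(residuals))
```

Applied per gene within a dataset, with development and aging samples analysed separately, this would give one heterogeneity trend per gene and period that could then be compared across datasets.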
Developmental trajectories of gene expression may reverse in their direction during ageing, a phenomenon previously linked to cellular identity loss. Our analysis of cerebral cortex, lung, liver and muscle transcriptomes of 16 mice, covering development and ageing intervals, revealed widespread but tissue-specific ageing-associated expression reversals. Cumulatively, these reversals create a unique phenomenon: mammalian tissue transcriptomes diverge from each other during postnatal development, but during ageing, they tend to converge towards similar expression levels, a process we term Divergence followed by Convergence, or DiCo. We found that DiCo was most prevalent among tissue-specific genes and associated with loss of tissue identity, which is confirmed using data from independent mouse and human datasets. Further, using publicly available single-cell transcriptome data, we showed that DiCo could be driven both by alterations in tissue cell type composition and also by cell-autonomous expression changes within particular cell types.
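The divergence-then-convergence pattern can be illustrated for a single gene by tracking how much its expression varies across tissues within each individual. The sketch below uses the coefficient of variation across tissues and correlates it with age separately before and after a chosen turning point. The choice of measure, the turning point, and all names are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np


def spearman(x, y):
    """Spearman rank correlation (simple version, assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


def cov_across_tissues(expression):
    """Coefficient of variation per individual.

    `expression` has shape (n_individuals, n_tissues) for one gene.
    """
    expression = np.asarray(expression, dtype=float)
    return expression.std(axis=1) / expression.mean(axis=1)


def dico_pattern(age, expression, turning_age):
    """Return (development, ageing) correlations of CoV with age.

    Divergence followed by convergence would show up as a positive
    first value and a negative second value.
    """
    age = np.asarray(age, dtype=float)
    cov = cov_across_tissues(expression)
    development = age <= turning_age
    return (
        spearman(age[development], cov[development]),
        spearman(age[~development], cov[~development]),
    )
```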