Polyploid speciation has played an important role in evolutionary history across the tree of life, yet there remain large gaps in our understanding of how polyploid species form and persist. While systematic studies have been conducted in numerous polyploid complexes, recent advances in sequencing technology have demonstrated that conclusions from data-limited studies may be spurious and misleading. The North American gray treefrog complex, consisting of the diploid Hyla chrysoscelis and the tetraploid Hyla versicolor, has long been used as a model system in a variety of biological fields, yet all taxonomic studies to date were conducted with only a few loci from nuclear and mitochondrial genomes. Here, we utilized anchored hybrid enrichment and high-throughput sequencing to capture hundreds of loci along with whole mitochondrial genomes to investigate the evolutionary history of this complex. We used several phylogenetic and population genetic methods, including coalescent simulations and testing of polyploid speciation models with Approximate Bayesian Computation (ABC), to determine that H. versicolor was most likely formed via autopolyploidization from a now extinct lineage of H. chrysoscelis. We also uncovered evidence of significant hybridization between diploids and tetraploids where they co-occur, and show that historical hybridization between these groups led to the re-formation of distinct polyploid lineages following the initial whole genome duplication event. Our study indicates that a wide variety of methods and explicit model testing of polyploid histories can greatly facilitate efforts to uncover the evolutionary history of polyploid complexes.
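The ABC model testing mentioned in this abstract can be illustrated with a minimal rejection-sampling sketch. This is not the authors' pipeline: the two "models" below are placeholder Gaussian simulators standing in for competing speciation histories, and the names `simulate_summary` and `abc_model_choice`, the tolerance `eps`, and the observed value are all assumptions made for illustration.

```python
import random


def simulate_summary(model, rng):
    """Toy simulator returning one summary statistic for a hypothetical model.

    Both models are placeholder Gaussians, not polyploid speciation models.
    """
    mean = {"A": 0.0, "B": 2.0}[model]
    return rng.gauss(mean, 1.0)


def abc_model_choice(observed, models, n_sims=20000, eps=0.1, seed=0):
    """Approximate posterior model probabilities by ABC rejection.

    A model is drawn from a uniform prior, data are simulated under it, and
    the draw is kept only if the simulated summary lies within eps of the
    observed summary. Accepted draws are then tallied per model.
    """
    rng = random.Random(seed)
    accepted = {m: 0 for m in models}
    for _ in range(n_sims):
        model = rng.choice(models)  # uniform prior over candidate models
        if abs(simulate_summary(model, rng) - observed) <= eps:
            accepted[model] += 1
    total = sum(accepted.values())
    if total == 0:
        raise ValueError("no simulations accepted; increase eps or n_sims")
    return {m: accepted[m] / total for m in models}


# Hypothetical observed summary statistic, closer to what model "B" produces.
posterior = abc_model_choice(observed=1.8, models=["A", "B"])
```

In a real analysis the simulator would be a coalescent simulation under each candidate history and the summary would be a vector of population genetic statistics, but the accept/reject logic is the same.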
Polyploidy has played an important role in evolution across the tree of life, but it is still unclear how polyploid lineages may persist after their initial formation. While both common and well-studied in plants, polyploidy is rare in animals and generally less understood. The Australian burrowing frog genus Neobatrachus comprises six diploid and three polyploid species and offers a powerful animal polyploid model system. We generated exome-capture sequence data from 87 individuals representing all nine species of Neobatrachus to investigate species-level relationships, the origin and inheritance mode of polyploid species, and the population genomic effects of polyploidy on genus-wide demography. We describe rapid speciation of diploid Neobatrachus species and show that the three independently originated polyploid species have tetrasomic or mixed inheritance. We document higher genetic diversity in tetraploids, resulting from widespread gene flow between the tetraploids, asymmetric inter-ploidy gene flow directed from sympatric diploids to tetraploids, and isolation of diploid species from each other. We also constructed models of ecologically suitable areas for each species to investigate the impact of climate on differing ploidy levels. These models suggest substantial change in suitable areas compared to past climate, a pattern that corresponds to population genomic estimates of demographic histories. We propose that Neobatrachus diploids may be suffering the early genomic impacts of climate-induced habitat loss, while tetraploids appear to be avoiding this fate, possibly due to widespread gene flow. Finally, we demonstrate that Neobatrachus is an attractive model to study the effects of ploidy on the evolution of adaptation in animals.
Numerous studies over the last decade have demonstrated the utility of machine learning methods when applied to population genetic tasks. More recent studies show the potential of deep learning methods in particular, which allow researchers to approach problems without making prior assumptions about how the data should be summarized or manipulated, instead learning their own internal representation of the data in an attempt to maximize inferential accuracy. One type of deep neural network, called Generative Adversarial Networks (GANs), can even be used to generate new data, and this approach has been used to create individual artificial human genomes free from privacy concerns. In this study, we further explore the application of GANs in population genetics by designing and training a network to learn the statistical distribution of population genetic alignments (i.e. data sets consisting of sequences from an entire population sample) under several diverse evolutionary histories—the first GAN capable of performing this task. After testing multiple different neural network architectures, we report the results of a fully differentiable Deep-Convolutional Wasserstein GAN with gradient penalty that is capable of generating artificial examples of population genetic alignments that successfully mimic key aspects of the training data, including the site frequency spectrum, differentiation between populations, and patterns of linkage disequilibrium. We demonstrate consistent training success across various evolutionary models, including models of panmictic and subdivided populations, populations at equilibrium and experiencing changes in size, and populations experiencing either no selection or positive selection of various strengths, all without the need for extensive hyperparameter tuning. 
Overall, our findings highlight the ability of GANs to learn and mimic population genetic data, and suggest future areas where this work can be applied in population genetics research that we discuss herein.

AUTHOR SUMMARY

The application of deep learning to biological problems has expanded greatly over the last decade. One type of deep neural network, called a Generative Adversarial Network (GAN), attempts to generate artificial examples of a given type of data by learning to fool a discriminator that is simultaneously learning to discriminate between real and artificial examples. In this study, we design a GAN whose purpose is to generate artificial examples of genetic alignments from biological populations of varying evolutionary histories, essentially learning the statistical distribution of those evolutionary histories. We show that our GAN is able to mimic key aspects of the genetic alignments relevant to population genetics, and that the GAN does not require extensive tuning of the network parameters. Ultimately, this work demonstrates the ability of these networks to learn and mimic population genetic data, and highlights future areas where this work can be applied and expanded.
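One of the statistics used above to compare generated and training alignments is the site frequency spectrum. A minimal sketch of how it could be computed follows, assuming an alignment is stored as a list of haplotypes with 0 for the ancestral allele and 1 for the derived allele. The function name `site_frequency_spectrum` and the toy alignment are hypothetical and are not taken from the study.

```python
def site_frequency_spectrum(alignment):
    """Return the unfolded site frequency spectrum of a 0/1 alignment.

    alignment: list of equal-length rows, one row per sampled haplotype,
    with 0 = ancestral allele and 1 = derived allele (assumed encoding).
    The result has n - 1 entries; entry k - 1 counts the sites at which
    exactly k of the n haplotypes carry the derived allele.
    """
    n = len(alignment)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    n_sites = len(alignment[0])
    if any(len(row) != n_sites for row in alignment):
        raise ValueError("all haplotypes must have the same length")

    sfs = [0] * (n - 1)
    for site in zip(*alignment):  # iterate over columns (sites)
        k = sum(site)             # derived allele count at this site
        if 0 < k < n:             # skip monomorphic sites
            sfs[k - 1] += 1
    return sfs


# Toy alignment: 4 haplotypes by 5 sites (illustrative values only).
toy = [
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
spectrum = site_frequency_spectrum(toy)
```

Computing the same spectrum on real and generated alignments and comparing the two vectors is one simple way to check whether a generator has captured allele frequency patterns.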