Massively parallel short-read sequencing technologies, coupled with powerful software platforms, are enabling investigators to analyse tens of thousands of genetic markers. This wealth of data is rapidly expanding and allowing biological questions to be addressed with unprecedented scope and precision. The sizes of the data sets are now posing significant data processing and analysis challenges. Here we describe an extension of the Stacks software package to efficiently use genotype-by-sequencing data for studies of populations of organisms. Stacks now produces core population genomic summary statistics and SNP-by-SNP statistical tests. These statistics can be analysed across a reference genome using a smoothed sliding window. Stacks also now provides output formats for several commonly used downstream analysis packages. The expanded population genomics functions in Stacks will make it a useful tool to harness the newest generation of massively parallel genotyping data for ecological and evolutionary genetics.
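The smoothed sliding window mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify the kernel or its width, so the Gaussian weighting, the `sigma` and `cutoff` defaults, and the function name `kernel_smooth` are all assumptions made for illustration rather than a description of how Stacks itself computes the average.

```python
import math


def kernel_smooth(positions, values, sigma=150_000.0, cutoff=3.0):
    """Return a Gaussian-weighted average of a per-SNP statistic at each SNP.

    positions: base-pair coordinates of SNPs on one chromosome.
    values:    the per-SNP statistic (for example a diversity or
               differentiation estimate) at each position.
    sigma:     kernel standard deviation in base pairs (illustrative value).
    cutoff:    SNPs further than cutoff * sigma from the window centre
               are ignored.
    """
    smoothed = []
    for centre in positions:
        weighted_sum = 0.0
        weight_total = 0.0
        for pos, val in zip(positions, values):
            dist = abs(pos - centre)
            if dist > cutoff * sigma:
                continue  # outside the window
            weight = math.exp(-(dist ** 2) / (2.0 * sigma ** 2))
            weighted_sum += weight * val
            weight_total += weight
        # The centre SNP always contributes weight 1, so the total is never 0.
        smoothed.append(weighted_sum / weight_total)
    return smoothed
```

Centring a window on every SNP, rather than on fixed genomic intervals, keeps the sketch short; a production implementation would likely use a sorted index so that each window does not scan the whole chromosome.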
Single nucleotide polymorphism (SNP) discovery and genotyping are essential to genetic mapping. There remains a need for a simple, inexpensive platform that allows high-density SNP discovery and genotyping in large populations. Here we describe the sequencing of restriction-site associated DNA (RAD) tags, which identified more than 13,000 SNPs, and mapped three traits in two model organisms, using less than half the capacity of one Illumina sequencing run. We demonstrated that different marker densities can be attained by choice of restriction enzyme. Furthermore, we developed a barcoding system for sample multiplexing and fine mapped the genetic basis of lateral plate armor loss in threespine stickleback by identifying recombinant breakpoints in F2 individuals. Barcoding also facilitated mapping of a second trait, a reduction of pelvic structure, by in silico re-sorting of individuals. To further demonstrate the ease of the RAD sequencing approach we identified polymorphic markers and mapped an induced mutation in Neurospora crassa. Sequencing of RAD markers is an integrated platform for SNP discovery and genotyping. This approach should be widely applicable to genetic mapping in a variety of organisms.
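The barcoding system for sample multiplexing described above implies a demultiplexing step in which each read is assigned to an individual by its leading barcode. The following is a minimal sketch under stated assumptions: barcodes sit at the very start of each read, matching is exact (no error correction), and the function name `demultiplex` and the sample names are hypothetical.

```python
def demultiplex(reads, barcodes):
    """Assign reads to samples by exact match on a leading barcode.

    reads:    iterable of read sequences (strings).
    barcodes: dict mapping barcode sequence -> sample name.
    Returns (by_sample, unassigned), where by_sample maps each sample name
    to its reads with the barcode trimmed off.
    """
    by_sample = {name: [] for name in barcodes.values()}
    unassigned = []
    # Try longer barcodes first in case barcode lengths differ.
    lengths = sorted({len(b) for b in barcodes}, reverse=True)
    for read in reads:
        for k in lengths:
            sample = barcodes.get(read[:k])
            if sample is not None:
                by_sample[sample].append(read[k:])
                break
        else:
            unassigned.append(read)  # no barcode matched
    return by_sample, unassigned
```

Because each read stays tagged with its individual, the same data can later be regrouped by a different phenotype, which is the kind of in silico re-sorting the abstract refers to.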
Advances in sequencing technology provide special opportunities for genotyping individuals with speed and thrift, but the lack of software to automate the calling of tens of thousands of genotypes over hundreds of individuals has hindered progress. Stacks is a software system that uses short-read sequence data to identify and genotype loci in a set of individuals either de novo or by comparison to a reference genome. From reduced representation Illumina sequence data, such as RAD-tags, Stacks can recover thousands of single nucleotide polymorphism (SNP) markers useful for the genetic analysis of crosses or populations. Stacks can generate markers for ultra-dense genetic linkage maps, facilitate the examination of population phylogeography, and help in reference genome assembly. We report here the algorithms implemented in Stacks and demonstrate their efficacy by constructing loci from simulated RAD-tags taken from the stickleback reference genome and by recapitulating and improving a genetic map of the zebrafish, Danio rerio.
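The abstract does not spell out the algorithms, so the following is only a rough sketch of the general idea of de novo locus construction from short reads: pile identical reads into stacks, discard shallow stacks, then group stacks that differ by a few nucleotides into putative loci. The depth threshold, the mismatch threshold, the single-linkage grouping, and all function names are assumptions for illustration, not the Stacks implementation.

```python
from collections import Counter


def build_stacks(reads, min_depth=3):
    """Collapse identical reads into stacks and keep those deep enough."""
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_depth}


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def group_loci(stacks, max_mismatches=1):
    """Single-linkage grouping of stacks within max_mismatches of each other."""
    seqs = sorted(stacks)
    parent = {s: s for s in seqs}

    def find(s):
        # Union-find root lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for i, a in enumerate(seqs):
        for b in seqs[i + 1:]:
            if hamming(a, b) <= max_mismatches:
                parent[find(a)] = find(b)

    loci = {}
    for s in seqs:
        loci.setdefault(find(s), []).append(s)
    return sorted(loci.values())
```

In this toy version each group of similar stacks stands for one locus and its member sequences for candidate alleles; the all-pairs comparison is quadratic and would need indexing to scale to real data.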
Next-generation sequencing technology provides novel opportunities for gathering genome-scale sequence data in natural populations, laying the empirical foundation for the evolving field of population genomics. Here we conducted a genome scan of nucleotide diversity and differentiation in natural populations of threespine stickleback (Gasterosteus aculeatus). We used Illumina-sequenced RAD tags to identify and type over 45,000 single nucleotide polymorphisms (SNPs) in each of 100 individuals from two oceanic and three freshwater populations. Overall estimates of genetic diversity and differentiation among populations confirm the biogeographic hypothesis that large panmictic oceanic populations have repeatedly given rise to phenotypically divergent freshwater populations. Genomic regions exhibiting signatures of both balancing and divergent selection were remarkably consistent across multiple, independently derived populations, indicating that replicate parallel phenotypic evolution in stickleback may be occurring through extensive, parallel genetic evolution at a genome-wide scale. Some of these genomic regions co-localize with previously identified QTL for stickleback phenotypic variation identified using laboratory mapping crosses. In addition, we have identified several novel regions showing parallel differentiation across independent populations. Annotation of these regions revealed numerous genes that are candidates for stickleback phenotypic evolution and will form the basis of future genetic analyses in this and other organisms. This study represents the first high-density SNP–based genome scan of genetic diversity and differentiation for populations of threespine stickleback in the wild. 
These data illustrate the complementary nature of laboratory crosses and population genomic scans by confirming the adaptive significance of previously identified genomic regions, elucidating the particular evolutionary and demographic history of such regions in natural populations, and identifying new genomic regions and candidate genes of evolutionary significance.
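A genome scan of diversity and differentiation rests on per-SNP estimates computed from allele counts. The abstract does not give the estimators, so the sketch below uses one common formulation as an assumption: nucleotide diversity as the probability that two sampled alleles differ, and F_ST as one minus the sample-size-weighted within-population diversity divided by the pooled diversity. The function names `site_pi` and `site_fst` are hypothetical.

```python
from math import comb


def site_pi(allele_counts):
    """Nucleotide diversity at one site: chance two sampled alleles differ.

    allele_counts: counts of each allele in one population, e.g. [12, 8].
    """
    n = sum(allele_counts)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    same = sum(comb(c, 2) for c in allele_counts)
    return 1.0 - same / comb(n, 2)


def site_fst(pop_a, pop_b):
    """F_ST at one site between two populations, from allele counts.

    Within-population diversity is weighted by the number of allele pairs
    in each population and compared with diversity in the pooled sample.
    Returns None when the pooled sample is monomorphic (F_ST undefined).
    """
    pooled = [a + b for a, b in zip(pop_a, pop_b)]
    pi_all = site_pi(pooled)
    if pi_all == 0.0:
        return None
    pops = (pop_a, pop_b)
    weights = [comb(sum(p), 2) for p in pops]
    within = sum(w * site_pi(p) for w, p in zip(weights, pops))
    return 1.0 - within / (pi_all * sum(weights))
```

With this estimator, two populations fixed for different alleles give a value of 1, while populations with identical allele counts give a value slightly below zero because of the finite-sample correction.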
Restriction site associated DNA (RAD) tags are a genome-wide representation of every site of a particular restriction enzyme by short DNA tags. Most organisms segregate large numbers of DNA sequence polymorphisms that disrupt restriction sites, which allows RAD tags to serve as genetic markers spread at a high density throughout the genome. Here, we demonstrate the applicability of RAD markers for both individual and bulk-segregant genotyping. First, we show that these markers can be identified and typed on pre-existing microarray formats. Second, we present a method that uses RAD marker DNA to rapidly produce a low-cost microarray genotyping resource that can be used to efficiently identify and type thousands of RAD markers. We demonstrate the utility of the former approach by using a tiling path array for the fruit fly to map a recombination breakpoint, and the latter approach by creating and using an enriched RAD marker array for the threespine stickleback. The high number of RAD markers enabled localization of a previously identified region, as well as a second region also associated with the lateral plate phenotype. Taken together, our results demonstrate that RAD markers, and the method to develop a RAD marker microarray resource, allow high-throughput, high-resolution genotyping in both model and nonmodel systems.
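The definition of RAD tags as short sequences flanking every restriction site can be made concrete with a small in silico digest. This is a simplified sketch under stated assumptions: it scans only the forward strand, ignores where within the recognition sequence the enzyme actually cuts, and takes a fixed-length tag on each side. The function name `rad_tags` and the toy genome string are hypothetical; `CCTGCAGG` is the SbfI recognition sequence.

```python
def rad_tags(genome, site, tag_len=8):
    """Find each occurrence of a recognition site and its flanking tags.

    Returns a list of (position, upstream_tag, downstream_tag) tuples,
    where position is the 0-based start of the recognition site.
    """
    tags = []
    start = genome.find(site)
    while start != -1:
        end = start + len(site)
        upstream = genome[max(0, start - tag_len):start]
        downstream = genome[end:end + tag_len]
        tags.append((start, upstream, downstream))
        # Resume one base on so overlapping sites are not missed.
        start = genome.find(site, start + 1)
    return tags


if __name__ == "__main__":
    toy_genome = "AAAACCTGCAGGTTTTGGGGCCTGCAGGCC"
    for pos, up, down in rad_tags(toy_genome, "CCTGCAGG", tag_len=4):
        print(pos, up, down)
```

A polymorphism that disrupts the recognition sequence removes both tags at that site in the affected individual, which is what lets presence or absence of a tag act as a genetic marker.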
Most adaptation is thought to occur through the fixation of numerous alleles at many different loci. Consequently, the independent evolution of similar phenotypes is predicted to occur through different genetic mechanisms. The genetic basis of adaptation is still largely unknown, however, and it is unclear whether adaptation to new environments utilizes ubiquitous small-effect polygenic variation or large-effect alleles at a small number of loci. To address this question, we examined the genetic basis of bony armor loss in three freshwater populations of Alaskan threespine stickleback, Gasterosteus aculeatus, that evolved from fully armored anadromous populations in the last 14,000 years. Crosses between complete-armor and low-armor populations revealed that a single Mendelian factor governed the formation of all but the most anterior lateral plates, and another independently segregating factor largely determined pelvic armor. Genetic mapping localized the Mendelian genes to different chromosomal regions, and crosses among these same three widely separated populations showed that both bony plates and pelvic armor failed to fully complement, implicating the same Mendelian armor reduction genes. Thus, rapid and repeated armor loss in Alaskan stickleback populations appears to be occurring through the fixation of large-effect variants in the same genes.
A central tenet of evolutionary theory is that adaptation in the wild, like artificial selection, occurs gradually through the sequential fixation of small-effect variants (1). Consequently, the independent evolution of similar phenotypes is expected to use unique combinations of genes and alleles (2). New populations, however, are often established in novel environments at the edge of an organism's range, and selective pressures faced in these new habitats are often an important causative factor for adaptive radiations (3).
Importantly, novel environments may also have immediate disruptive effects on developmental processes that can expose novel genetic variants, some of which may have large effects on evolving phenotypes (4, 5). The importance of genes of major effect is currently the focus of renewed research (6, 7). The role of major effect genes during adaptation, however, is still unclear, as is the frequency with which recurrent phenotypic evolution occurs through changes in the same (8-11) or different (8,12,13) genes. In addition, the genetics of adaptation has most often been studied in the laboratory (14), with much less work in natural populations (13). To address these problems, we have taken advantage of a unique natural system, the rapid postglacial diversification of threespine stickleback, Gasterosteus aculeatus (15). Thousands of coastal freshwater populations of stickleback have derived independently from anadromous (sea-run) ancestors. Phenotypically similar throughout their range, anadromous stickleback are protected with bony armor including lateral plates and a robust set of dorsal and pelvic spines (Fig. 1D). In contrast, derived lacustrine pop...
The distinction between model and nonmodel organisms is becoming increasingly blurred. High-throughput, second-generation sequencing approaches are being applied to organisms based on their interesting ecological, physiological, developmental, or evolutionary properties and not on the depth of genetic information available for them. Here, we illustrate this point using a low-cost, efficient technique to determine the fine-scale phylogenetic relationships among recently diverged populations in a species. This application of restriction site-associated DNA tags (RAD tags) reveals previously unresolved genetic structure and direction of evolution in the pitcher plant mosquito, Wyeomyia smithii, from a southern Appalachian Mountain refugium following recession of the Laurentide Ice Sheet at 22,000-19,000 B.P. The RAD tag method can be used to identify detailed patterns of phylogeography in any organism regardless of existing genomic data, and, more broadly, to identify incipient speciation and genome-wide variation in natural populations in general.
Keywords: genomics | restriction site-associated DNA tag | second-generation sequencing | Wyeomyia smithii
Next-generation sequencing technologies are revolutionizing the field of evolutionary biology, opening the possibility for genetic analysis at scales not previously possible. Research in population genetics, quantitative trait mapping, comparative genomics, and phylogeography that was unthinkable even a few years ago is now possible. More importantly, these next-generation sequencing studies can be performed in organisms for which few genomic resources presently exist. To speed this revolution in evolutionary genetics, we have developed Restriction site Associated DNA (RAD) genotyping, a method that uses Illumina next-generation sequencing to simultaneously discover and score tens to hundreds of thousands of single-nucleotide polymorphism (SNP) markers in hundreds of individuals for minimal investment of resources. In this chapter, we describe the core RAD-seq protocol, which can be modified to suit a diversity of evolutionary genetic questions. In addition, we discuss bioinformatic considerations that arise from unique aspects of next-generation sequencing data as compared to traditional marker-based approaches, and we outline some general analytical approaches for RAD-seq and similar data. Despite considerable progress, the development of analytical tools remains in its infancy, and further work is needed to fully quantify sampling variance and biases in these data types.