The patterns of genomic divergence during ecological speciation are shaped by a combination of evolutionary forces. Processes such as genetic drift, local reduction of gene flow around genes causing reproductive isolation, hitchhiking around selected variants, and variation in recombination and mutation rates can all contribute to the heterogeneity of genomic divergence. On the basis of 60 fully sequenced three-spined stickleback genomes, we explore these different mechanisms explaining the heterogeneity of genomic divergence across five parapatric lake and river population pairs varying in their degree of genetic differentiation. We find that divergent regions of the genome are mostly specific to each population pair, while their size and abundance are not correlated with the extent of genome-wide population differentiation. In each pair-wise comparison, an analysis of allele frequency spectra reveals that 25–55% of the divergent regions are consistent with a local restriction of gene flow. Another large proportion of divergent regions (38–75%) appears to be mainly shaped by hitchhiking effects around positively selected variants. We provide empirical evidence that alternative mechanisms determining the evolution of genomic patterns of divergence are not mutually exclusive, but rather act in concert to shape the genome during population differentiation, a first necessary step towards ecological speciation.
Metabarcoding has the potential to become a rapid, sensitive, and effective approach for identifying species in complex environmental samples. Accurate molecular identification of species depends on the ability to generate operational taxonomic units (OTUs) that correspond to biological species. Due to the sometimes enormous estimates of biodiversity using this method, there is a great need to test the efficacy of data analysis methods used to derive OTUs. Here, we evaluate the performance of various methods for clustering length variable 18S amplicons from complex samples into OTUs using a mock community and a natural community of zooplankton species. We compare analytic procedures consisting of a combination of (1) stringent and relaxed data filtering, (2) singleton sequences included and removed, (3) three commonly used clustering algorithms (mothur, UCLUST, and UPARSE), and (4) three methods of treating alignment gaps when calculating sequence divergence. Depending on the combination of methods used, the number of OTUs varied by nearly two orders of magnitude for the mock community (60–5068 OTUs) and three orders of magnitude for the natural community (22–22191 OTUs). The use of relaxed filtering and the inclusion of singletons greatly inflated OTU numbers without increasing the ability to recover species. Our results also suggest that the method used to treat gaps when calculating sequence divergence can have a great impact on the number of OTUs. Our findings are particularly relevant to studies that cover taxonomically diverse species and employ markers such as rRNA genes in which length variation is extensive.
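To illustrate why gap treatment matters for length-variable markers, the following minimal sketch computes pairwise divergence between two aligned sequences under three gap-handling modes. The abstract does not name the three methods that were compared, so the modes used here (`ignore`, `each`, `block`) and the function name `divergence` are assumptions chosen for illustration, not the study's procedure.

```python
def divergence(a, b, gap_mode="ignore"):
    """Proportion of differing columns between two aligned sequences.

    gap_mode (hypothetical names, chosen for this sketch):
      "ignore" - columns with a gap in exactly one sequence are skipped
      "each"   - every such gap column counts as one difference
      "block"  - each run of consecutive gap columns counts as one difference
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if gap_mode not in ("ignore", "each", "block"):
        raise ValueError("unknown gap_mode: " + gap_mode)

    diffs = 0        # differences counted so far
    compared = 0     # columns (or gap blocks) that enter the denominator
    in_gap_run = False

    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap shared by both sequences carries no information
        if x == "-" or y == "-":
            if gap_mode == "each":
                diffs += 1
                compared += 1
            elif gap_mode == "block" and not in_gap_run:
                diffs += 1
                compared += 1
            in_gap_run = True
            continue
        in_gap_run = False
        compared += 1
        if x != y:
            diffs += 1

    return diffs / compared if compared else 0.0


# Toy aligned pair: one 2-column indel and one substitution.
seq1 = "ACGT--ACGT"
seq2 = "ACGTTTACGA"
for mode in ("ignore", "each", "block"):
    print(mode, round(divergence(seq1, seq2, mode), 3))
```

On this toy pair the same alignment yields three different divergence values, so a fixed clustering threshold can merge or split the two sequences depending only on the gap convention.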
Metabarcoding combines DNA barcoding with high‐throughput sequencing, often using one genetic marker to understand complex and taxonomically diverse samples. However, species‐level identification depends heavily on the choice of marker and the selected primer pair, often with a trade‐off between successful species amplification and taxonomic resolution. We present a versatile metabarcoding protocol for biomonitoring that involves the use of two barcode markers (COI and 18S) and four primer pairs in a single high‐throughput sequencing run, via sample multiplexing. We validate the protocol using a series of 24 mock zooplanktonic communities incorporating various levels of genetic variation. With the use of a single marker and single primer pair, the highest species recovery was 77%. With all three COI fragments, we detected 62%–83% of species across the mock communities, while the use of the 18S fragment alone resulted in the detection of 73%–75% of species. The species detection level was significantly improved to 89%–93% when both markers were used. Furthermore, multiplexing did not have a negative impact on the proportion of reads assigned to each species and the total number of species detected was similar to when markers were sequenced alone. Overall, our metabarcoding approach utilizing two barcode markers and multiple primer pairs per barcode improved species detection rates over a single marker/primer pair by 14% to 35%, making it an attractive and relatively cost‐effective method for biomonitoring natural zooplankton communities. We strongly recommend combining evolutionary independent markers and, when necessary, multiple primer pairs per marker to increase species detection (i.e., reduce false negatives) in metabarcoding studies.
DNA metabarcoding is a promising method for describing communities and estimating biodiversity. This approach uses high-throughput sequencing of targeted markers to identify species in a complex sample. By convention, sequences are clustered at a predefined sequence divergence threshold (often 3%) into operational taxonomic units (OTUs) that serve as a proxy for species. However, variable levels of interspecific marker variation across taxonomic groups make clustering sequences from a phylogenetically diverse dataset into OTUs at a uniform threshold problematic. In this study, we use mock zooplankton communities to evaluate the accuracy of species richness estimates when following conventional protocols to cluster hypervariable sequences of the V4 region of the small subunit ribosomal RNA gene (18S) into OTUs. By including individually tagged single specimens and “populations” of various species in our communities, we examine the impact of intra- and interspecific diversity on OTU clustering. Communities consisting of single individuals per species generated a correspondence of 59–84% between OTU number and species richness at a 3% divergence threshold. However, when multiple individuals per species were included, the correspondence between OTU number and species richness dropped to 31–63%. Our results suggest that intraspecific variation in this marker can often exceed 3%, such that a single species does not always correspond to one OTU. We advocate the need to apply group-specific divergence thresholds when analyzing complex and taxonomically diverse communities, but also encourage the development of additional filtering steps that allow identification of artifactual rRNA gene sequences or pseudogenes that may generate spurious OTUs.
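The effect of a uniform divergence threshold can be sketched with a simple greedy centroid clustering on toy data. This is not the algorithm of any particular tool, and the sequences, sample names, and helper functions (`p_distance`, `greedy_cluster`, `mutate`) are hypothetical; the sketch only shows how intraspecific variation above 3% splits one species into more than one OTU.

```python
def p_distance(a, b):
    """Proportion of differing sites between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, threshold=0.03):
    """Assign each read to the first centroid within `threshold`,
    otherwise start a new OTU with the read as its centroid."""
    centroids = []  # one representative sequence per OTU
    clusters = []   # read names grouped by OTU, same order as centroids
    for name, seq in reads:
        for i, centroid in enumerate(centroids):
            if p_distance(seq, centroid) <= threshold:
                clusters[i].append(name)
                break
        else:
            centroids.append(seq)
            clusters.append([name])
    return clusters


def mutate(seq, positions):
    """Return a copy of seq with a substitution at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Toy data: three individuals of hypothetical species A, one of species B.
base = "ACGT" * 25  # 100 bp
reads = [
    ("spA_ind1", base),
    ("spA_ind2", mutate(base, [3, 50])),              # 2% from ind1
    ("spA_ind3", mutate(base, [1, 20, 40, 60, 80])),  # 5% from ind1
    ("spB_ind1", mutate(base, range(0, 100, 5))),     # 20% from ind1
]

otus = greedy_cluster(reads, threshold=0.03)
print(len(otus), otus)
```

Here two species yield three OTUs at the 3% threshold because one individual of species A differs from the centroid by 5%. Raising the threshold for that group would merge it again, which is the rationale for group-specific thresholds.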
The combination of DNA barcoding and high-throughput (next-generation) sequencing (metabarcoding) holds much promise but also poses serious challenges. Generating a reliable, comparable estimate of biodiversity remains a central challenge to the application of the technology. Many approaches have been used to turn millions of sequences into distinct taxonomic units. However, the extent to which these methods impact the outcome of simple ecological analyses is not well understood. Here we performed a simple analysis of dietary overlap by skinks and shrews on Ile Aux Aigrettes, Mauritius. We used a combination of filtering thresholds and clustering algorithms on a COI metabarcoding dataset and demonstrate that all bioinformatics parameters have interacting effects on molecular operational taxonomic unit (MOTU) recovery rates. These effects generated estimates covering two orders of magnitude. However, the effect on a simple ecological analysis was not large and, despite the wide variation in estimates of niche overlap, the same ecological conclusion was drawn in most cases. We advise that a conservative clustering programme coupled with larger sequence divergences to define a cluster, the removal of singletons, rigorous length filtering, and stringent match criteria for Molecular Identifier tags are preferable to avoid MOTU inflation and that the same parameters be used in all comparative analyses.
Since the end of the Pleistocene, the three-spined stickleback (Gasterosteus aculeatus) has repeatedly colonized and adapted to various freshwater habitats, probably originating from ancestral marine populations. Standing genetic variation and the underlying genomic architecture have both been proposed to contribute to recent adaptive radiations of sticklebacks. Here, we expand on the current genomic resources of this fish by providing extensive genome-wide variation data from six individuals from a marine (North Sea) stickleback population. Using next-generation sequencing and a combination of paired-end and mate-pair libraries, we detected a wide size range of genetic variation. Among the six individuals, we found more than 7% of the genome is polymorphic, consisting of 2599111 SNPs, 233464 indels and structural variation (SV) (>50 bp) such as 1054 copy-number variable regions (deletions and duplications) and 48 inversions. Many of these polymorphisms affect gene and coding sequences. Based on SNP diversity, we determined outlier regions concordant with signatures expected under adaptive evolution. As some of these outliers overlap with pronounced regions of copy-number variation, we propose the consideration of such SV when analysing SNP data from re-sequencing approaches. We further discuss the value of this resource on genome-wide variation for further investigation into the relative contribution of standing variation to the parallel evolution of sticklebacks and the importance of the genomic architecture in adaptive radiation.
Duplicate genes emerge as copy-number variations (CNVs) at the population level, and remain copy-number polymorphic until they are fixed or lost. The successful establishment of such structural polymorphisms in the genome plays an important role in evolution by promoting genetic diversity, complexity and innovation. To characterize the early evolutionary stages of duplicate genes and their potential adaptive benefits, we combine comparative genomics with population genomics analyses to evaluate the distribution and impact of CNVs across natural populations of an eco-genomic model, the three-spined stickleback. With whole genome sequences of 66 individuals from populations inhabiting three distinct habitats, we find that CNVs generally occur at low frequencies and are often only found in one of the 11 populations surveyed. A subset of CNVs, however, displays copy-number differentiation between populations, showing elevated within-population frequencies consistent with local adaptation. By comparing teleost genomes to identify lineage-specific genes and duplications in sticklebacks, we highlight rampant gene content differences among individuals in which over 30% of young duplicate genes are CNVs. These CNV genes are evolving rapidly at the molecular level and are enriched with functional categories associated with environmental interactions, depicting the dynamic early copy-number polymorphic stage of genes during population differentiation.
Understanding the rates, spectra, and fitness effects of spontaneous mutations is fundamental to answering key questions in evolution, molecular biology, disease genetics, and conservation biology. To estimate mutation rates and evaluate the effect of selection on new mutations, we propagated mutation accumulation (MA) lines of Daphnia pulex for more than 82 generations and maintained a non-MA population under conditions where selection could act. Both experiments were started with the same obligate asexual progenitor clone. By sequencing 30 genomes and implementing a series of validation steps that informed the bioinformatic analyses, we identified a total of 477 single nucleotide mutations (SNMs) in the MA lines, corresponding to a mutation rate of 2.30 × 10 (95% CI 1.90-2.70 × 10) per nucleotide per generation. The high overall rate of loss of heterozygosity (LOH) of 4.82 × 10 per site per generation was due to a large ameiotic recombination event spanning an entire arm of a chromosome (∼6 Mb) and several hemizygous deletion events spanning ∼2 kb each. In the non-MA population, we found significantly fewer mutations than expected based on the rate derived from the MA experiment, indicating purifying selection was likely acting to remove new deleterious mutations. We observed a surprisingly high level of genetic variability in the non-MA population, which we propose to be driven by balancing selection. Our findings suggest that both positive and negative selection on new mutations is powerful and effective in a strictly clonal population.