Identifying loci under natural selection from genomic surveys is of great interest in different research areas. Commonly used methods to separate neutral effects from adaptive effects are based on locus-specific population differentiation coefficients to identify outliers. Here we extend such an approach to estimate directly the probability that each locus is subject to selection using a Bayesian method. We also extend it to allow the use of dominant markers like AFLPs. It has been shown that this model is robust to complex demographic scenarios for neutral genetic differentiation. Here we show that the inclusion of isolated populations that underwent a strong bottleneck can lead to a high rate of false positives. Nevertheless, we demonstrate that it is possible to avoid them by carefully choosing the populations that should be included in the analysis. We analyze two previously published data sets: a human data set of codominant markers and a Littorina saxatilis data set of dominant markers. We also perform a detailed sensitivity study to compare the power of the method using amplified fragment length polymorphism (AFLP), SNP, and microsatellite markers. The method has been implemented in a new software available at our website.
We review commonly used population definitions under both the ecological paradigm (which emphasizes demographic cohesion) and the evolutionary paradigm (which emphasizes reproductive cohesion) and find that none are truly operational. We suggest several quantitative criteria that might be used to determine when groups of individuals are different enough to be considered 'populations'. Units for these criteria are migration rate (m) for the ecological paradigm and migrants per generation (Nm) for the evolutionary paradigm. These criteria are then evaluated by applying analytical methods to simulated genetic data for a finite island model. Under the standard parameter set that includes L = 20 High mutation (microsatellite-like) loci and samples of S = 50 individuals from each of n = 4 subpopulations, power to detect departures from panmixia was very high (~100%; P < 0.001) even with high gene flow (Nm = 25). A new method, comparing the number of correct population assignments with the random expectation, performed as well as a multilocus contingency test and warrants further consideration. Use of Low mutation (allozyme-like) markers reduced power more than did halving S or L. Under the standard parameter set, power to detect restricted gene flow below a certain level X (H0: Nm < X) can also be high, provided that true Nm ≤ 0.5X. Developing the appropriate test criterion, however, requires assumptions about several key parameters that are difficult to estimate in most natural populations. Methods that cluster individuals without using a priori sampling information detected the true number of populations only under conditions of moderate or low gene flow (Nm ≤ 5), and power dropped sharply with smaller samples of loci and individuals.
A simple algorithm based on a multilocus contingency test of allele frequencies in pairs of samples has high power to detect the true number of populations even with Nm = 25 but requires more rigorous statistical evaluation. The ecological paradigm remains challenging for evaluations using genetic markers, because the transition from demographic dependence to independence occurs in a region of high migration where genetic methods have relatively little power. Some recent theoretical developments and continued advances in computational power provide hope that this situation may change in the future.
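The idea of comparing the number of correct population assignments with the random expectation can be illustrated with a minimal sketch. It assumes, purely for illustration, that under panmixia each individual is assigned to its source sample with probability 1/n (equal-sized samples) and that assignments are independent, so the count of correct assignments is binomial. The function name and the example counts are hypothetical, and the published method may define the random expectation differently.

```python
from math import comb


def assignment_excess_pvalue(n_correct, n_total, n_pops):
    """One-sided binomial tail P(X >= n_correct), assuming each individual is
    assigned to its own sample with chance probability 1 / n_pops."""
    p = 1.0 / n_pops
    return sum(
        comb(n_total, k) * p**k * (1.0 - p) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


# Hypothetical counts: 4 samples of 50 individuals, 80 assigned to their own sample.
p_value = assignment_excess_pvalue(n_correct=80, n_total=200, n_pops=4)
```

A small tail probability here would indicate more correct assignments than chance alone predicts, which is the signal of a departure from panmixia that the abstract describes.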
Assignment methods, which use genetic information to ascertain population membership of individuals or groups of individuals, have been used in recent years to study a wide range of evolutionary and ecological processes. In applied studies, the first step of articulating the biological question(s) to be addressed should be followed by selection of the method(s) best suited for the analysis. However, this first step often receives less attention than it should, and the recent proliferation of assignment methods has made the selection step challenging. Here, we review assignment methods and discuss how to match the appropriate methods with the underlying biological questions for several common problems in ecology and conservation (assessing population structure; measuring dispersal and hybridization; and forensics and mixture analysis). We also identify several topics for future research that should ensure that this field remains dynamic and productive.
The recent availability of next-generation sequencing (NGS) has made possible the use of dense genetic markers to identify regions of the genome that may be under the influence of selection. Several statistical methods have been developed recently for this purpose. Here, we present the results of an individual-based simulation study investigating the power and error rate of popular or recent genome scan methods: linear regression, Bayescan, BayEnv and LFMM. Contrary to previous studies, we focus on complex, hierarchical population structure and on polygenic selection. Additionally, we use a false discovery rate (FDR)-based framework, which provides a unified testing framework across frequentist and Bayesian methods. Finally, we investigate the influence of population allele frequencies versus individual genotype data specification for LFMM and the linear regression. The relative ranking of the methods changes when polygenic rather than monogenic selection is considered. For strongly hierarchical scenarios with confounding effects between demography and environmental variables, the power of the methods can be very low. Except for one scenario, Bayescan exhibited moderate power and error rate. BayEnv performance was good under nonhierarchical scenarios, while LFMM provided the best compromise between power and error rate across scenarios. We found that it is possible to greatly reduce error rates by considering the results of all three methods when identifying outlier loci.
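For the frequentist side of an FDR-based framework, one standard choice is the Benjamini-Hochberg step-up procedure applied to per-locus p-values. The sketch below is a generic illustration of that procedure, not the study's own pipeline, which the abstract does not describe; the function name and the p-values are made up for the example.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per test: True if rejected at the given FDR level."""
    m = len(p_values)
    # Indices of the tests, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose ordered p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-locus p-values from a genome scan.
locus_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
outliers = benjamini_hochberg(locus_p, fdr=0.05)
```

Bayesian methods usually report posterior probabilities rather than p-values, so an equivalent FDR threshold for them is typically set on q-values or posterior odds instead.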
The study of population genetic structure is a fundamental problem in population biology because it helps us obtain a deeper understanding of the evolutionary process. One of the issues most assiduously studied in this context is the assessment of the relative importance of environmental factors (geographic distance, language, temperature, altitude, etc.) on the genetic structure of populations. The most widely used method to address this question is the multivariate Mantel test, a nonparametric method that calculates a correlation coefficient between a dependent matrix of pairwise population genetic distances and one or more independent matrices of environmental differences. Here we present a hierarchical Bayesian method that estimates F ST values for each local population and relates them to environmental factors using a generalized linear model. The method is demonstrated by applying it to two data sets, a data set for a population of the argan tree and a human data set comprising 51 populations distributed worldwide. We also carry out a simulation study to investigate the performance of the method and find that it can correctly identify the factors that play a role in the structuring of genetic diversity under a wide range of scenarios.
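To make the Mantel test mentioned above concrete, here is a minimal sketch of its simplest form: one genetic distance matrix against one environmental matrix, with significance assessed by jointly permuting the rows and columns of the genetic matrix. The multivariate (partial) version referred to in the abstract handles several predictor matrices and is not shown. All names and the example distances are hypothetical.

```python
import random
from math import sqrt


def _upper(matrix):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_test(gen, env, n_perm=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(gen)
    env_vec = _upper(env)
    observed = _pearson(_upper(gen), env_vec)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        # Relabel populations: permute rows and columns together.
        permuted = [[gen[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if _pearson(_upper(permuted), env_vec) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical example: five populations along a transect, with genetic
# distance exactly proportional to geographic distance.
positions = [0, 1, 3, 6, 10]
env = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.02 * d for d in row] for row in env]
r, p = mantel_test(gen, env)
```

Permuting whole populations rather than individual matrix entries preserves the dependence among pairwise distances that share a population, which is the reason the test uses a permutation null at all.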
Genetic admixture of distinct gene pools is the consequence of complex spatiotemporal processes that could have involved massive migration and local mating during the history of a species. However, current methods for estimating individual admixture proportions lack the incorporation of such a piece of information. Here, we extend Bayesian clustering algorithms by including global trend surfaces and spatial autocorrelation in the prior distribution on individual admixture coefficients. We test our algorithm by using spatially explicit and realistic coalescent simulations of colonization followed by secondary contact. By coupling our multiscale spatial analyses with a Bayesian evaluation of model complexity and fit, we show that the algorithm provides a correct description of smooth clinal variation, while still detecting zones of sharp variation when they are present in the data. We also apply our approach to understand the population structure of the killifish, Fundulus heteroclitus, for which the algorithm uncovers a presumed contact zone in the Atlantic coast of North America.
We compare the performance of Nm estimates based on FST and RST obtained from microsatellite data using simulations of the stepwise mutation model with range constraints in allele size classes. The results of the simulations suggest that the use of microsatellite loci can lead to serious overestimations of Nm, particularly when population sizes are large (N > 5000) and range constraints are high (K < 20). The simulations also indicate that, when population sizes are small (N = 500) and migration rates are moderate (Nm ≈ 2), violations of the assumption used to derive the Nm estimators lead to biased results. Under ideal conditions, i.e. large sample sizes (ns ≥ 50) and many loci (nl ≥ 20), RST performs better than FST for most of the parameter space. However, FST-based estimates are always better than RST when sample sizes are moderate or small (ns = 10) and the number of loci scored is low (nl < 20). These are the conditions under which many real investigations are carried out and therefore we conclude that in many cases the most conservative approach is to use FST.
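The abstract does not spell out the estimator, but Nm is classically obtained from FST through Wright's island-model relation FST ≈ 1/(4Nm + 1). The sketch below assumes that relation (and applies equally to RST if the same form is used); the function names are hypothetical.

```python
def nm_from_fst(fst):
    """Nm implied by FST under the assumed island-model relation FST = 1 / (4Nm + 1)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("fst must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


def fst_from_nm(nm):
    """Inverse relation: expected FST for a given number of migrants per generation."""
    return 1.0 / (4.0 * nm + 1.0)


# FST = 0.1 corresponds to Nm = 2.25 under this relation.
nm = nm_from_fst(0.1)
```

Because Nm scales with 1/FST, a small downward bias in a low FST value inflates Nm considerably, which is consistent with the overestimation the simulations report.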