2017
DOI: 10.1101/188623
Preprint

Minor allele frequency thresholds dramatically affect population structure inference with genomic datasets

Abstract: Across the genome, the effects of different evolutionary processes and historical events can result in different classes of genetic variants (or alleles) characterized by their relative frequency in a given population. As a result, population genetic inference can be strongly affected by biases in laboratory and bioinformatics treatments that affect the site frequency spectrum, or SFS. Yet despite the widespread use of reduced-representation genomic datasets with nonmodel organisms, the potential conseque…
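
To make the abstract's point concrete, the following is a minimal sketch of how a minor allele frequency (MAF) threshold reshapes a folded site frequency spectrum. It is an illustration only, not the preprint's pipeline: the function names (`minor_allele_count`, `filter_by_maf`, `folded_sfs`) and the toy genotype matrix are hypothetical, and sites are assumed to be biallelic and diploid, coded as alternate-allele counts per individual.

```python
from collections import Counter

def minor_allele_count(genotypes):
    """Return (minor-allele copies, total called allele copies) for one site.

    genotypes: alternate-allele counts per diploid individual (0, 1 or 2);
    None marks a missing call and is skipped. Assumes a biallelic site.
    """
    called = [g for g in genotypes if g is not None]
    alt = sum(called)
    total = 2 * len(called)
    return min(alt, total - alt), total

def filter_by_maf(sites, min_maf):
    """Keep only sites whose minor allele frequency is at least min_maf."""
    kept = []
    for genotypes in sites:
        mac, total = minor_allele_count(genotypes)
        if total > 0 and mac / total >= min_maf:
            kept.append(genotypes)
    return kept

def folded_sfs(sites):
    """Tally sites by minor-allele count (a folded SFS for complete data)."""
    return Counter(minor_allele_count(g)[0] for g in sites)

# Hypothetical toy data: 4 sites x 4 diploid individuals.
sites = [
    [0, 0, 0, 1],  # singleton: 1 minor copy out of 8
    [0, 1, 1, 2],  # 4 minor copies out of 8
    [2, 2, 2, 1],  # alternate allele is common; minor allele has 1 copy
    [0, 2, 0, 0],  # doubleton: 2 minor copies out of 8
]

print(folded_sfs(sites))
print(folded_sfs(filter_by_maf(sites, min_maf=0.2)))
```

In this toy example a threshold of 0.2 removes both singleton sites, so the rare-allele class of the spectrum disappears entirely before any downstream inference is run.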

citations
Cited by 33 publications
(32 citation statements)
references
References 38 publications
“…Locus G9 was omitted due to the observed significant genotypic linkage to H29, and G25 and H19 were omitted due to occurrence of observed rare alleles and low variability. It has been shown by Linck and Battey () that occurrences of rare alleles introduce noise when estimating population structure, blurring the population inference. Consequently, STRUCTURE analysis together with all the data analysis based on microsatellite markers were performed based on the remaining seven loci and the aggregated Danish samples.…”
Section: Results (mentioning)
confidence: 99%
“…Furthermore, when analyzing the RADseq data from three of the Gekko population pairs we are analyzing here, Oaks (2019) found biologically unrealistic estimates of divergence times and population sizes when only unlinked variable sites (i.e., SNPs) were analyzed. Using additional simulations, Oaks (2019) found these unrealistic estimates were likely due to data-acquisition biases, which are known to be common in alignments from reduced-representation genomic libraries (Harvey et al. 2015; Linck and Battey 2019). Oaks (2019) found that using all of the sites, rather than only SNPs, greatly improved the robustness of these parameter estimates to such acquisition biases.…”
Section: Testing For Shared Divergences (mentioning)
confidence: 99%
“…For A. xenodactyloides , dapc and admixture results were also incongruent ( k = 4 and k = 8, respectively). As dapc is known to be less sensitive to minor allele frequency thresholds than other model‐based clustering methods (Linck & Battey, ), we based subsequent A. fornasini analyses on the dapc results given their congruence with phylogenetic and phylogenomic inferences. Similarly, for A. xenodactyloides , subsequent analyses were based on k = 3 following consistently lowest BIC and CV scores in dapc and admixture analyses (Supporting Information Table ), and consistency with phylogenetic clustering.…”
Section: Results (mentioning)
confidence: 99%
“…We investigated population structure per taxon using discriminant analysis of principal components ( dapc ) in the adegenet r package (Jombart & Ahmed, ), after converting stacks output files into Fstat format using pgdspider 2.1.0.3 (Lischer & Excoffier, ). Unlike model‐based clustering methods, the dapc method is free of assumptions regarding Hardy–Weinberg equilibrium (Jeffries et al., ; Jombart & Ahmed, ) and less sensitive to minor allele frequency thresholds (Linck & Battey, ). We defined values of k between 1 (i.e., a single panmictic population) and 8, using Bayesian information criterion (BIC, Schwarz, ) scores across tested k values to infer the number of populations.…”
Section: Methods (mentioning)
confidence: 99%