2019
DOI: 10.1111/1755-0998.12995

Minor allele frequency thresholds strongly affect population structure inference with genomic data sets

Abstract: A common method of minimizing errors in large DNA sequence data sets is to drop variable sites with a minor allele frequency (MAF) below some specified threshold. Although widespread, this procedure has the potential to alter downstream population genetic inferences and has received relatively little rigorous analysis. Here we use simulations and an empirical single nucleotide polymorphism data set to demonstrate the impacts of MAF thresholds on inference of population structure, often the first step in analysi…
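As a rough illustration of the filtering step the abstract describes, the sketch below computes a per-site MAF and drops sites that fall below a threshold. It is not the authors' pipeline. The function names, the 0/1/2 alternate-allele coding for diploid biallelic sites, the -1 missing-data code, and the toy genotypes are all assumptions made for this example.

```python
def minor_allele_frequencies(genotypes, missing=-1):
    """Return the MAF of each site (column), or None if no genotype was called.

    Assumed coding (hypothetical): each entry is the alternate-allele count
    of one diploid individual at one biallelic site (0, 1, or 2);
    `missing` marks an uncalled genotype.
    """
    mafs = []
    for j in range(len(genotypes[0])):
        calls = [row[j] for row in genotypes if row[j] != missing]
        if not calls:
            mafs.append(None)
            continue
        total = 2 * len(calls)               # allele copies among called genotypes
        alt = sum(calls)                     # copies of the alternate allele
        mafs.append(min(alt, total - alt) / total)
    return mafs


def filter_sites_by_maf(genotypes, threshold, missing=-1):
    """Keep only sites whose MAF is at least `threshold`."""
    mafs = minor_allele_frequencies(genotypes, missing)
    keep = [j for j, maf in enumerate(mafs) if maf is not None and maf >= threshold]
    filtered = [[row[j] for j in keep] for row in genotypes]
    return filtered, keep


# Toy data: 5 individuals (rows) by 4 sites (columns).
toy = [
    [0, 1, 2, 0],
    [0, 0, 2, 1],
    [0, 1, 1, -1],
    [0, 0, 2, 1],
    [1, 0, 2, 2],
]

mafs = minor_allele_frequencies(toy)
filtered, kept = filter_sites_by_maf(toy, threshold=0.15)
print(mafs)   # per-site MAF
print(kept)   # indices of the sites that survive the filter
```

In this toy case, raising the threshold from 0.10 to 0.15 removes the two sites where the minor allele is carried on a single chromosome, which is the kind of rare-variant loss whose downstream effect the paper examines.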

Citations: cited by 288 publications (245 citation statements)
References: 45 publications (61 reference statements)

Citation statements:
“…To test the "colonization" versus "in situ" hypotheses explaining the presence of blackfin in the Fossmill outlet, we used the neutral loci dataset for two complementary clustering methods (Linck & Battey, 2019). (corresponding to the minimum and maximum expected number of genetic clusters).…”
Section: Genetic Analyses (mentioning)
confidence: 99%
“…hypotheses explaining the presence of blackfin in the Fossmill outlet, we used the neutral loci dataset for two complementary clustering methods (Linck & Battey, 2019). The model-based ADMIXTURE algorithm (Alexander, replicates, the analysis was first performed from K = 1 to K = 25 with all lakes in order to identify the glacial lineages present in the focal area of APP (and L. Memesagamesing).…”
mentioning
confidence: 99%
“…Kim et al (2011) argued that for rare SNPs (e.g., MAF < 0.01) it is not easy to differentiate between sequencing errors and a true rare allele, and alleles with less the 1% of MAF should be discarded. Linck and Battey (2019) showed that highly accurate population inferences are reached when relatively rare alleles are included (minimum allele count 2% to 8%). Therefore, in this study we set the MAF value at 2%.…”
Section: Genotyping (mentioning)
confidence: 99%
“…These are usually discarded for population analysis, because they are deemed uninformative and may contain errors (Roesti, Salzburger, & Berner, 2012). However, careful consideration of MAF filtering was recently recommended (Linck & Battey, 2019). Therefore, we compared the distribution of alleles over the geographical regions for MAF thresholds of minimum 1 and 5% (results not shown).…”
Section: Genotype Calling (mentioning)
confidence: 99%