2021
DOI: 10.1111/1755-0998.13499

Alignment‐free methods for polyploid genomes: Quick and reliable genetic distance estimation

Abstract: Polyploid genomes pose several inherent challenges to population genetic analyses. While alignment-based methods are fundamentally limited in their applicability to polyploids, alignment-free methods bypass most of these limits. We investigated the use of Mash, a k-mer analysis tool that uses the MinHash method to reduce complexity in large genomic data sets, for basic population genetic analyses of polyploid sequences. We measured the degree to which Mash correctly estimated pairwise genetic distance in simula…

Cited by 11 publications (14 citation statements)
References 91 publications
“…The outlier signal for two of the three candidate loci we describe in detail here resulted from structural variation that would not have been uncovered in an alignment-based analysis or using a RAD-Seq approach that may only survey a small portion of the genome or be prone to allele drop-out. Reference-free approaches continue to gain ground for studies of genomic and phenotypic variation in plants, with well-documented advantages (VanWallendael and Alvarez, 2020; Voichek and Weigel, 2020). Our study indicates that the utility of large WGS datasets may not be out of reach even for species such as Striga hermonthica characterized by large, complex genomes.…”
Section: Discussion (mentioning)
confidence: 99%
“…Mash, a dimensionality reduction technique based on the MinHash algorithm, was used to estimate genetic distance between samples based on resulting read sets (Ondov et al, 2016). Mash previously showed improved performance compared to alignment-based methods for estimating pairwise genetic distance for polyploid plant genomes using simulated and real data (VanWallendael and Alvarez, 2020). We used a k-mer size of 31, removing k-mers with less than 2 copies but increasing the sketch size to 1 × 10^7 to account for a larger volume of input data.…”
Section: Methods (mentioning)
confidence: 99%
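
The parameters quoted above (k-mer size, minimum copy number, sketch size) can be illustrated with a minimal bottom-s MinHash sketch in Python. This is not Mash itself: the function names are hypothetical, `blake2b` stands in for Mash's own hash function, k-mers are counted exactly in memory, and the small default sketch size is chosen only so the example runs quickly. The distance formula D = -(1/k) ln(2j / (1 + j)) is the Mash distance of Ondov et al. (2016), where j is the Jaccard estimate.

```python
import hashlib
import math
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield each k-mer as the lexicographic minimum of itself and its reverse complement."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            yield min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def sketch(reads, k=31, sketch_size=1000, min_copies=2):
    """Return the sketch_size smallest hashes of k-mers seen at least min_copies times."""
    counts = Counter(km for read in reads for km in canonical_kmers(read, k))
    hashes = {
        # Assumption: blake2b is a stand-in hash, not the one Mash uses.
        int.from_bytes(hashlib.blake2b(km.encode(), digest_size=8).digest(), "big")
        for km, n in counts.items()
        if n >= min_copies
    }
    return sorted(hashes)[:sketch_size]


def mash_distance(sketch_a, sketch_b, k=31, sketch_size=1000):
    """Estimate the Jaccard index from two sketches and convert it to a Mash distance."""
    a, b = set(sketch_a), set(sketch_b)
    merged = sorted(a | b)[:sketch_size]  # bottom sketch of the union
    if not merged:
        raise ValueError("both sketches are empty")
    shared = sum(1 for h in merged if h in a and h in b)
    j = shared / len(merged)
    if j == 0:
        return 1.0  # no shared hashes: report the maximum distance
    return min(1.0, max(0.0, -math.log(2 * j / (1 + j)) / k))
```

Filtering out k-mers seen only once is a common way to discard sequencing errors in raw reads, and a larger sketch size trades memory for a lower-variance Jaccard estimate.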
“…To observe the effects of potential biases in calling 8× variation from a 4× reference, we employed a reference-free k-mer (sequences of DNA of length “k”)-based approach to analyze the population structure and compare the overall patterns to the results obtained using SNPs. Reference-free and alignment-free methods of assessing population genetic structure have been shown to reduce some of the inherent biases involved in polyploid genetics (78). Here, we used the k-mer hashing method employed by the Mash program to confirm population genetic patterns derived from traditional SNP-based analyses (79).…”
Section: Methods (mentioning)
confidence: 99%
“…Our analysis followed the method used in VanWallendael and Alvarez (78) on 40 individuals randomly selected from the study. Briefly, we first trimmed fastq sequence files randomly to equal size to reduce library-size biases using fastq-tools (https://github.com/dcjones/fastq-tools).…”
Section: Methods (mentioning)
confidence: 99%
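
The idea of randomly trimming FASTQ files to equal size can be sketched in Python with reservoir sampling. This is an illustration only and does not reproduce fastq-tools; the function names, the fixed seed, and the in-memory demo file are all assumptions made for the example, and it expects plain four-line FASTQ records.

```python
import io
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as lists of four lines (header, sequence, '+', qualities)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_fastq(handle, n_reads, seed=0):
    """Return n_reads records chosen uniformly at random in a single pass.

    If the file holds fewer than n_reads records, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(read_fastq(handle)):
        if i < n_reads:
            reservoir.append(record)
        else:
            # Keep the new record with probability n_reads / (i + 1).
            j = rng.randint(0, i)
            if j < n_reads:
                reservoir[j] = record
    return reservoir


# Hypothetical ten-read library, held in memory for the demo.
demo = io.StringIO("".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(10)))
sample = subsample_fastq(demo, n_reads=4)
```

Sampling every library down to the same number of reads keeps deeper libraries from contributing more distinct k-mers, which is the library-size bias the quoted passage refers to.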
“…mash does not assign samples by population; thus, to verify that grouping by superpopulation, we checked for monophyly of each of the groups in the NJ tree by superpopulation label. We used the is.monophyletic function in the R package (71) to check whether each population was monophyletic in the resulting tree and ggplot (72) to plot the Boolean values for whether the tree with all the populations contained clades that are all monophyletic. PCA generated using k-mer frequencies from a single population from each of five superpopulations using 2 PCs.…”
Section: Admixed Populations (mentioning)
confidence: 99%
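
The monophyly check described above can be sketched in Python for a rooted tree. This is a simplified stand-in, not the R is.monophyletic function: the tree is a hypothetical nested tuple of tip labels, the sample names are invented, and unrooted trees (which the R function can also handle) are not covered.

```python
def leaf_sets(tree):
    """Return (leaves under this node, leaf sets of every clade at or below it)."""
    if isinstance(tree, str):  # a tip is its own one-leaf clade
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)
    return leaves, clades


def is_monophyletic(tree, tips):
    """True if some clade of the rooted tree contains exactly the given tips."""
    _, clades = leaf_sets(tree)
    return frozenset(tips) in clades


# Hypothetical rooted tree: the A samples form one clade, the B samples do not.
tree = ((("A1", "A2"), "A3"), (("B1", "C1"), "B2"))
groups = {"A": ["A1", "A2", "A3"], "B": ["B1", "B2"], "C": ["C1"]}
results = {name: is_monophyletic(tree, tips) for name, tips in groups.items()}
```

Here `results` holds one Boolean per group, which corresponds to the per-population values that the quoted passage plots.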