Alignment‐free methods for polyploid genomes: Quick and reliable genetic distance estimation

VanWallendael, Acer; Álvarez, Mariano

doi:10.1111/1755-0998.13499

Cited by 11 publications

(14 citation statements)

References 91 publications

“…The outlier signal for two of the three candidate loci we describe in detail here resulted from structural variation that would not have been uncovered in an alignment-based analysis or using a RAD-Seq approach that may only survey a small portion of the genome or be prone to allele drop-out. Reference-free approaches continue to gain ground for studies of genomic and phenotypic variation in plants, with well-documented advantages (VanWallendael and Alvarez, 2020; Voichek and Weigel, 2020). Our study indicates that the utility of large WGS datasets may not be out of reach even for species such as Striga hermonthica characterized by large, complex genomes.…”

Section: Discussion (mentioning)

confidence: 99%

“…Mash, a dimensionality reduction technique based on the MinHash algorithm, was used to estimate genetic distance between samples based on resulting read sets (Ondov et al, 2016). Mash previously showed improved performance compared to alignment-based methods for estimating pairwise genetic distance for polyploid plant genomes using simulated and real data (VanWallendael and Alvarez, 2020). We used a k-mer size of 31, removing k-mers with less than 2 copies but increasing the sketch size to 1 × 10^7 to account for a larger volume of input data.…”

Section: Methods (mentioning)

confidence: 99%
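
The MinHash idea behind the statement above can be illustrated with a small Python sketch. It is not Mash's implementation: BLAKE2b stands in for the hash function Mash uses, the function names (canonical, sketch, mash_distance) are hypothetical, and the default sketch size is kept tiny for illustration. It keeps canonical k-mers seen at least min_copies times, retains the smallest hash values as a bottom-s sketch, and converts the estimated Jaccard index j into the Mash distance D = -(1/k) ln(2j / (1 + j)) from Ondov et al. (2016).

```python
import hashlib
import math
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def sketch(reads, k=31, sketch_size=1000, min_copies=2):
    """Bottom-s MinHash sketch: the sketch_size smallest hashes of retained k-mers."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    hashes = {
        # Assumption: BLAKE2b is a stand-in hash chosen for this illustration.
        int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")
        for kmer, copies in counts.items()
        if copies >= min_copies  # drop low-copy k-mers, which are often sequencing errors
    }
    return sorted(hashes)[:sketch_size]


def mash_distance(sketch_a, sketch_b, k=31):
    """Estimate the Jaccard index from two sketches and convert it to a Mash distance."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    s = min(len(sketch_a), len(sketch_b))
    if s == 0:
        raise ValueError("both sketches must contain at least one hash")
    merged_bottom = sorted(set_a | set_b)[:s]
    shared = sum(1 for h in merged_bottom if h in set_a and h in set_b)
    j = shared / s
    if j == 0:
        return 1.0  # no shared hashes: report the maximum distance
    return max(0.0, -math.log(2 * j / (1 + j)) / k)
```

With real read sets one would call sketch(reads, k=31, sketch_size=10**7, min_copies=2) to mirror the parameters quoted above; a larger sketch lowers the variance of the Jaccard estimate at the cost of memory.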

Genomic signatures of host-specific selection in a parasitic plant

Bellis

Munchow

Odero

et al. 2022

Preprint

Premise: Parasitic plants and their hosts are emerging model systems for studying genetic variation in species interactions across environments. The parasitic plant Striga hermonthica (witchweed) attacks a range of cereal crop hosts in Africa. Striga hermonthica exhibits substantial genetic variation in host preference and in specificity versus generalism. Some of this variation is locally adapted, but the genetic basis of specialization on certain hosts is unknown. Methods: We present an alignment-free analysis of population diversity in S. hermonthica using whole genome sequencing (WGS) data for 68 individuals from western Kenya. We validate our reference-free approach with germination experiments and a de novo assembled draft genome. Results: K-mer based analyses reveal high genome-wide diversity within a single field, similar to values between individuals collected 100 km apart or farther. Analysis of host-associated k-mers implicated genes involved in development of the parasite haustorium (a specialized structure used to establish vascular connections with host roots) and a potential role of chemocyanins in molecular host-parasitic plant interactions. Conversely, no phenotypic or genomic evidence was observed suggesting host-specific selection on parasite response to strigolactones, hormones exuded by host roots and required for parasite germination. Conclusions: This study demonstrates the utility of WGS for plant species with large, complex genomes and no available reference. Contrasting with theory emphasizing the role of early recognition loci for genotype specificity, our findings support host-specific selection on later interaction stages, suggesting recurring host-specific selection each generation alternating with homogenizing gene flow.

“…To observe the effects of potential biases in calling 8× variation from a 4× reference, we employed a reference-free k-mer (sequences of DNA of length “k”)-based approach to analyze the population structure and compare the overall patterns to the results obtained using SNPs. Reference-free and alignment-free methods of assessing population genetic structure have been shown to reduce some of the inherent biases involved in polyploid genetics (78). Here, we used the k-mer hashing method employed by the Mash program to confirm population genetic patterns derived from traditional SNP-based analyses (79).…”

Section: Methods (mentioning)

confidence: 99%

“…Our analysis followed the method used in VanWallendael and Alvarez (78) on 40 individuals randomly selected from the study. Briefly, we first trimmed fastq sequence files randomly to equal size to reduce library-size biases using fastq-tools (https://github.com/dcjones/fastq-tools).…”

Section: Methods (mentioning)

confidence: 99%
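
The equal-size subsampling step described above can be sketched in Python. This is an illustration of the idea only, not the fastq-tools implementation: read_fastq and subsample are hypothetical helper names, reservoir sampling is one possible way to draw a uniform random subset, and the two in-memory FASTQ strings are made-up toy data.

```python
import io
import random


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) tuples from 4-line FASTQ records."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:  # empty header line means end of input
            return
        yield tuple(record)


def subsample(records, n, seed=0):
    """Reservoir sampling: keep a uniform random subset of n records in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < n:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = record
    return reservoir


# Toy data (made up for this illustration): two libraries of unequal size.
fastq_a = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n"
fastq_b = "@s1\nACGT\n+\nIIII\n@s2\nCCGG\n+\nIIII\n"

records_a = list(read_fastq(io.StringIO(fastq_a)))
records_b = list(read_fastq(io.StringIO(fastq_b)))

# Downsample every library to the size of the smallest one.
target = min(len(records_a), len(records_b))
equal_a = subsample(records_a, target, seed=1)
equal_b = subsample(records_b, target, seed=1)
```

Equalizing read counts before sketching matters because k-mer presence depends on sequencing depth, so libraries of different sizes would otherwise appear more distant than they are.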

A generalist–specialist trade-off between switchgrass cytotypes impacts climate adaptation and geographic range

Napier

Grabowski

Lovell

et al. 2022

Proc. Natl. Acad. Sci. U.S.A.

Self Cite

Significance: Polyploidy, which occurs in roughly half of all flowering plants and an even higher percentage of grasses, is thought to be a major driver of adaptation. Higher numbers of copies of each gene in polyploid genomes can increase genetic diversity, which could drive shifts in habitat preference, adaptability, and fitness. To test the effects of increased ploidy, we compared genomic diversity, environmental niche, and fitness responses across climatic gradients between tetraploid and octoploid switchgrass. We found that the octoploids contained novel combinations of the ancestral tetraploid genetic diversity, which was linked to the expansion of switchgrass into unsuitable habitats for tetraploid populations. Our experiments revealed evidence of niche divergence, differential fitness, and a generalist–specialist trade-off between cytotypes.

“…mash does not assign samples by population thus to verify that grouping by superpopulation we checked for monophyly of each of the groups in the NJ tree by superpopulation label. We used the is.monophyletic function in the R package 71 to check whether each population was monophyletic in the resulting tree and ggplot 72 to plot the Boolean values for whether the tree with all the populations contained clades that are all monophyletic. PCA generated using k-mer frequencies from a single population from each of five superpopulations using 2 PCs.…”

Section: Admixed Populations (mentioning)

confidence: 99%
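
The monophyly check mentioned above can be illustrated in Python under simplifying assumptions. This is not the R function the quoted study used: it treats the tree as rooted and represents it as nested tuples of leaf names, and leaf_sets and is_monophyletic are hypothetical names. A group counts as monophyletic here when some clade contains exactly the group's members and nothing else.

```python
def leaf_sets(tree):
    """Return (leaves under this node, leaf sets of every clade in the subtree)."""
    if isinstance(tree, str):  # a leaf is represented by its sample name
        leaves = frozenset([tree])
        return leaves, [leaves]
    leaves = frozenset()
    clades = []
    for child in tree:
        child_leaves, child_clades = leaf_sets(child)
        leaves |= child_leaves
        clades.extend(child_clades)
    clades.append(leaves)  # the clade rooted at this internal node
    return leaves, clades


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly the samples in group."""
    _, clades = leaf_sets(tree)
    return frozenset(group) in clades


# Toy rooted tree (made up): population a forms a clade, population b does not.
toy_tree = ((("a1", "a2"), "b1"), ("b2", "c1"))
population_a_ok = is_monophyletic(toy_tree, {"a1", "a2"})
population_b_ok = is_monophyletic(toy_tree, {"b1", "b2"})
```

In this toy tree population_a_ok is True and population_b_ok is False, because b1 and b2 sit in different clades.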

Determining population structure from k-mer frequencies

Hrytsenko

Daniels

Schwartz

2022

Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics

Background: Determining population structure helps us understand connections among different populations and how they evolve over time. This knowledge is important for studies ranging from evolutionary biology to large-scale variant-trait association studies. Current approaches to determining population structure include model-based approaches, statistical approaches, and distance-based ancestry inference approaches. In this work, we outline an approach that identifies population structure from k-mer frequencies using principal component analysis (PCA). This approach can be classified as statistical; however, prior work employing PCA has used multilocus genotype data (SNPs, microsatellites, or haplotypes), while here we analyze k-mer frequencies. K-mer frequencies can be viewed as a summary statistic of a genome and have the advantage of being easily derived from a genome by counting the number of times a k-mer occurred in a sequence. No genetic assumptions must be met to generate k-mers, whereas current population structure approaches often depend on several genetic assumptions and require careful selection of ancestry informative markers to identify populations. Results: In this work, we show that PCA is able to determine population structure just from the frequency of k-mers found in the genome. The application of PCA and a clustering algorithm to k-mer profiles of genomes provides an easy approach to detecting the number and composition of populations (clusters) present in the dataset. We describe this approach and show that the results are comparable to those found by a model-based approach using genetic markers. We validate our method using 48 human genomes from populations identified by the 1000 Human Genomes Project. We also compare our results to those from mash, which determines relationships among individuals using the number of matched k-mers between sequences.
Conclusions: This study shows that PCA, together with a clustering algorithm, is able to detect population structure from k-mer frequencies and can identify samples of admixed and non-admixed origin. In contrast, mash (based on the number of k-mer matches) was highly sensitive to the parameters of k-mer length and sketch size. Using k-mer frequencies to determine population structure has the potential to avoid some challenges of existing methods.
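
As a rough illustration of the k-mer frequency profile the abstract describes, the following Python sketch counts every k-mer in a sequence and normalizes the counts into a fixed-order vector. The function name kmer_frequencies and the small default k are assumptions made for this example, and the PCA and clustering steps are not shown; each vector would form one row of the matrix passed to PCA.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3):
    """Return relative frequencies of all 4**k DNA k-mers, in lexicographic order."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Only k-mers made of A, C, G, T contribute to the total.
    total = sum(counts[kmer] for kmer in all_kmers)
    return [counts[kmer] / total if total else 0.0 for kmer in all_kmers]


# Toy example: "ACGT" contains the 2-mers AC, CG and GT, once each.
profile = kmer_frequencies("ACGT", k=2)
```

Because every sample is mapped onto the same ordered list of 4**k k-mers, profiles from different genomes are directly comparable without alignment or a reference.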
