Efficient association mapping from k-mers—An application in finding sex-specific sequences

Mehrab, Zakaria; Mobin, Jaiaid; Tahmid, Ibrahim Asadullah; Rahman, Atif

doi:10.1371/journal.pone.0245058

Cited by 7 publications (5 citation statements)

References 13 publications (23 reference statements)


“…The successful use of this approach was once again demonstrated using the examples of ampicillin resistance in E. coli and for sex-related sequences in human sex chromosomes. [130]…”

Section: Early Methods for LD and GWAS (mentioning)


“…The successful use of this approach was once again demonstrated using the examples of ampicillin resistance in E. coli and for sex-related sequences in human sex chromosomes. [130] … k-mers for GWAS in plants. As mentioned earlier, the utility of k-mers for GWAS was also demonstrated when k-mers were used for ∼2000 traits in three plant species, namely A. thaliana, tomato and maize.…”

Section: SVs in Plants (mentioning)

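The passage above mentions sex-related sequences on human sex chromosomes, the application named in this paper's title. A minimal sketch of the underlying idea, assuming toy reads and an arbitrary k = 5, is a strict presence/absence filter that keeps k-mers seen in every male sample and in no female sample. The function names and data are hypothetical, and this is not the statistical test of Mehrab et al.; a realistic pipeline would also need canonical (strand-independent) k-mers, count thresholds to absorb sequencing errors, and a significance test.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sample_kmers(reads: Iterable[str], k: int) -> Set[str]:
    """Union of k-mers over every read belonging to one sample."""
    result: Set[str] = set()
    for read in reads:
        result |= kmers(read, k)
    return result


def male_specific_kmers(
    males: List[List[str]], females: List[List[str]], k: int
) -> Set[str]:
    """Hypothetical strict filter: k-mers in every male sample and in no female sample."""
    if not males:
        return set()
    # k-mers shared by all male samples
    shared = set.intersection(*(sample_kmers(reads, k) for reads in males))
    # k-mers observed in at least one female sample
    in_any_female: Set[str] = set()
    for reads in females:
        in_any_female |= sample_kmers(reads, k)
    return shared - in_any_female


if __name__ == "__main__":
    # Toy reads (hypothetical): "GATTACA" plays the role of a male-only sequence.
    males = [["ACGTACGGATTACA"], ["TTACGTAGATTACAC"]]
    females = [["ACGTACGTTTTT"], ["CCACGTACGAA"]]
    print(sorted(male_specific_kmers(males, females, k=5)))
```

On the toy data the filter keeps exactly the three 5-mers that tile the male-only `GATTACA` motif (`ATTAC`, `GATTA`, `TTACA`); `ACGTA` is shared by both males but is discarded because it also occurs in the females.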

GWAS for genetics of complex quantitative traits: Genome to pangenome and SNPs to SVs and k‐mers

Gupta

2021

BioEssays


The development of improved methods for genome-wide association studies (GWAS) for genetics of quantitative traits has been an active area of research during the last 25 years. This activity initially started with the use of mixed linear model (MLM), which was variously modified. During the last decade, however, with the availability of high throughput next generation sequencing (NGS) technology, development and use of pangenomes and novel markers including structural variations (SVs) and k-mers for GWAS has taken over as a new thrust area of research. Pangenomes and SVs are now available in humans, livestock, and a number of plant species, so that these resources along with k-mers are being used in GWAS for exploring additional genetic variation that was hitherto not available for analysis. These developments have resulted in significant improvement in GWAS methodology for detection of marker-trait associations (MTAs) that are relevant to human healthcare and crop improvement.



“…One useful pangenomics tool for measuring non-reference variation that is readily applicable to common short-read datasets is the k-mer. K-mers are subsequences of length k derived from a larger sequence, and they have a long history of use in computer science [Shannon, 1948], genome assembly [Turner et al., 2018], metagenomics [Benoit et al., 2016], and quantitative genetics [Rahman et al., 2018; Voichek and Weigel, 2020; Kim et al., 2020; Mehrab et al., 2021]. Recent studies have also demonstrated the utility of k-mers for measuring heterozygosity and genetic differences between individuals (commonly referred to as "dissimilarity" measures; Ondov et al. [2016], Vurture et al. [2017], Ranallo-Benavidez et al. [2020], VanWallendael and Alvarez [2022]).…”

Section: Introduction (mentioning)

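The quoted passage defines k-mers as length-k subsequences and mentions k-mer-based dissimilarity measures such as Mash (Ondov et al. 2016). The sketch below, with hypothetical helper names, extracts canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement, so reads from either strand agree), computes the exact Jaccard index between two k-mer sets, and converts it with the Mash distance formula D = -(1/k) ln(2J / (1 + J)). Mash itself estimates J from MinHash sketches rather than comparing full sets; exact sets are used here only for clarity.

```python
import math
from typing import Set

# Translation table that complements A/C/G/T (other characters pass through unchanged).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """Set of canonical k-mers in seq, so both strands give the same result."""
    seq = seq.upper()
    return {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Exact Jaccard index: size of the intersection over size of the union (1.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mash_distance(j: float, k: int) -> float:
    """Mash formula D = -(1/k) ln(2J / (1 + J)), written as ln((1 + J) / (2J)) / k."""
    if j == 0:
        return math.inf  # no shared k-mers: the distance is unbounded
    return math.log((1 + j) / (2 * j)) / k


if __name__ == "__main__":
    a = canonical_kmers("ACGTACGTGG", 4)
    b = canonical_kmers("ACGTACGTCC", 4)
    j = jaccard(a, b)  # 3 shared of 7 distinct canonical 4-mers, i.e. 3/7
    print(j, mash_distance(j, 4))
```

The two toy sequences share their first eight bases, so they share three canonical 4-mers out of seven distinct ones. Using canonical k-mers matters for short-read data because a read can come from either strand.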

Previously unmeasured genetic diversity explains part of Lewontin’s paradox in a k-mer-based meta-analysis of 112 plant species

Roberts, Josephs

2024

Preprint


At the molecular level, most evolution is expected to be neutral. A key prediction of this expectation is that the level of genetic diversity in a population should scale with population size. However, as was noted by Richard Lewontin in 1974 and reaffirmed by later studies, the relationship between population size and diversity in nature is much weaker than expected. We hypothesize that one contributor to this apparent paradox is that current methods relying on single nucleotide polymorphisms (SNPs) called from aligning short reads to a reference genome underestimate levels of genetic diversity in many species. To test this idea, we calculated nucleotide diversity (π) and k-mer-based metrics of genetic diversity across 112 plant species, amounting to over 205 terabases of DNA sequencing data from 27,488 individual plants. We then compared how these different metrics correlated with proxies of population size that account for both range size and population density variation across species. We found that our population size proxies scaled anywhere from about 3 to over 20 times faster with k-mer diversity than nucleotide diversity after adjusting for evolutionary history, mating system, life cycle habit, cultivation status, and invasiveness. The relationship between k-mer diversity and population size proxies also remains significant after correcting for genome size, whereas the analogous relationship for nucleotide diversity does not. These results suggest that variation not captured by common SNP-based analyses explains part of Lewontin's paradox in plants.

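The abstract contrasts nucleotide diversity (π) with k-mer-based diversity. As a reminder of what π measures, here is a minimal sketch of the textbook estimator: the average number of pairwise nucleotide differences among sampled sequences, divided by the number of sites. It assumes equal-length, gap-free aligned sequences, and it is not the SNP-calling pipeline used by Roberts and Josephs, which the abstract does not detail.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(aligned: Sequence[str]) -> float:
    """Mean pairwise differences per site over all pairs of aligned sequences.

    Assumes equal-length sequences with no gaps or missing data.
    """
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(aligned, 2))
    # Count mismatching sites for each pair, then sum over all pairs.
    total_diffs = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)


if __name__ == "__main__":
    print(nucleotide_diversity(["ACGTACGT", "ACGTACGA", "ACTTACGA"]))
```

For the three 8-bp sequences in the example, the three pairs differ at 1, 2 and 1 sites, so π = 4 / (3 × 8) = 1/6. Because π is computed only over aligned sites, sequence absent from the reference never contributes, which is the gap the abstract attributes to SNP-based analyses.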

“…Multiple tools have been developed to perform k-mer-based GWAS (e.g. bugwas [16], HAWK [17,18]). k-mers have the primary advantage of requiring neither a reference genome nor a genome assembly.…”

Section: Introduction (mentioning)

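The passage notes that k-mer-based GWAS needs neither a reference genome nor an assembly. A minimal sketch of why: each sample is reduced to the set of k-mers in its reads, and every k-mer is tested for association with a binary phenotype from its presence/absence counts. The sketch uses a two-sided Fisher exact test as a stand-in; it is not the model implemented by bugwas, HAWK, or Mehrab et al., and it ignores population structure, which mixed models such as ChoruMM are designed to handle. The function names, k = 3, and the toy samples are hypothetical.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: k-mer present / absent. Columns: cases / controls.
    """
    present, cases, n = a + b, a + c, a + b + c + d
    denom = comb(n, cases)

    def prob(x: int) -> float:
        # Hypergeometric probability that x of the case samples carry the k-mer.
        return comb(present, x) * comb(n - present, cases - x) / denom

    observed = prob(a)
    lo, hi = max(0, cases - (n - present)), min(present, cases)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        if p <= observed * (1 + 1e-7):  # tables at least as extreme as the observed one
            total += p
    return min(1.0, total)


def sample_kmer_sets(samples: Dict[str, List[str]], k: int) -> Dict[str, Set[str]]:
    """Map each sample name to the set of k-mers found in any of its reads."""
    return {
        name: {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
        for name, reads in samples.items()
    }


def kmer_association(
    samples: Dict[str, List[str]], phenotype: Dict[str, bool], k: int
) -> List[Tuple[str, float]]:
    """Fisher p-value for every k-mer, sorted from most to least significant."""
    sets = sample_kmer_sets(samples, k)
    cases = [s for s in sets if phenotype[s]]
    controls = [s for s in sets if not phenotype[s]]
    results = []
    for kmer in set().union(*sets.values()):
        a = sum(kmer in sets[s] for s in cases)
        b = sum(kmer in sets[s] for s in controls)
        results.append((kmer, fisher_two_sided(a, b, len(cases) - a, len(controls) - b)))
    results.sort(key=lambda item: (item[1], item[0]))
    return results


if __name__ == "__main__":
    samples = {
        "case1": ["ACGGGT"], "case2": ["TTGGGA"], "case3": ["CAGGGC"],
        "ctrl1": ["ACGTAC"], "ctrl2": ["TTCAGA"], "ctrl3": ["CATTGC"],
    }
    phenotype = {name: name.startswith("case") for name in samples}
    print(kmer_association(samples, phenotype, k=3)[0])
```

The top hit is `GGG`, the only 3-mer present in all cases and absent from all controls, with p = 0.1, which is the smallest two-sided p-value attainable with three cases and three controls. No reference sequence is consulted at any step.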

ChoruMM: a versatile multi-components mixed model for bacterial-GWAS

Frouin, Laporte, Hafner, et al.

2023

Preprint


Genome-wide Association Studies (GWAS) have been central to studying the genetics of complex human outcomes, and there is now tremendous interest in implementing GWAS-like approaches to study pathogenic bacteria. A variety of methods have been proposed to address the complex linkage structure of bacterial genomes; however, some questions remain about how to optimize the genetic modelling of bacteria to distinguish causal variations from correlated ones. Here we examined the genetic structure underlying whole-genome sequencing data from 3,824 Listeria monocytogenes strains, and demonstrate that the standard human genetics model, commonly assumed by existing bacterial GWAS methods, is inadequate for studying such highly structured organisms. We leverage these results to develop ChoruMM, a robust and powerful approach that consists of a multi-component linear mixed model, where components are inferred from a hierarchical clustering of the bacterial genetic relatedness matrix. Our ChoruMM approach also includes post-processing and visualization tools that address the pervasive long-range correlation observed in bacterial genomes and allow assessment of the type I error rate calibration.

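The ChoruMM abstract builds its mixed-model components from a hierarchical clustering of the bacterial genetic relatedness matrix (GRM). As a hedged illustration of that input only, the sketch computes a standardized GRM from a hypothetical haploid 0/1 variant matrix (rows are strains, columns are variants), centring each variant by its frequency p and scaling it by sqrt(p(1 - p)), in the style of common human-genetics GRMs. Whether ChoruMM uses this exact standardization is an assumption, and the clustering and mixed-model steps are not shown.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def relatedness_matrix(genotypes: List[List[int]]) -> Matrix:
    """Standardized GRM from a haploid 0/1 matrix (rows = strains, columns = variants).

    Each polymorphic variant is centred by its frequency p and scaled by sqrt(p(1 - p));
    K[i][k] is the average product of the scaled values of strains i and k.
    """
    if not genotypes or not genotypes[0]:
        raise ValueError("need at least one strain and one variant")
    n, m = len(genotypes), len(genotypes[0])
    scaled_columns = []
    for j in range(m):
        column = [genotypes[i][j] for i in range(n)]
        p = sum(column) / n
        variance = p * (1 - p)
        if variance == 0:
            continue  # monomorphic variant carries no relatedness information
        sd = sqrt(variance)
        scaled_columns.append([(g - p) / sd for g in column])
    if not scaled_columns:
        raise ValueError("no polymorphic variants")
    used = len(scaled_columns)
    return [
        [sum(z[i] * z[k] for z in scaled_columns) / used for k in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical presence/absence calls for 4 strains at 3 variants.
    genotypes = [[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 1]]
    for row in relatedness_matrix(genotypes):
        print([round(x, 3) for x in row])
```

In the toy matrix, strains 0 and 2 carry opposite alleles at every variant, so their relatedness is -1, while each strain's self-relatedness is 1 because every allele frequency here is 0.5. A matrix like this is what a hierarchical clustering step would take as input.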
