2018
DOI: 10.7554/elife.32920

Association mapping from sequencing reads using k-mers

Abstract: Genome-wide association studies (GWAS) rely on microarrays, or more recently on mapping of sequencing reads, to genotype individuals. The reliance on prior sequencing of a reference genome limits the scope of association studies, and also precludes mapping associations outside of the reference. We present an alignment-free method for association studies of categorical phenotypes based on counting k-mers in whole-genome sequencing reads, testing for associations directly between k-mers and the trait of interest, a…
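The general idea in the abstract is to count k-mers in each sample's reads and then test each k-mer directly against a categorical trait. Below is a minimal sketch of that idea for a binary (case/control) trait. It is not the paper's method: the presence/absence encoding, the use of canonical k-mers, the Pearson chi-square test and every function name (`count_kmers`, `kmer_association`, `chi2_2x2`) are illustrative assumptions.

```python
import math
from collections import Counter

VALID = frozenset("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(reads, k):
    """Count canonical k-mers over one sample's reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= VALID:
                counts[canonical(kmer)] += 1
    return counts


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 df) for [[a, b], [c, d]]; returns (statistic, p-value)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # a margin is zero (e.g. every sample carries the k-mer): no evidence
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a 1-df chi-square: P(X > s) = erfc(sqrt(s / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def kmer_association(samples, phenotypes, k, min_count=1):
    """Test each k-mer's presence/absence against a 0/1 phenotype (1 = case)."""
    presence = [
        {km for km, c in count_kmers(reads, k).items() if c >= min_count}
        for reads in samples
    ]
    n_cases = sum(phenotypes)
    n_controls = len(phenotypes) - n_cases
    results = {}
    for km in set().union(*presence):
        case_with = sum(1 for s, y in zip(presence, phenotypes) if y == 1 and km in s)
        ctrl_with = sum(1 for s, y in zip(presence, phenotypes) if y == 0 and km in s)
        results[km] = chi2_2x2(case_with, n_cases - case_with,
                               ctrl_with, n_controls - ctrl_with)
    return results
```

On whole-genome data the in-memory `Counter` would have to be replaced by a dedicated k-mer counter, and the p-values would need a multiple-testing correction across all tested k-mers.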

Citations: cited by 100 publications (138 citation statements)
References: 50 publications

“…This approach, not centered around one specific reference genome, can identify biochemical pathways associated with, for example, pathogenicity. This approach has also been applied in humans, where the number of unique k-mers is much higher than in bacterial strains, due to their larger genome (Rahman et al., 2018). However, this was restricted to case-control situations, and due to high computational load, population structure was corrected only for a subset of k-mers.…”
Section: Introduction (mentioning; confidence: 99%)
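The statement above notes that population structure was corrected for only a subset of k-mers because of computational load. One generic way to organize such a scan is in two stages: a cheap statistic over every k-mer, then an expensive covariate-adjusted test on only the top candidates. The sketch below assumes principal components as the population-structure covariates and a logistic-regression likelihood-ratio test for stage 2. These choices, the stage-1 score, and the names `two_stage_scan` and `logistic_loglik` are hypothetical and are not taken from Rahman et al. (2018).

```python
import math

import numpy as np


def logistic_loglik(X, y, iters=25):
    """Fit logistic regression by Newton-Raphson and return the maximized log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        info = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information X^T W X
        # Tiny ridge term keeps the Newton step solvable if `info` is near-singular.
        beta = beta + np.linalg.solve(info + 1e-9 * np.eye(X.shape[1]), grad)
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))


def two_stage_scan(presence, y, pcs, top_m):
    """Stage 1 ranks all k-mers cheaply; stage 2 re-tests only the top_m with covariates.

    presence: (n_kmers, n_samples) 0/1 matrix; y: 0/1 phenotype array;
    pcs: (n_samples, n_pcs) covariates assumed to capture population structure.
    Returns {kmer_index: p-value of a 1-df likelihood-ratio test}.
    """
    cases = y == 1
    # Cheap stage-1 score: absolute difference in carrier frequency, cases vs controls.
    score = np.abs(presence[:, cases].mean(axis=1) - presence[:, ~cases].mean(axis=1))
    top = np.argsort(-score)[:top_m]
    base = np.column_stack([np.ones(len(y)), pcs])
    ll_null = logistic_loglik(base, y)  # covariates only, shared by every stage-2 test
    pvals = {}
    for i in top:
        full = np.column_stack([base, presence[i]])
        stat = max(2.0 * (logistic_loglik(full, y) - ll_null), 0.0)
        pvals[int(i)] = math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    return pvals
```

The trade-off of this design is that a k-mer whose signal only emerges after adjustment is missed if it falls outside the top `top_m` of stage 1.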
“…[26–28] Note that several alternative k-mer counting libraries and tools [29,30] have been developed to solve a variety of different biological problems [31–35]. Step 1: Identifying novel k-mers and reads. To identify sequences spanning de novo variants, Kevlar scans each read sequenced from the proband. The per-sample abundances of each k-mer are queried from the Count-Min sketches computed in previous steps.…”
Section: Kevlar Workflow (mentioning; confidence: 99%)
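To illustrate the Step 1 idea in the statement above, the sketch below builds a simple Count-Min sketch per sample and flags proband reads containing at least one k-mer that is abundant in the proband but absent from both parents. The hashing scheme, the thresholds `case_min` and `ctrl_max`, the lack of reverse-complement handling, and all names here are assumptions for illustration; this is not Kevlar's code or API.

```python
import hashlib


class CountMinSketch:
    """Approximate counter: `depth` hash rows of `width` counters each.

    A query returns the minimum over rows, so it never underestimates a true count;
    hash collisions can only inflate it.
    """

    def __init__(self, width=1 << 16, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One independent-looking hash per row, derived by prefixing the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item):
        for row, col in self._buckets(item):
            self.table[row][col] += 1

    def query(self, item):
        return min(self.table[row][col] for row, col in self._buckets(item))


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_sketch(reads, k, **kwargs):
    """Count every k-mer of every read into a fresh sketch."""
    sketch = CountMinSketch(**kwargs)
    for read in reads:
        for km in kmers(read, k):
            sketch.add(km)
    return sketch


def novel_reads(proband_reads, proband, parents, k, case_min=5, ctrl_max=0):
    """Yield proband reads with at least one k-mer seen >= case_min times in the
    proband and <= ctrl_max times in every parent (hypothetical thresholds)."""
    for read in proband_reads:
        for km in kmers(read, k):
            if proband.query(km) >= case_min and all(
                p.query(km) <= ctrl_max for p in parents
            ):
                yield read
                break
```

Because a Count-Min sketch can only overestimate, a collision can hide a truly novel k-mer (its parental count looks non-zero) but cannot make an inherited k-mer look absent from a parent.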
“…To calculate the scores using the equations mentioned above, the prior probability distribution on numbers of copies of k-mers in the genome and conditional probability distributions on k-mer counts in the reads given the copy numbers in the genome need to be defined. When a k-mer appears in the read set due to the presence of one or more copies of the sequence in the genome, Poisson distributions have been observed to model the counts well in genome sequencing data [24]. If a genomic region is present i times, then the counts of the k-mers within that region are assumed to be Poisson distributed with mean λ·i, where λ is the k-mer coverage of the dataset.…”
Section: Learning Probability Distributions and Estimating Priors (mentioning; confidence: 99%)
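The model in the statement above combines with a prior through Bayes' rule: P(i | c) ∝ P(i) · Pois(c; λ·i), where c is the observed k-mer count, i the genomic copy number and λ the k-mer coverage. The sketch below computes that posterior. It assumes copy numbers i ≥ 1, since the quoted text does not say how zero-copy (error-derived) k-mers are modeled, and the function names and any prior values passed to it are hypothetical.

```python
import math


def poisson_pmf(count, mean):
    """P(N = count) for N ~ Poisson(mean), computed in log space for stability."""
    if mean == 0:
        return 1.0 if count == 0 else 0.0
    return math.exp(count * math.log(mean) - mean - math.lgamma(count + 1))


def copy_number_posterior(count, coverage, prior):
    """Posterior P(i | count) over copy numbers i, given an observed k-mer count.

    prior: {i: P(i)} for copy numbers i >= 1 (values supplied by the caller).
    coverage: lambda; the count given i copies is modeled as Poisson(lambda * i).
    """
    joint = {i: p * poisson_pmf(count, coverage * i) for i, p in prior.items()}
    total = sum(joint.values())
    return {i: v / total for i, v in joint.items()}
```

For example, with λ = 30, an observed count of 60 and a flat prior over i ∈ {1, 2}, the posterior puts almost all of its mass on i = 2, because the count sits at the mean of Poisson(60) and far in the tail of Poisson(30).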