2019
DOI: 10.1093/bioinformatics/btz299

kmcEx: memory-frugal and retrieval-efficient encoding of counted k-mers

Abstract: Motivation: K-mers along with their frequency have served as an elementary building block for error correction, repeat detection, multiple sequence alignment, genome assembly, etc., attracting intensive studies in k-mer counting. However, the output of k-mer counters itself is large; very often, it is too large to fit into main memory, leading to highly narrowed usability. Results: We introduce a novel idea of encoding k-mers a…

Cited by 5 publications (3 citation statements)
References 22 publications
“…For instance, the 31-mers having a count larger than one of the HapMap sample NA12878 take 90-Gb space on disk. To solve this problem, we have designed a novel coupled Bloom Filter-based algorithm achieving high memory saving ratio and good retrieval efficiency (Jiang et al, 2019). Let f_max be the maximum frequency in K, which can be represented by at most h bits (in binary).…”
Section: Methods (mentioning)
confidence: 99%
“…Based on the above steps, K_f, K_m, and K_c can be saved into B_f, B_m, and B_c economically; more details are shown in Jiang et al (2019).…”
Section: Methods (mentioning)
confidence: 99%
“…Counting the frequencies of k-mers is an algorithm that is widely used in many areas of genomics (Xiao et al, 2018); from genome assembly and error detection to sequence alignment and variant calling (Kelley et al, 2010; Li et al, 2010). Others (Marçais and Kingsford, 2011; Rizk et al, 2013; Audano and Vannberg, 2014; Deorowicz et al, 2015; Li and Yan, 2015; Jiang et al, 2019) have explored ways to optimize k-mer counting with reduced memory and storage. While these k-mer counting algorithms process a single sample, SMUFIN processes k-mer counters of normal and tumoral samples of the same patient together, potentially making the memory footprint even bigger.…”
Section: Introduction (mentioning)
confidence: 99%