2017
DOI: 10.1093/bioinformatics/btx636

Squeakr: an exact and approximate k-mer counting system

Abstract: Supplementary data are available at Bioinformatics online.

Cited by 70 publications (84 citation statements)
References 28 publications
“…We also used BLight to associate to each k-mer its number of occurrences across the datasets to show a straightforward proof of concept application of the library. We selected a dataset from TARA Oceans samples [29], and compare BLight to two lightweight recent k-mer abundance indexes: Short Read Connector (SRC) [14] and Squeakr [30]. Lastly we show that our index can also be used on other SPSS than the unitigs of the compacted de Bruijn graph.…”
Section: Results (mentioning)
confidence: 99%
“…We compare a simple usage of BLight through a k-mer counting snippet, with two methods from state-of-the-art that allow large scale k-mer to abundance association. Squeakr [33] is a k-mer counter based on a quotienting hashing technique, and Short Read Connector counter (SRC) [14] is based on a MPHF. We report the performances of the three tools on a large marine metagenomic dataset from TARA used the previous metagenomic experiment (ERR599280), counting 37 billion bases and 189 million reads.…”
Section: Application Example: Storing K-mer Counts (mentioning)
confidence: 99%
“…The compression of k-mer sets has not been extensively studied, except in the context of how k-mer counters store their output [17][18][19][20]. DSK [18] uses an HDF5-based encoding, KMC3 [17] combines a dense storage of prefixes with a sparse storage of suffixes, and Squeakr [20] uses a counting quotient filter [21]. The compression of read data, on the other hand, stored in either unaligned or aligned formats, has received a lot of attention [22][23][24].…”
Section: Related Work (mentioning)
confidence: 99%
“…All files are compressed with MFC or LZMA, in addition to the tool shown in the column name. Squeakr-exact's implementation is limited to k < 32 [20] and so it could not be run for k = 61.…”
Section: Evaluation Of Ust-fm (mentioning)
confidence: 99%
“…Thus, only reads with the dot product greater than 200 were included in later analysis. We modified squeakr [46] to perform these steps.…”
Section: /11 (mentioning)
confidence: 99%