2018
DOI: 10.1093/bioinformatics/bty651

BinDash, software for fast genome distance estimation on a typical personal laptop

Abstract: Supplementary data are available at Bioinformatics online.

Cited by 56 publications (52 citation statements)
References 3 publications
“…Sets were constructed to have target Jaccard coefficients ranging from 0.00022 to 0.818. Many set-size pairs were evaluated, ranging from equal-size sets to sets with sizes differing by a factor of 2^12. In total, 36 combinations of set size and J were evaluated, with full results presented in Additional File 2.…”
Section: Sketch Accuracy; citation type: mentioning
confidence: 99%
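
As an illustration of the quoted setup, the following minimal sketch builds two integer sets of chosen sizes whose exact Jaccard coefficient is as close as possible to a target. It is not the cited authors' procedure; the function names (`intersection_size_for_target`, `build_sets`, `jaccard`) are hypothetical. It relies only on the identity J = i / (|A| + |B| - i), where i is the number of shared elements, which rearranges to i = J(|A| + |B|) / (1 + J).

```python
def intersection_size_for_target(n_a: int, n_b: int, target_j: float) -> int:
    """Number of shared elements that brings the Jaccard coefficient closest to target_j."""
    # J = i / (n_a + n_b - i)  =>  i = J * (n_a + n_b) / (1 + J)
    i = round(target_j * (n_a + n_b) / (1.0 + target_j))
    # The intersection cannot be larger than the smaller set.
    return min(i, n_a, n_b)


def build_sets(n_a: int, n_b: int, target_j: float):
    """Return sets A and B with |A| = n_a, |B| = n_b and Jaccard close to target_j."""
    i = intersection_size_for_target(n_a, n_b, target_j)
    a = set(range(n_a))                                        # elements 0 .. n_a-1
    b = set(range(i)) | set(range(n_a, n_a + n_b - i))         # i shared + (n_b - i) private
    return a, b


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard coefficient |A ∩ B| / |A ∪ B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


if __name__ == "__main__":
    a, b = build_sets(1000, 1000, 0.25)
    print(len(a), len(b), jaccard(a, b))
```

Because the intersection is capped by the smaller set, the achievable Jaccard coefficient shrinks as the sizes diverge: for sizes differing by a factor of 2^12, it can be at most 2^-12, roughly 0.00024.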
“…Spurred by MinHash's utility, other groups have proposed alternatives using new ideas from search and data mining. BinDash [12] uses a b-bit one-permutation rolling MinHash to achieve greater accuracy and speed compared to Mash at a smaller memory footprint. Other theoretical improvements are proposed in the HyperMinHash [13] and SuperMinHash [14] studies.…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…For estimating resemblance, Mash uses a 'bottom sketch' strategy as originally proposed by Broder [8]. More efficient techniques for estimating resemblance have since emerged [9,10,11,12], but bottom sketching is elegant in its simplicity. In short, all k-mers from a genome A are passed through a single hash function h but only the smallest m hash values are stored as the sketch S(A), where |S(A)| << |A|.…”
Section: Introduction; citation type: mentioning
confidence: 99%
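
The bottom-sketch idea in the statement above can be illustrated with a minimal sketch. This is not Mash's implementation: the hash function (64-bit BLAKE2b from Python's `hashlib`) is a stand-in chosen for convenience, and the names `kmers`, `h`, `bottom_sketch`, and `estimate_jaccard` are hypothetical. Every k-mer is hashed with the single function `h`, only the m smallest hash values are kept as S(A), and resemblance is estimated from the m smallest values of the merged sketches.

```python
import hashlib


def kmers(seq: str, k: int) -> set:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def h(kmer: str) -> int:
    """Single 64-bit hash function (assumption: BLAKE2b as a stand-in)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq: str, k: int, m: int) -> list:
    """S(A): the m smallest distinct hash values over all k-mers of seq, sorted."""
    return sorted({h(x) for x in kmers(seq, k)})[:m]


def estimate_jaccard(sketch_a: list, sketch_b: list, m: int) -> float:
    """Estimate resemblance from the m smallest values of the merged sketches."""
    a, b = set(sketch_a), set(sketch_b)
    merged_bottom = sorted(a | b)[:m]
    if not merged_bottom:
        return 0.0
    # A value counts as shared only if it appears in both sketches.
    shared = sum(1 for v in merged_bottom if v in a and v in b)
    return shared / len(merged_bottom)


if __name__ == "__main__":
    s1 = bottom_sketch("ACGTACGTGG", k=4, m=100)
    s2 = bottom_sketch("ACGTACGTCC", k=4, m=100)
    print(estimate_jaccard(s1, s2, m=100))
```

When m is at least the number of distinct k-mers in both inputs, as in this toy usage, the sketches hold every hash value and the estimate equals the exact Jaccard coefficient (barring hash collisions). The saving comes from choosing m much smaller than the genome's k-mer count.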
“…The many-fold size reductions gained via MinHash open the door to extremely large scale searches. While the initial k-mer MinHash implementation focused on enabling Jaccard similarity comparisons (3), it has since been modified and extended to enable k-mer abundance comparisons (4), decrease runtime and memory requirements (5), and work on streaming input data (6). Furthermore, as Jaccard similarity is impacted by the relative size of the sets being compared, containment searches (2,7,8) have been developed to enable detection of a small set within a larger set, such as a genome within a metagenome.…”
Section: Introduction; citation type: mentioning
confidence: 99%
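
The point about set size in the statement above can be made concrete with a small worked example. The integer sets below are toy stand-ins for k-mer sets and the function names are hypothetical. A "genome" fully contained in a much larger "metagenome" gets a low Jaccard similarity but a containment of 1.

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|: symmetric, and penalized when the set sizes differ."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def containment(a: set, b: set) -> float:
    """|A ∩ B| / |A|: the fraction of A that is found in B (asymmetric)."""
    return len(a & b) / len(a) if a else 0.0


if __name__ == "__main__":
    genome = set(range(1_000))          # toy stand-in for a genome's k-mers
    metagenome = set(range(100_000))    # toy stand-in that includes every genome k-mer
    print("Jaccard:    ", jaccard(genome, metagenome))
    print("Containment:", containment(genome, metagenome))
```

Here the Jaccard similarity is bounded by the size ratio 1,000 / 100,000, while containment reports that the whole genome is present, which is the behavior the quoted passage describes for detecting a small set within a larger one.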