2016
DOI: 10.1007/978-3-319-38851-9_22

CHICO: A Compressed Hybrid Index for Repetitive Collections

Cited by 17 publications (22 citation statements)
References 29 publications
“…We test our implementations on five DNA datasets from the Pizza&Chili repetitive corpus 8, which include the whole genomes of approximately 36 strains of the same eukaryotic species, a collection of 23 and approximately 78 thousand substrings of the genome of the same bacterium, and an artificially repetitive with the same sampling rates, to the five variants in the implementation of the LZ77 index described in [14], and to a recent implementation of the compressed hybrid index [22]. The FM index uses RRR bitvectors in its wavelet tree.…”
Section: Results (mentioning)
confidence: 99%
“…The first non-trivial implementation detail is that in our implementation we employ the idea described in [28] to reduce the number of LZ phrases. Namely, for any maximal sequence of adjacent phrases where each phrase has length ≤ M, we merge them into one superphrase.…”
Section: Implementation Details (mentioning)
confidence: 99%
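
The merging rule quoted above can be illustrated with a minimal sketch. It assumes each LZ phrase is given as a hypothetical (start, length) pair over the text and that phrases are listed in text order; the function name and representation are illustrative only and are not taken from the citing paper or from the CHICO implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a phrase is (start position, length) in the text.
Phrase = Tuple[int, int]


def merge_short_phrases(phrases: List[Phrase], max_len: int) -> List[Phrase]:
    """Merge each maximal run of adjacent phrases of length <= max_len
    into a single superphrase; longer phrases are kept unchanged."""
    merged: List[Phrase] = []
    run_start: Optional[int] = None  # start of the current run of short phrases
    run_len = 0                      # total length of the current run

    for start, length in phrases:
        if length <= max_len:
            # Short phrase: open a new run or extend the current one.
            if run_start is None:
                run_start = start
                run_len = 0
            run_len += length
        else:
            # Long phrase: close any open run, then keep the phrase as-is.
            if run_start is not None:
                merged.append((run_start, run_len))
                run_start = None
            merged.append((start, length))

    # Close a run that reaches the end of the phrase list.
    if run_start is not None:
        merged.append((run_start, run_len))
    return merged


example = [(0, 1), (1, 2), (3, 10), (13, 1), (14, 1), (15, 3)]
result = merge_short_phrases(example, max_len=3)
print(result)
```

With M = 3 in this example, the first two phrases and the last three phrases each collapse into one superphrase, while the phrase of length 10 is left untouched, so six phrases become three.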
“…Valenzuela [28] has since demonstrated hybrid indexing to be very effective in practice for indexing massive genomic data sets (in the terabyte range), and the technique now underlies tools for detecting genomic variants in pangenomic data [29]. However, Valenzuela's index is tightly coupled to the DNA alphabet and still carries the restriction that the maximum searchable pattern length is M, meaning it cannot be applied to long, so-called third generation DNA sequence reads (see, e.g.…”
Section: Introduction (mentioning)
confidence: 99%